---------- Forwarded message ---------
From: Toke Høiland-Jørgensen <t...@redhat.com>
Date: Thu, Dec 8, 2022 at 3:06 PM
Subject: Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
To: Stanislav Fomichev <s...@google.com>, <b...@vger.kernel.org>
Cc: <a...@kernel.org>, <dan...@iogearbox.net>, <and...@kernel.org>,
<martin....@linux.dev>, <s...@kernel.org>, <y...@fb.com>,
<john.fastab...@gmail.com>, <kpsi...@kernel.org>, <s...@google.com>,
<hao...@google.com>, <jo...@kernel.org>, Saeed Mahameed
<sae...@nvidia.com>, David Ahern <dsah...@gmail.com>, Jakub Kicinski
<k...@kernel.org>, Willem de Bruijn <will...@google.com>, Jesper
Dangaard Brouer <bro...@redhat.com>, Anatoly Burakov
<anatoly.bura...@intel.com>, Alexander Lobakin
<alexandr.loba...@intel.com>, Magnus Karlsson
<magnus.karls...@gmail.com>, Maryam Tahhan <mtah...@redhat.com>,
<xdp-hi...@xdp-project.net>, <net...@vger.kernel.org>


Stanislav Fomichev <s...@google.com> writes:

> From: Toke Høiland-Jørgensen <t...@redhat.com>
>
> Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
> pointer to the mlx5e_skb_from* functions so it can be retrieved from the
> XDP ctx to do this.

So I finally managed to get enough ducks in a row to actually benchmark
this. With the caveat that I suddenly can't get the timestamp support to
work (it was working in an earlier version, but now
timestamp_supported() just returns false). I'm not sure if this is an
issue with the enablement patch, or if I just haven't gotten the
hardware configured properly. I'll investigate some more, but figured
I'd post these results now:

Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
Overhead:                   1,754,153 pps /  2.86 ns/pkt

As per the above, this is with calling three kfuncs/pkt
(metadata_supported(), rx_hash_supported() and rx_hash()). So that's
~0.95 ns per function call, which is a bit less, but not far off from
the ~1.2 ns that I'm used to. The tests where I accidentally called the
default kfuncs cut off ~1.3 ns for one less kfunc call, so it's
definitely in that ballpark.
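
For anyone wanting to re-derive the per-call figure, here is a minimal
sketch of the arithmetic, using only the two packet rates reported above
and the count of three kfunc calls per packet. The helper name
ns_per_pkt() and the variable names are made up for illustration; they
are not part of the patch series or the benchmark tooling.

```python
# Re-derive the per-packet and per-kfunc-call overhead from the measured rates.
# All names below are illustrative assumptions, not taken from the patches.

def ns_per_pkt(pps: int) -> float:
    """Convert a packet rate (packets per second) to nanoseconds per packet."""
    return 1e9 / pps

BASELINE_PPS = 25_678_262   # XDP_DROP only
METADATA_PPS = 23_924_109   # XDP_DROP + read metadata
KFUNC_CALLS = 3             # metadata_supported(), rx_hash_supported(), rx_hash()

baseline_ns = ns_per_pkt(BASELINE_PPS)
metadata_ns = ns_per_pkt(METADATA_PPS)

# Overhead is the difference in per-packet time, not the inverse of the pps delta.
overhead_ns = metadata_ns - baseline_ns
per_call_ns = overhead_ns / KFUNC_CALLS

print(f"baseline:  {baseline_ns:.2f} ns/pkt")
print(f"metadata:  {metadata_ns:.2f} ns/pkt")
print(f"overhead:  {overhead_ns:.2f} ns/pkt")
print(f"per kfunc: {per_call_ns:.2f} ns/call")
```

Note that the division by three assumes the whole delta is attributable
to the kfunc calls and that each call costs the same, which is a
simplification.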

I'm not doing anything with the data, just reading it into an on-stack
buffer, so this is the smallest possible delta from just getting the
data out of the driver. I did confirm that the call instructions are
still in the BPF program bytecode when it's dumped back out from the
kernel.

-Toke



-- 
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC
_______________________________________________
LibreQoS mailing list
LibreQoS@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/libreqos
