---------- Forwarded message ---------
From: Toke Høiland-Jørgensen <t...@redhat.com>
Date: Thu, Dec 8, 2022 at 3:06 PM
Subject: Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata
To: Stanislav Fomichev <s...@google.com>, <b...@vger.kernel.org>
Cc: <a...@kernel.org>, <dan...@iogearbox.net>, <and...@kernel.org>,
    <martin....@linux.dev>, <s...@kernel.org>, <y...@fb.com>,
    <john.fastab...@gmail.com>, <kpsi...@kernel.org>, <s...@google.com>,
    <hao...@google.com>, <jo...@kernel.org>,
    Saeed Mahameed <sae...@nvidia.com>, David Ahern <dsah...@gmail.com>,
    Jakub Kicinski <k...@kernel.org>, Willem de Bruijn <will...@google.com>,
    Jesper Dangaard Brouer <bro...@redhat.com>,
    Anatoly Burakov <anatoly.bura...@intel.com>,
    Alexander Lobakin <alexandr.loba...@intel.com>,
    Magnus Karlsson <magnus.karls...@gmail.com>,
    Maryam Tahhan <mtah...@redhat.com>, <xdp-hi...@xdp-project.net>,
    <net...@vger.kernel.org>

Stanislav Fomichev <s...@google.com> writes:

> From: Toke Høiland-Jørgensen <t...@redhat.com>
>
> Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
> pointer to the mlx5e_skb_from* functions so it can be retrieved from the
> XDP ctx to do this.

So I finally managed to get enough ducks in a row to actually benchmark
this, with the caveat that I suddenly can't get the timestamp support to
work (it was working in an earlier version, but now
timestamp_supported() just returns false). I'm not sure if this is an
issue with the enablement patch, or if I just haven't gotten the
hardware configured properly. I'll investigate some more, but figured
I'd post these results now:

Baseline XDP_DROP:         25,678,262 pps / 38.94 ns/pkt
XDP_DROP + read metadata:  23,924,109 pps / 41.80 ns/pkt
Overhead:                   1,754,153 pps /  2.86 ns/pkt

As per the above, this is with calling three kfuncs/pkt
(metadata_supported(), rx_hash_supported() and rx_hash()). So that's
~0.95 ns per function call, which is a bit less, but not far off from
the ~1.2 ns that I'm used to. The tests where I accidentally called the
default kfuncs cut off ~1.3 ns for one less kfunc call, so it's
definitely in that ballpark.

I'm not doing anything with the data, just reading it into an on-stack
buffer, so this is the smallest possible delta from just getting the
data out of the driver. I did confirm that the call instructions are
still in the BPF program bytecode when it's dumped back out from the
kernel.

-Toke

--
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC

_______________________________________________
LibreQoS mailing list
LibreQoS@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/libreqos
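
For anyone wanting to sanity-check the figures in the message, the ns/pkt
values follow directly from the packet rates (1e9 / pps), and the per-call
cost is the difference divided by the number of kfunc calls. A minimal
sketch of that arithmetic follows; the function and variable names are
made up for illustration and are not part of the patch series or the
benchmark tooling:

```python
# Sketch of the overhead arithmetic from the benchmark numbers in the mail.
# All names here are hypothetical; only the input figures come from the mail.

NS_PER_SECOND = 1e9


def ns_per_packet(pps: float) -> float:
    """Convert a packet rate (packets/second) into time per packet (ns)."""
    return NS_PER_SECOND / pps


baseline_pps = 25_678_262   # XDP_DROP only
metadata_pps = 23_924_109   # XDP_DROP + reading metadata
kfunc_calls = 3             # metadata_supported, rx_hash_supported, rx_hash

baseline_ns = ns_per_packet(baseline_pps)
metadata_ns = ns_per_packet(metadata_pps)

overhead_pps = baseline_pps - metadata_pps
overhead_ns = metadata_ns - baseline_ns
per_call_ns = overhead_ns / kfunc_calls

print(f"Baseline:  {baseline_ns:.2f} ns/pkt")
print(f"Metadata:  {metadata_ns:.2f} ns/pkt")
print(f"Overhead:  {overhead_pps:,} pps / {overhead_ns:.2f} ns/pkt")
print(f"Per kfunc: {per_call_ns:.2f} ns")
```

Note that the overhead has to be computed in the time domain: converting
the 1,754,153 pps difference to ns/pkt directly would give a meaningless
number, since packet rates are not additive the way per-packet times are.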