Hi all,
> -----Original Message-----
> From: Dariusz Sosnowski <[email protected]>
> Sent: Tuesday, January 23, 2024 12:37
> To: Stephen Hemminger <[email protected]>
> Cc: NBU-Contact-Thomas Monjalon (EXTERNAL) <[email protected]>;
> Ferruh Yigit <[email protected]>; Andrew Rybchenko
> <[email protected]>; Ori Kam <[email protected]>;
> [email protected]
> Subject: RE: [RFC] ethdev: fast path async flow API
>
> Hi Stephen,
>
> Sorry for such a late response.
>
> From: Stephen Hemminger <[email protected]>
> Sent: Thursday, January 4, 2024 02:08
> > On Wed, 3 Jan 2024 19:14:49 +0000
> > Dariusz Sosnowski <[email protected]> wrote:
> > > In summary, in my opinion extending the async flow API with bulking
> > capabilities or exposing the queue directly to the application is not
> > desirable.
> > > This proposal aims to reduce the I-cache overhead in async flow API
> > > by
> > reusing the existing design pattern in DPDK - fast path functions are
> > inlined to the application code and they call cached PMD callbacks.
> >
> > Inline needs to more discouraged in DPDK, because it only works if
> > application ends up building with DPDK from source.
> > It doesn't work for the Linux distro packaging model and symbol
> > versioning, etc.
>
> I understand the problem. In that case, I have a proposal.
> I had a chat with Thomas regarding this RFC, and he noticed that there are 2
> separate changes proposed here:
>
> 1. Per-port callbacks for async flow API.
> - Moves specified callbacks to rte_flow_fp_ops struct and allow PMDs to
> register them dynamically.
> - Removes indirection at API level - no need to call rte_flow_ops_get().
> - Removes checking if callbacks are defined - either the default DPDK
> callback
> is used or the one provided by PMD.
> 2. Make async flow API functions inlineable.
>
> Change (1) won't break ABI (existing callbacks in rte_flow_ops can be marked
> as deprecated for now and phased out later) and in my opinion is less
> controversial compared to change (2).
>
> I'll rerun the benchmarks for both changes separately to compare their
> benefits in isolation.
> This would allow us to decide if change (2) is worth the drawbacks it
> introduces.
>
> What do you think?
I split the RFC into 2 parts:
1. Introduce per-port callbacks:
- Introduce rte_flow_fp_ops struct - holds callbacks for fast path calls,
for each port. PMD registers callbacks through rte_flow_fp_ops_register().
- Relevant rte_flow_async_* functions just pass arguments to fast path
callbacks. Validation checks are done only if RTE_FLOW_DEBUG macro is defined.
- Biggest difference is the removal of callback access through
rte_flow_get_ops().
2. Inline async flow API functions.
- Relevant rte_flow_async_* functions definitions are moved to rte_flow.h
and made inlineable.
Here are the results of the benchmark:
| | Avg Insertion | Diff over baseline | Avg Deletion |
Diff over baseline |
|-----------------------|---------------|--------------------|--------------|--------------------|
| baseline (v24.03-rc0) | 5855.4 | | 9026.8 |
|
| applied (1) | 6384.2 | +528.8 (+9%) | 10054.2 |
+1027.4 (+11.4%) |
| applied (2) | 6434.6 | +579.2 (+9.9%) | 10011.4 |
+984.6 (+10.9%) |
Results are in KFlows/sec.
The benchmark is continuously inserting and deleting 2M flow rules.
These rules match on IPv4 destination address and with a single action DROP.
Flow rules are inserted and deleted using a single flow queue.
Change (2) improves insertion rate performance by ~1% compared to (1), but
decreases deletion rate by ~0.5%.
Based on these results, I think we can say that making rte_flow_async_*() calls
inlineable may not be worth it compared to the issues it causes.
What do you all think about the results?
Best regards,
Dariusz Sosnowski