On 2020-12-14 21:35, Cong Wang wrote:
On Mon, Dec 14, 2020 at 7:13 AM Maxim Mikityanskiy <maxi...@nvidia.com> wrote:

On 2020-12-11 21:16, Cong Wang wrote:
On Fri, Dec 11, 2020 at 7:26 AM Maxim Mikityanskiy <maxi...@mellanox.com> wrote:

HTB doesn't scale well because of contention on a single lock, and it
also consumes CPU. This patch adds support for offloading HTB to
hardware that supports hierarchical rate limiting.

This solution addresses two main problems of scaling HTB:

1. Contention by flow classification. Currently the filters are attached
to the HTB instance as follows:

I do not think this is the reason; tcf_classify() has been called under RCU protection only (no qdisc lock) on the ingress side for a rather long time. What contention are you talking about here?

When one attaches filters to HTB, tcf_classify() is called from htb_classify(), which is called from htb_enqueue(), and htb_enqueue() runs with the root spinlock of the qdisc held.
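
To make the locking concrete, here is a condensed paraphrase of that egress path (based on net/core/dev.c and net/sched/sch_htb.c; error handling and most details are elided, so treat it as a sketch rather than the literal code):

static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
                                 struct net_device *dev,
                                 struct netdev_queue *txq)
{
        spinlock_t *root_lock = qdisc_lock(q);
        struct sk_buff *to_free = NULL;
        int rc;

        /* The root lock serializes everything below, including the
         * classification done inside the qdisc's enqueue.
         */
        spin_lock(root_lock);
        rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK; /* htb_enqueue() */
        spin_unlock(root_lock);

        return rc;
}

static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                       struct sk_buff **to_free)
{
        int ret;
        /* htb_classify() ends up in tcf_classify(); at this point the
         * root lock is already held by __dev_xmit_skb() above.
         */
        struct htb_class *cl = htb_classify(skb, sch, &ret);
        /* ... enqueue to the class's internal queue ... */
}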

So it has nothing to do with tcf_classify() itself... :-/

[...]

And doesn't TBF already work with mq? I mean, you can attach it as a leaf to each mq class so that the tree lock will not be shared either, but you'd lose the benefits of a global rate limit too.

Yes, I'd lose not only the global rate limit, but also multi-level
hierarchical limits, which are all provided by this HTB offload - that's
why TBF is not really a replacement for this feature.

Interesting. Please explain how your HTB offload still provides a global rate limit and borrowing across queues.

Sure, I will explain that.

I simply can't see it; all I can see is that you offload HTB into each queue in ->attach(),

In the non-offload mode, the same HTB instance would be attached to all queues. In the offload mode, HTB behaves like MQ: there is a root instance of HTB, but each queue gets a separate simple qdisc (pfifo). Only the root qdisc (HTB) gets offloaded, and when that happens, the NIC creates an object for the QoS root.
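
To illustrate what that ->attach() does in offload mode, here is a rough sketch modeled on mq_attach() (illustrative only, not the literal patch; htb_attach_offload is my name for it): the offloaded root grafts a default pfifo onto every TX queue, while the tree itself lives in the NIC.

static void htb_attach_offload(struct Qdisc *sch)
{
        struct net_device *dev = qdisc_dev(sch);
        unsigned int ntx;

        for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
                struct netdev_queue *txq = netdev_get_tx_queue(dev, ntx);
                struct Qdisc *qdisc, *old;

                /* One trivial per-queue qdisc; the HTB hierarchy itself
                 * is maintained by the NIC, not by these leaves.
                 */
                qdisc = qdisc_create_dflt(txq, &pfifo_qdisc_ops,
                                          TC_H_MAKE(TC_H_MAJ(sch->handle),
                                                    TC_H_MIN(ntx + 1)),
                                          NULL);
                if (!qdisc)
                        continue; /* error handling elided */

                old = dev_graft_qdisc(txq, qdisc);
                if (old)
                        qdisc_put(old);
        }
}

From userspace, this mode is requested with the new flag added in the iproute2 part of the series, along the lines of: tc qdisc replace dev eth0 root handle 1: htb offload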

Then all configuration changes are sent to the driver, and it issues the corresponding firmware commands to replicate the whole hierarchy in the NIC. Leaf classes correspond to queue groups (in this implementation queue groups contain only one queue, but it can be extended), and inner classes correspond to entities called TSARs.
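
The shape of that driver interface is roughly the following (abridged and hedged: the command and field names follow the ndo_setup_tc(TC_SETUP_QDISC_HTB) hook proposed in this series, with several members elided):

enum tc_htb_command {
        TC_HTB_CREATE,           /* root created: allocate the QoS root in the NIC */
        TC_HTB_DESTROY,          /* root destroyed: tear everything down */
        TC_HTB_LEAF_ALLOC_QUEUE, /* new leaf class: allocate a queue (group) */
        TC_HTB_LEAF_TO_INNER,    /* leaf gained a child: convert it to a TSAR */
        TC_HTB_LEAF_DEL,         /* leaf class deleted */
        TC_HTB_NODE_MODIFY,      /* rate/ceil of an existing node changed */
        /* ... */
};

struct tc_htb_qopt_offload {
        enum tc_htb_command command;
        u32 parent_classid;
        u32 classid;
        u64 rate;       /* bytes per second */
        u64 ceil;       /* bytes per second */
        /* ... extack, queue id, etc. elided ... */
};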

The information about rate limits is stored inside the TSARs and queue groups. Queues know which groups they belong to, and groups and TSARs know which TSAR is their parent. The queue is picked in ndo_select_queue by looking at the classification result of clsact. So, when a packet is put onto a queue, the NIC knows that queue's place in the whole hierarchy and can run the HTB algorithm.
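
Here is a hedged sketch of that last step (the example_* names and the htb_maj_id field are illustrative, not the literal mlx5 code): a clsact egress filter stores the HTB leaf classid in skb->priority, and ndo_select_queue maps that classid to the TX queue backing the leaf.

static u16 example_select_queue(struct net_device *dev, struct sk_buff *skb,
                                struct net_device *sb_dev)
{
        struct example_priv *priv = netdev_priv(dev);

        /* If the major number matches the offloaded HTB root, the minor
         * number is the leaf classid set by the clsact filter.
         */
        if (priv->htb_maj_id &&
            TC_H_MAJ(skb->priority) == TC_H_MAJ(priv->htb_maj_id))
                return example_htb_classid_to_qid(priv, TC_H_MIN(skb->priority));

        return netdev_pick_tx(dev, skb, sb_dev); /* default selection */
}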

where I assume the hardware will do rate limiting on each queue,

So, it's not flat in the NIC, and rate limiting is done in a hierarchical way.

if the hardware also has a global control, why is it not reflected on the root qdisc?

I'm not sure I got this last question right. The root qdisc is HTB, and all the configuration of the HTB tree gets reflected in the NIC, as I just explained. I hope it's clearer now, but if you still have questions, I'm glad to explain more details. (Also, I'm ready to respin with the minor fixes for the CI build issue on parisc.)

Thanks,
Max


