On 16-05-19 08:35 AM, Eric Dumazet wrote:
From: Eric Dumazet <eduma...@google.com>

Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
agent [1] are problematic at scale:

For each qdisc/class found in the dump, we currently lock the root qdisc
spinlock in order to get stats. Sampling stats every 5 seconds from
thousands of HTB classes is a challenge when the root qdisc spinlock is
under high pressure.


Good stuff.
There are other optimizations we could do for such large-scale dumps
(such as not dumping something that hasn't been updated).
Could we have changed it to use RCU?

These stats are using u64 or u32 fields, so reading integral values
should not prevent writers from doing concurrent updates if the kernel
arch is a 64-bit one.


Meaning it won't work on other archs? Is an atomic read not dependable
on other setups?
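
To make the concern concrete, here is a minimal user-space sketch (not
kernel code, and not taken from the patch) of why a lockless read of a
u64 counter can be unreliable on a 32-bit arch: the value is loaded as
two 32-bit words, and a writer may update the counter between the two
loads. The names split_counter, split_store, split_join and torn_read
are hypothetical, and the interleaving is simulated sequentially rather
than with real threads. As far as I know, the kernel's usual answer to
this on 32-bit is the seqcount-based helpers in
include/linux/u64_stats_sync.h; whether they fit here is a separate
question.

```c
#include <stdint.h>

/* A 64-bit counter as a 32-bit CPU sees it: two separately loaded words.
 * (Hypothetical type, for illustration only.) */
struct split_counter {
    uint32_t lo;
    uint32_t hi;
};

/* Writer side: store a full 64-bit value as two 32-bit halves. */
static void split_store(struct split_counter *c, uint64_t v)
{
    c->lo = (uint32_t)v;
    c->hi = (uint32_t)(v >> 32);
}

/* Recombine two halves into the 64-bit value the reader believes it saw. */
static uint64_t split_join(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/*
 * Simulate a lockless reader that is interrupted between its two word
 * loads by a writer adding `delta` to the counter. Returns the value
 * the reader ends up with.
 */
static uint64_t torn_read(uint64_t before, uint64_t delta)
{
    struct split_counter c;

    split_store(&c, before);
    uint32_t lo = c.lo;               /* reader: first load (old low word) */
    split_store(&c, before + delta);  /* writer runs in between */
    uint32_t hi = c.hi;               /* reader: second load (new high word) */

    return split_join(lo, hi);
}
```

With before = 0xFFFFFFFF and delta = 1, the reader combines the old low
word with the new high word and gets 0x1FFFFFFFF, a value the counter
never held. When the update does not carry into the high word (say
before = 5, delta = 1), the reader simply sees the slightly stale value
5, which is harmless for stats.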


Being able to atomically fetch all counters like packets and bytes sent
at the expense of interfering in fast path (queue and dequeue packets)
is simply not worth the pain, as the values are generally stale after 1
usec.

These lock acquisitions slow down the fast path by 10 to 20%.



Acked-by: Jamal Hadi Salim <j...@mojatatu.com>

cheers,
jamal
