On 03/26/2018 09:36 AM, David Miller wrote:
> From: John Fastabend <john.fastab...@gmail.com>
> Date: Sat, 24 Mar 2018 22:25:06 -0700
>
>> After the qdisc lock was dropped in pfifo_fast we allow multiple
>> enqueue threads and dequeue threads to run in parallel. On the
>> enqueue side the skb bit ooo_okay is used to ensure all related
>> skbs are enqueued in-order. On the dequeue side though there is
>> no similar logic. What we observe is with fewer queues than CPUs
>> it is possible to re-order packets when two instances of
>> __qdisc_run() are running in parallel. Each thread will dequeue
>> a skb and then whichever thread calls the ndo op first will
>> be sent on the wire. This doesn't typically happen because
>> qdisc_run() is usually triggered by the same core that did the
>> enqueue. However, drivers will trigger __netif_schedule()
>> when queues are transitioning from stopped to awake using the
>> netif_tx_wake_* APIs. When this happens netif_schedule() calls
>> qdisc_run() on the same CPU that did the netif_tx_wake_* which
>> is usually done in the interrupt completion context. This CPU
>> is selected with the irq affinity which is unrelated to the
>> enqueue operations.
>>
>> To resolve this we add a RUNNING bit to the qdisc to ensure
>> only a single dequeue per qdisc is running. Enqueue and dequeue
>> operations can still run in parallel and also on multi queue
>> NICs we can still have a dequeue in-flight per qdisc, which
>> is typically per CPU.
>>
>> Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array")
>> Reported-by: Jakob Unterwurzacher
>> <jakob.unterwurzac...@theobroma-systems.com>
>> Signed-off-by: John Fastabend <john.fastab...@gmail.com>
>
> Applied, thanks John.
>

Great. Also, an off-list email from Jakob (I forgot to add him to the
CC list here, oops) asked me to add:

Tested-by: Jakob Unterwurzacher <jakob.unterwurzac...@theobroma-systems.com>

Also, in net-next I'll look to see if we can avoid doing the extra
atomics, especially in cases where they are not actually needed, for
example the 1:1 qdisc to txq mappings. That seems a bit invasive for
net, though.

Finally, just an FYI, but I think I'll look at a distributed counter
soon so we can get a lockless token bucket. I need the counter for BPF
as well, so it is coming soon.

Thanks,
John
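
For readers following the thread, the "single dequeue per qdisc" idea
from the quoted commit message can be sketched in userspace C11. This is
only an illustration under stated assumptions, not the applied patch:
struct toy_qdisc and every toy_* function are hypothetical names, an
atomic_bool stands in for the RUNNING bit, and the enqueue side is not
modeled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a qdisc; NOT the kernel's struct Qdisc. */
struct toy_qdisc {
	atomic_bool running;	/* true while one thread owns the dequeue side */
	int pending;		/* toy packet count, only touched by the owner */
	int sent;		/* packets handed to the (imaginary) driver */
};

static void toy_qdisc_init(struct toy_qdisc *q, int pending)
{
	atomic_init(&q->running, false);
	q->pending = pending;
	q->sent = 0;
}

/* Try to become the single dequeuer; returns true if we won the bit. */
static bool toy_run_begin(struct toy_qdisc *q)
{
	/* exchange returns the old value: false means the bit was free */
	return !atomic_exchange_explicit(&q->running, true,
					 memory_order_acquire);
}

/* Give up the dequeue side so another thread can take it. */
static void toy_run_end(struct toy_qdisc *q)
{
	atomic_store_explicit(&q->running, false, memory_order_release);
}

/*
 * Drain the queue if nobody else is doing so. Returns the number of
 * packets this caller sent, or 0 if another dequeuer already holds the
 * bit (in which case that dequeuer keeps the packets in order).
 */
static int toy_qdisc_run(struct toy_qdisc *q)
{
	int n = 0;

	assert(q != NULL);
	if (!toy_run_begin(q))
		return 0;

	while (q->pending > 0) {
		q->pending--;
		q->sent++;
		n++;
	}

	toy_run_end(q);
	return n;
}
```

A caller that loses the race simply returns without dequeuing, so at most
one thread per toy_qdisc is ever between dequeue and transmit. The cost is
one atomic exchange and one atomic store per run, which is the kind of
overhead the reply above hopes to avoid where it is not needed.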