On Thu, Sep 29, 2016 at 9:18 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Thu, 2016-09-29 at 08:53 -0400, Tom Herbert wrote:
>> On Thu, Sep 29, 2016 at 12:54 AM, Eric Dumazet <eric.duma...@gmail.com>
>> wrote:
>> > On Wed, 2016-09-28 at 20:54 -0700, Tom Herbert wrote:
>> >> xps_flows maintains a per-device flow table that is indexed by the
>> >> skbuff hash. The table is only consulted when there is no queue saved
>> >> in a transmit socket for an skbuff.
>> >>
>> >> Each entry in the flow table contains a queue index and a queue
>> >> pointer. The queue pointer is set when a queue is chosen using a
>> >> flow table entry. This pointer is set to the head pointer in the
>> >> transmit queue (which is maintained by BQL).
>> >>
>> >> The new function get_xps_flows_index looks up flows in the
>> >> xps_flows table. The entry returned gives the last queue a matching
>> >> flow used. The returned queue is compared against the normal XPS
>> >> queue. If they are different, then we only switch if the tail pointer
>> >> in the TX queue has advanced past the pointer saved in the entry. In
>> >> this way OOO should be avoided when XPS wants to use a different
>> >> queue.
>> >
>> > There is something I do not understand with this.
>> >
>> > If this OOO avoidance is tied to BQL, it means that packets sitting in
>> > a qdisc won't be part of the detection.
>> >
>> > So packets of flow X could possibly be queued on multiple qdiscs.
>> >
>> Hi Eric,
>>
>> I'm not sure I understand your concern. If packets for flow X can be
>> queued on multiple qdiscs, wouldn't that mean the flow will be subject
>> to OOO transmission regardless of this patch? In other words, here
>> we're trying to keep packets for the flow in order as they are emitted
>> from the qdiscs, which we assume maintain the ordering of packets in a
>> flow.
>
> Well, then what is this patch series solving?
>
It addresses the issue that Rick Jones pointed out was happening with XPS.
When packets are sent for a flow that has no socket and XPS is enabled, each
packet is assigned the XPS queue for the CPU it is sent from. Since the
thread sending on a flow can be rescheduled to different CPUs, this creates
OOO packets. In this case the OOO is caused by the interaction with XPS.
> You have a producer of packets running on 8 vcpus in a VM.
>
> Packets are exiting the VM and need to be queued on a mq NIC in the
> hypervisor.
>
> Flow X can be scheduled on any of these 8 vcpus, so XPS is currently
> selecting different TXQs.
>
> Your patch aims to detect this and steer packets to one TXQ, but not
> using a static hash() % num_real_queues (as RSS would have done on a NIC
> at RX), but trying to please XPS (i.e. selecting a queue close to the
> CPU producing the packet; a VM with one vcpu, cpu bound, would really be
> happy to not spread packets all over the queues).
>
You're not describing this patch, you're describing how XPS works in the
first place. This patch is meant to make socketless packets work well with
XPS. If all you want is a static hash() % num_real_queues, then just
disable XPS to get that.

> Your mechanism relies on counters updated at the time packets are given
> to the NIC, not at the time packets are given to the txq (and eventually
> sit for a while in the qdisc right before the BQL layer).
>
> dev_queue_xmit() is not talking to the device directly...
>
> All I am saying is that the producer/consumer counters you added are not
> at the right place.
>
> You tried hard not to change the drivers, by adding something to
> existing BQL. But packets can sit in a qdisc for a while...
>
But again, this patch is to prevent OOO being caused by an interaction with
XPS. It does not address OOO caused by qdiscs or VMs, but then I don't see
that as being the issue reported by Rick.

> So you added 2 potential cache line misses per outgoing packet, but
> with no guarantee that OOO is really avoided.