Re: [RFC PATCH] net: Require socket to allow XPS to set queue mapping
On Thu, Aug 25, 2016 at 5:28 PM, Rick Jones wrote:
> On 08/25/2016 02:08 PM, Eric Dumazet wrote:
>>
>> When XPS was submitted, it was _not_ enabled by default and 'magic'
>>
>> Some NIC vendors decided it was a good thing, you should complain to
>> them ;)
>
> I kindasorta am with the emails I've been sending to netdev :) And also
> hopefully precluding others going down that path.

I recently came across another effect of configuring devices at
ndo_open. The behavior of

  ip link set dev ${dev} down && ip link set dev ${dev} up

is no longer consistent across devices. Drivers that call
netif_set_xps_queue overwrite a user-supplied xps configuration; others
leave it in place.

This is easily demonstrated by adding this snippet to the loopback
driver:

+static int loopback_dev_open(struct net_device *dev)
+{
+	cpumask_t mask;
+
+	cpumask_clear(&mask);
+	cpumask_set_cpu(0, &mask);
+
+	return netif_set_xps_queue(dev, &mask, 0);
+}
+
 static void loopback_dev_free(struct net_device *dev)
 {
 	free_percpu(dev->lstats);
@@ -158,6 +168,7 @@ static void loopback_dev_free(struct net_device *dev)
 static const struct net_device_ops loopback_ops = {
 	.ndo_init      = loopback_dev_init,
+	.ndo_open      = loopback_dev_open,

and running

  fp=/sys/class/net/lo/queues/tx-0/xps_cpus
  cat ${fp}
  echo 8 > ${fp}
  cat ${fp}
  ip link set dev lo down
  ip link set dev lo up
  cat ${fp}

On a vanilla kernel, the output is {0, 8, 8}: the user-supplied xps
setting persists. With the patch, it is {1, 8, 1}: the driver sets, and
resets, xps.

A third patch that adds netif_set_xps_queue to loopback_dev_init gives
{1, 8, 8}. That is arguably the preferred behavior (sidestepping the
issue of whether device driver initialization is a good thing in
itself): init with a sane default, but do not override user preference.

The fm10k and i40e drivers make the call conditional on
test_and_set_bit(__FM10K_TX_XPS_INIT_DONE), so they probably already
implement those semantics. In which case, other drivers should probably
be updated to do the same. If all drivers are essentially trying to set
the same basic load-balancing policy, this could also be lifted out of
the drivers into __dev_open for consistency and code deduplication.

I'm pointing this out less to change this xps feature than as a subtle
effect to be aware of with any subsequent device driver policy patches.
Re: [RFC PATCH] net: Require socket to allow XPS to set queue mapping
On Thu, Aug 25, 2016 at 12:49 PM, Eric Dumazet wrote:
> On Thu, 2016-08-25 at 12:23 -0700, Alexander Duyck wrote:
>> I have been seeing a number of cases where XPS leads to packets being
>> reordered in the transmit queues of the device drivers. The main
>> situation where this seems to occur is when a VM is using a tap
>> interface to send packets to the network via a NIC that has XPS
>> enabled.
>>
>> A bit of looking into this revealed the main issue is that the
>> scheduler seems to be migrating the VM between CPUs, and as this
>> occurs the traffic for a given flow from a VM follows the migration
>> and hops between Tx queues, leading to packet reordering.
>>
>> A workaround for this is to make certain all the VMs have RPS enabled
>> on the tap interfaces; however, this requires extra configuration on
>> the host for each VM created.
>>
>> A simpler approach is provided with this patch. With it we disable
>> XPS any time a socket is not present for a given flow. By doing this
>> we can avoid using XPS for any routing or bridging situations in
>> which XPS is likely more of a hindrance than a help.
>
> Yes, but this will destroy isolation for people properly doing VM cpu
> pinning.
>
> With this patch, DDOS traffic coming from a VM will hit all TX queues.

As I said in the other thread, the better solution is to start tracking
flows coming out of VMs.

Tom
Re: [RFC PATCH] net: Require socket to allow XPS to set queue mapping
On 08/25/2016 02:08 PM, Eric Dumazet wrote:
> When XPS was submitted, it was _not_ enabled by default and 'magic'
>
> Some NIC vendors decided it was a good thing, you should complain to
> them ;)

I kindasorta am with the emails I've been sending to netdev :) And also
hopefully precluding others going down that path.

happy benchmarking,

rick
Re: [RFC PATCH] net: Require socket to allow XPS to set queue mapping
On Thu, Aug 25, 2016 at 1:32 PM, Rick Jones wrote:
> On 08/25/2016 12:49 PM, Eric Dumazet wrote:
>> On Thu, 2016-08-25 at 12:23 -0700, Alexander Duyck wrote:
>>> A simpler approach is provided with this patch. With it we disable
>>> XPS any time a socket is not present for a given flow. By doing this
>>> we can avoid using XPS for any routing or bridging situations in
>>> which XPS is likely more of a hindrance than a help.
>>
>> Yes, but this will destroy isolation for people properly doing VM cpu
>> pinning.
>
> Why not simply stop enabling XPS by default? Treat it like RPS and RFS
> (unless I've missed a patch...). The people who are already doing the
> extra steps to pin VMs can enable XPS in that case. It isn't clear
> that one should always pin VMs - for example if a (public) cloud
> needed to oversubscribe the cores.
>
> happy benchmarking,
>
> rick jones

The big motivation for most of the drivers to have it is that XPS can
provide very good savings when you align the entire transmit path by
also controlling the memory allocations and IRQ affinity. What most of
these devices try to do by default is isolate all of a given flow to a
single CPU so you can get as close to linear scaling as possible. The
problem is that with XPS disabled by default you take a pretty serious
performance hit for all of the non-virtualization cases, since you can
easily end up crossing CPUs, or worse yet NUMA nodes, when processing
Tx traffic.

Really, if anything, I still think we need to do something to enforce
ordering of flows for things that make use of netif_rx instead of a
NAPI processing path. From what I can tell, we will always end up
running into reordering issues as long as a single thread can bounce
between CPUs and spray packets onto the various backlogs.

It is one of the reasons why I had mentioned possibly enabling RPS for
these types of interfaces: without it you will still get some
reordering, it just varies in degree. Essentially, all that happens
when you disable XPS is that you shorten the queues, but there are
still multiple queues in play, and you can still end up with packets
getting reordered between the various per-cpu backlogs.

- Alex
Re: [RFC PATCH] net: Require socket to allow XPS to set queue mapping
On Thu, 2016-08-25 at 13:32 -0700, Rick Jones wrote:
> On 08/25/2016 12:49 PM, Eric Dumazet wrote:
>> On Thu, 2016-08-25 at 12:23 -0700, Alexander Duyck wrote:
>>> A simpler approach is provided with this patch. With it we disable
>>> XPS any time a socket is not present for a given flow. By doing this
>>> we can avoid using XPS for any routing or bridging situations in
>>> which XPS is likely more of a hindrance than a help.
>>
>> Yes, but this will destroy isolation for people properly doing VM cpu
>> pinning.
>
> Why not simply stop enabling XPS by default? Treat it like RPS and RFS
> (unless I've missed a patch...). The people who are already doing the
> extra steps to pin VMs can enable XPS in that case. It isn't clear
> that one should always pin VMs - for example if a (public) cloud
> needed to oversubscribe the cores.

When XPS was submitted, it was _not_ enabled by default and 'magic'

Some NIC vendors decided it was a good thing, you should complain to
them ;)
Re: [RFC PATCH] net: Require socket to allow XPS to set queue mapping
On 08/25/2016 12:49 PM, Eric Dumazet wrote:
> On Thu, 2016-08-25 at 12:23 -0700, Alexander Duyck wrote:
>> A simpler approach is provided with this patch. With it we disable
>> XPS any time a socket is not present for a given flow. By doing this
>> we can avoid using XPS for any routing or bridging situations in
>> which XPS is likely more of a hindrance than a help.
>
> Yes, but this will destroy isolation for people properly doing VM cpu
> pinning.

Why not simply stop enabling XPS by default? Treat it like RPS and RFS
(unless I've missed a patch...). The people who are already doing the
extra steps to pin VMs can enable XPS in that case. It isn't clear that
one should always pin VMs - for example if a (public) cloud needed to
oversubscribe the cores.

happy benchmarking,

rick jones
Re: [RFC PATCH] net: Require socket to allow XPS to set queue mapping
On Thu, 2016-08-25 at 12:23 -0700, Alexander Duyck wrote:
> I have been seeing a number of cases where XPS leads to packets being
> reordered in the transmit queues of the device drivers. The main
> situation where this seems to occur is when a VM is using a tap
> interface to send packets to the network via a NIC that has XPS
> enabled.
>
> A bit of looking into this revealed the main issue is that the
> scheduler seems to be migrating the VM between CPUs, and as this
> occurs the traffic for a given flow from a VM follows the migration
> and hops between Tx queues, leading to packet reordering.
>
> A workaround for this is to make certain all the VMs have RPS enabled
> on the tap interfaces; however, this requires extra configuration on
> the host for each VM created.
>
> A simpler approach is provided with this patch. With it we disable
> XPS any time a socket is not present for a given flow. By doing this
> we can avoid using XPS for any routing or bridging situations in
> which XPS is likely more of a hindrance than a help.

Yes, but this will destroy isolation for people properly doing VM cpu
pinning.

With this patch, DDOS traffic coming from a VM will hit all TX queues.
[RFC PATCH] net: Require socket to allow XPS to set queue mapping
I have been seeing a number of cases where XPS leads to packets being
reordered in the transmit queues of the device drivers. The main
situation where this seems to occur is when a VM is using a tap
interface to send packets to the network via a NIC that has XPS
enabled.

A bit of looking into this revealed the main issue is that the
scheduler seems to be migrating the VM between CPUs, and as this occurs
the traffic for a given flow from a VM follows the migration and hops
between Tx queues, leading to packet reordering.

A workaround for this is to make certain all the VMs have RPS enabled
on the tap interfaces; however, this requires extra configuration on
the host for each VM created.

A simpler approach is provided with this patch. With it we disable XPS
any time a socket is not present for a given flow. By doing this we can
avoid using XPS for any routing or bridging situations in which XPS is
likely more of a hindrance than a help.

Signed-off-by: Alexander Duyck
---
 net/core/dev.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index a75df86..9cea73b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3242,24 +3242,26 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 
 static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb)
 {
+	int queue_index, new_index;
 	struct sock *sk = skb->sk;
-	int queue_index = sk_tx_queue_get(sk);
 
-	if (queue_index < 0 || skb->ooo_okay ||
-	    queue_index >= dev->real_num_tx_queues) {
-		int new_index = get_xps_queue(dev, skb);
-		if (new_index < 0)
-			new_index = skb_tx_hash(dev, skb);
+	if (!sk)
+		return skb_tx_hash(dev, skb);
 
-		if (queue_index != new_index && sk &&
-		    sk_fullsock(sk) &&
-		    rcu_access_pointer(sk->sk_dst_cache))
-			sk_tx_queue_set(sk, new_index);
+	queue_index = sk_tx_queue_get(sk);
+	if (queue_index >= 0 && !skb->ooo_okay &&
+	    queue_index < dev->real_num_tx_queues)
+		return queue_index;
 
-		queue_index = new_index;
-	}
+	new_index = get_xps_queue(dev, skb);
+	if (new_index < 0)
+		new_index = skb_tx_hash(dev, skb);
 
-	return queue_index;
+	if (queue_index != new_index && sk_fullsock(sk) &&
+	    rcu_access_pointer(sk->sk_dst_cache))
+		sk_tx_queue_set(sk, new_index);
+
+	return new_index;
 }
 
 struct netdev_queue *netdev_pick_tx(struct net_device *dev,