We are seeing a regression, between host kernels 4.12 and 4.13, for a subset of workloads running across KVM guests connected by a virtual bridge. Bisecting points to commit c67df11f ("vhost_net: try batch dequing from skb array").
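For context, my (possibly incomplete) understanding of what that patch changes: instead of dequeuing one skb from the tap receive array per head-length peek, vhost_net pulls a batch into a small per-virtqueue cache and serves subsequent peeks from that cache. Below is a rough userspace sketch of that idea only; the names, ring, and batch size are made up for illustration and are not the actual vhost/skb_array code.

/*
 * Toy sketch of batched dequeue (hypothetical names, not the vhost code).
 * peek_head_len() refills a local cache from the shared ring only when
 * the cache is empty, so under heavy load one ring operation serves up
 * to BATCH packets, while under light load the cache usually holds a
 * single entry and the extra copying/bookkeeping is not amortized.
 */
#include <stdio.h>

#define RING_SIZE 256
#define BATCH      64

static int ring[RING_SIZE];            /* packet lengths queued by the producer */
static unsigned int head, tail;

static int ring_consume_batched(int *cache, int n)
{
        int got = 0;

        while (got < n && head != tail) {
                cache[got++] = ring[head];
                head = (head + 1) % RING_SIZE;
        }
        return got;                     /* how many entries were actually cached */
}

/* consumer state: a small per-queue cache refilled in batches */
static int cache[BATCH];
static int cached, idx;

static int peek_head_len(void)
{
        if (idx == cached) {            /* cache empty: hit the ring once */
                cached = ring_consume_batched(cache, BATCH);
                idx = 0;
        }
        return idx < cached ? cache[idx] : 0;
}

int main(void)
{
        /* light load: a single 1500-byte packet is queued */
        ring[tail] = 1500;
        tail = (tail + 1) % RING_SIZE;

        printf("head len %d, cache fill %d/%d\n", peek_head_len(), cached, BATCH);
        return 0;
}

If that picture is roughly right, a single-stream workload would rarely fill the cache, so the per-packet bookkeeping would not be amortized the way it is with many concurrent streams.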
In the regressed environment, we are running 4 KVM guests on a single host: 2 running as uperf servers and 2 as uperf clients. They are connected via a virtual bridge. The uperf client profile looks like:

<?xml version="1.0"?>
<profile name="TCP_STREAM">
  <group nprocs="1">
    <transaction iterations="1">
      <flowop type="connect" options="remotehost=192.168.122.103 protocol=tcp"/>
    </transaction>
    <transaction duration="300">
      <flowop type="write" options="count=16 size=30000"/>
    </transaction>
    <transaction iterations="1">
      <flowop type="disconnect"/>
    </transaction>
  </group>
</profile>

So, 1 TCP streaming instance per client. When upgrading the host kernel from 4.12 to 4.13, we see about a 30% drop in throughput for this scenario. After the bisect, I further verified that reverting c67df11f on 4.13 "fixes" the throughput for this scenario.

On the other hand, if we increase the load by raising the number of streaming instances to 50 (nprocs="50"), or even 10, we instead see a ~10% increase in throughput when upgrading the host from 4.12 to 4.13. So it may be that the issue is specific to light-load scenarios.

I would expect some overhead from the batching, but 30% seems significant... Any thoughts on what might be happening here?