On Tue, Jul 24, 2018 at 2:39 PM Michael S. Tsirkin <m...@redhat.com> wrote:
>
> On Tue, Jul 24, 2018 at 10:01:39AM -0400, Willem de Bruijn wrote:
> > On Tue, Jul 24, 2018 at 6:44 AM Michael S. Tsirkin <m...@redhat.com> wrote:
> > >
> > > On Mon, Jul 23, 2018 at 04:11:19PM -0700, Caleb Raitto wrote:
> > > > From: Caleb Raitto <carai...@google.com>
> > > >
> > > > The driver disables tx napi if it's not certain that completions will
> > > > be processed affine with tx service.
> > > >
> > > > Its heuristic doesn't account for some scenarios where it is, such as
> > > > when the queue pair count matches the core but not hyperthread count.
> > > >
> > > > Allow userspace to override the heuristic. This is an alternative
> > > > solution to that in the linked patch. That added more logic in the
> > > > kernel for these cases, but the agreement was that this was better
> > > > left to user control.
> > > >
> > > > Do not expand the existing napi_tx variable to a ternary value,
> > > > because doing so can break user applications that expect
> > > > boolean ('Y'/'N') instead of integer output. Add a new param instead.
> > > >
> > > > Link: https://patchwork.ozlabs.org/patch/725249/
> > > > Acked-by: Willem de Bruijn <will...@google.com>
> > > > Acked-by: Jon Olson <jonol...@google.com>
> > > > Signed-off-by: Caleb Raitto <carai...@google.com>
> > >
> > > Is there a reason the same rule should apply to all devices?
> > > If not, shouldn't this be an ethtool option?
> >
> > It's not very likely that a guest will have multiple virtio_net devices,
> > some of which have affinity_hint_set and some that do not?
>
> Just to answer this question, I do hear a lot about guests with multiple
> virtio net interfaces. These might be just the noisy few, but they do
> exist.
>
> > I'd really rather not add the extra option at all, but remove
> > the affinity_hint_set requirement for now. I understand the concern
> > about cacheline bouncing if napi-tx becomes the default at some point
> > and we still don't have data on this by then. But while it isn't the
> > default and a user has to opt in to napi-tx to try it, that seems
> > enough of a guardrail to me.
> >
> > The original reason was lack of data on whether napi-tx may suffer
> > from cache invalidations when tx and rx softirq are on different cpus
> > and we enable tx descriptor cleaning from the rx handler (i.e., on ACK).
> > From those initial numbers it seemed to be a win even with those
> > invalidations.
> >
> > https://patchwork.ozlabs.org/patch/746232/
> >
> > In lieu of removing the affinity_hint_set check, this boolean is the
> > smallest amount of both new interface and new implementation needed
> > to allow experimentation. We can easily make it a noop later, once we
> > are confident that napi-tx can be enabled even without affinity. By
> > comparison, an ethtool param would be quite a bit of new logic.
>
> So it boils down to this: if we believe napi tx is
> always a good idea, then this flag is there just in case,
> and the patch is fine. If it's good for some workloads
> but not others, and will be for a while, we are better off
> with an ethtool flag.
>
> What's the case here?
Based on benchmark results so far, it looks like the first: it's markedly
better for some workloads and in the noise for most.

The crux is the "for a while". We will undoubtedly find some cases that
need fixing before flipping the default. The choice is then between using
the napi_tx module parameter and adding a separate ethtool interface that
does essentially the same thing. This patch is the simpler option, and
easier to backport to guest kernels.

> OTOH if you want to add more tricks to the affinity hint, such
> as the above-mentioned # of queues matching the core count,
> that is also fine IMHO.

From the above linked patch, I understand that there are yet other
special cases in production, such as a hard cap of 32 tx queues
regardless of the number of vcpus. I don't think we want to add special
cases for all of this kind of business logic in the kernel.
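For reference, a minimal sketch of the kind of override being discussed.
The parameter name (force_napi_tx) and the exact hook are illustrative
assumptions on my part, not necessarily what the patch does verbatim:

/* Sketch only: let a module parameter override the affinity heuristic.
 * force_napi_tx is an assumed name, for illustration.
 */
static bool napi_tx;
module_param(napi_tx, bool, 0644);

static bool force_napi_tx;
module_param(force_napi_tx, bool, 0644);
MODULE_PARM_DESC(force_napi_tx, "Enable tx napi even without tx affinity");

static void virtnet_napi_tx_enable(struct virtnet_info *vi,
				   struct virtqueue *vq,
				   struct napi_struct *napi)
{
	if (!napi->weight)
		return;

	/* Tx napi touches cachelines on the cpu handling tx interrupts.
	 * Only enable it when completions are likely affine with the
	 * transmit path, unless the user explicitly overrides the check.
	 */
	if (!vi->affinity_hint_set && !force_napi_tx) {
		napi->weight = 0;
		return;
	}

	virtnet_napi_enable(vq, napi);
}

A user would then opt in at load time with, e.g.,
"modprobe virtio_net napi_tx=1 force_napi_tx=1", or poke the parameter
through /sys/module/virtio_net/parameters/ where permissions allow.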
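For comparison, the ethtool route would mean per-device private flags,
roughly like the sketch below (all names are illustrative; the real work
of quiescing the tx queues and toggling napi state is omitted):

static const char virtnet_priv_flags[][ETH_GSTRING_LEN] = {
	"tx-napi",
};

#define VIRTNET_PRIV_FLAG_TX_NAPI	BIT(0)

static u32 virtnet_get_priv_flags(struct net_device *dev)
{
	struct virtnet_info *vi = netdev_priv(dev);

	/* tx_napi_enabled is a hypothetical per-device field. */
	return vi->tx_napi_enabled ? VIRTNET_PRIV_FLAG_TX_NAPI : 0;
}

static int virtnet_set_priv_flags(struct net_device *dev, u32 flags)
{
	struct virtnet_info *vi = netdev_priv(dev);

	vi->tx_napi_enabled = !!(flags & VIRTNET_PRIV_FLAG_TX_NAPI);
	return 0;
}

/* Plus ETH_SS_PRIV_FLAGS handling in .get_strings/.get_sset_count and
 * wiring .get_priv_flags/.set_priv_flags into the ethtool_ops, before a
 * user can run "ethtool --set-priv-flags eth0 tx-napi on".
 */

That illustrates why this is quite a bit more new logic than a boolean
module parameter, even before the per-device state management.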