On Tue, Sep 05, 2017 at 11:24:14AM +0200, Maxime Coquelin wrote:
> On 09/05/2017 06:45 AM, Tiwei Bie wrote:
> > On Thu, Aug 31, 2017 at 11:50:05AM +0200, Maxime Coquelin wrote:
> > > virtio_net device might be accessed while being reallocated
> > > in case of NUMA awareness. This case might be theoretical,
> > > but it will be needed anyway to protect vrings pages against
> > > invalidation.
> > >
> > > The virtio_net devs are now protected with a readers/writers
> > > lock, so that before reallocating the device, it is ensured
> > > that it is not being referenced by the processing threads.
> > >
> > [...]
> > > +struct virtio_net *
> > > +get_device(int vid)
> > > +{
> > > +	struct virtio_net *dev;
> > > +
> > > +	rte_rwlock_read_lock(&vhost_devices[vid].lock);
> > > +
> > > +	dev = __get_device(vid);
> > > +	if (unlikely(!dev))
> > > +		rte_rwlock_read_unlock(&vhost_devices[vid].lock);
> > > +
> > > +	return dev;
> > > +}
> > > +
> > > +void
> > > +put_device(int vid)
> > > +{
> > > +	rte_rwlock_read_unlock(&vhost_devices[vid].lock);
> > > +}
> > > +
> >
> > This patch introduced a per-device rwlock which needs to be acquired
> > unconditionally in the data path. So for each vhost device, the IO
> > threads of different queues will need to acquire/release this lock
> > during each enqueue and dequeue operation, which will cause cache
> > contention when multiple queues are enabled and handled by different
> > cores. With this patch alone, I saw ~7% performance drop when enabling
> > 6 queues to do 64bytes iofwd loopback test. Is there any way to avoid
> > introducing this lock to the data path?
>
> First, I'd like to thank you for running the MQ test.
> I agree it may have a performance impact in this case.
>
> This lock has currently two purposes:
> 1. Prevent referencing freed virtio_dev struct in case of numa_realloc.
> 2. Protect vring pages against invalidation.
>
> For 2., it can be fixed by using the per-vq IOTLB lock (it was not the
> case in my early prototypes that had per device IOTLB cache).
>
> For 1., this is an existing problem, so we might consider it is
> acceptable to keep current state. Maybe it could be improved by only
> reallocating in case VQ0 is not on the right NUMA node, the other VQs
> not being initialized at this point.
>
> If we do this we might be able to get rid of this lock, I need some more
> time though to ensure I'm not missing something.
>
> What do you think?
>

Cool. So it's possible that the lock in the data path will be
acquired only when the IOMMU feature is enabled. It will be great!

Besides, I just did a very simple MQ test to verify my thoughts.
Lei (CC'ed in this mail) may do a thorough performance test for
this patch set to evaluate the performance impacts.

Best regards,
Tiwei Bie
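
P.S. To make the idea above concrete, here is a rough sketch (not code
from the patch set) of a data path that takes a per-vq read lock only
when the IOMMU feature has been negotiated. Everything named sketch_* is
hypothetical, the two-queue device layout is an assumption, and the small
C11-atomics rwlock only stands in for rte_rwlock_t so that the example
builds without DPDK.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 33 is VIRTIO_F_IOMMU_PLATFORM in the virtio spec; the SKETCH_ name is ours. */
#define SKETCH_F_IOMMU_PLATFORM 33
#define SKETCH_NR_VQ 2 /* assumed queue count, for illustration only */

/* Hypothetical spinning rwlock: cnt > 0 readers, 0 free, -1 writer. */
struct sketch_rwlock {
	atomic_int cnt;
};

static void
sketch_read_lock(struct sketch_rwlock *l)
{
	for (;;) {
		int x = atomic_load_explicit(&l->cnt, memory_order_relaxed);

		if (x < 0)
			continue; /* a writer holds the lock, keep spinning */
		/* This read-modify-write is what bounces the cache line
		 * when several cores share one lock. */
		if (atomic_compare_exchange_weak_explicit(&l->cnt, &x, x + 1,
				memory_order_acquire, memory_order_relaxed))
			return;
	}
}

static void
sketch_read_unlock(struct sketch_rwlock *l)
{
	atomic_fetch_sub_explicit(&l->cnt, 1, memory_order_release);
}

/* Hypothetical per-queue and per-device state, not the real vhost structs. */
struct sketch_vq {
	struct sketch_rwlock iotlb_lock; /* one lock per queue, not per device */
};

struct sketch_dev {
	uint64_t features;
	struct sketch_vq vq[SKETCH_NR_VQ];
};

static void
sketch_dev_init(struct sketch_dev *dev, uint64_t features)
{
	dev->features = features;
	for (unsigned int i = 0; i < SKETCH_NR_VQ; i++)
		atomic_init(&dev->vq[i].iotlb_lock.cnt, 0);
}

static bool
sketch_iommu_enabled(const struct sketch_dev *dev)
{
	return (dev->features & (1ULL << SKETCH_F_IOMMU_PLATFORM)) != 0;
}

/*
 * Data-path entry for one queue. Returns true if the per-vq lock was
 * taken and must be released through sketch_vq_leave().
 */
static bool
sketch_vq_enter(struct sketch_dev *dev, unsigned int qid)
{
	assert(qid < SKETCH_NR_VQ);
	if (!sketch_iommu_enabled(dev))
		return false; /* fast path: no shared atomic is touched */
	sketch_read_lock(&dev->vq[qid].iotlb_lock);
	return true;
}

static void
sketch_vq_leave(struct sketch_dev *dev, unsigned int qid, bool locked)
{
	if (locked)
		sketch_read_unlock(&dev->vq[qid].iotlb_lock);
}
```

An enqueue or dequeue routine would call sketch_vq_enter() before
touching the ring and sketch_vq_leave() afterwards. Without the IOMMU
feature no atomic operation is issued at all, and with it each core
only contends on the lock of its own queue.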