On Thu, Apr 04, 2013 at 04:14:57PM +0300, Gleb Natapov wrote:
> On Thu, Apr 04, 2013 at 03:06:42PM +0200, Alexander Graf wrote:
> > 
> > On 04.04.2013, at 14:56, Gleb Natapov wrote:
> > 
> > > On Thu, Apr 04, 2013 at 02:49:39PM +0200, Alexander Graf wrote:
> > >> 
> > >> On 04.04.2013, at 14:45, Gleb Natapov wrote:
> > >> 
> > >>> On Thu, Apr 04, 2013 at 02:39:51PM +0200, Alexander Graf wrote:
> > >>>> 
> > >>>> On 04.04.2013, at 14:38, Gleb Natapov wrote:
> > >>>> 
> > >>>>> On Thu, Apr 04, 2013 at 02:32:08PM +0200, Alexander Graf wrote:
> > >>>>>> 
> > >>>>>> On 04.04.2013, at 14:08, Gleb Natapov wrote:
> > >>>>>> 
> > >>>>>>> On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote:
> > >>>>>>>> 
> > >>>>>>>> On 04.04.2013, at 12:50, Michael S. Tsirkin wrote:
> > >>>>>>>> 
> > >>>>>>>>> With KVM, MMIO is much slower than PIO, due to the need to
> > >>>>>>>>> do a page walk and emulation. But with EPT, it does not have to be:
> > >>>>>>>>> we know the address from the VMCS, so if the address is unique, we
> > >>>>>>>>> can look up the eventfd directly, bypassing emulation.
> > >>>>>>>>> 
> > >>>>>>>>> Add an interface for userspace to specify this per-address; we can
> > >>>>>>>>> use this e.g. for virtio.
> > >>>>>>>>> 
> > >>>>>>>>> The implementation adds a separate bus internally. This serves two
> > >>>>>>>>> purposes:
> > >>>>>>>>> - minimize overhead for old userspace that does not use PV MMIO
> > >>>>>>>>> - minimize disruption in other code (since we don't know the length,
> > >>>>>>>>>   devices on the MMIO bus only get a valid address in write; this
> > >>>>>>>>>   way we don't need to touch all devices to teach them to handle
> > >>>>>>>>>   an invalid length)
> > >>>>>>>>> 
> > >>>>>>>>> At the moment, this optimization is only supported for EPT on x86 and
> > >>>>>>>>> silently ignored for NPT and MMU, so everything works correctly but
> > >>>>>>>>> slowly.
> > >>>>>>>>> 
> > >>>>>>>>> TODO: NPT, MMU and non-x86 architectures.
> > >>>>>>>>> 
> > >>>>>>>>> The idea was suggested by Peter Anvin. Lots of thanks to Gleb for
> > >>>>>>>>> pre-review and suggestions.
> > >>>>>>>>> 
> > >>>>>>>>> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > >>>>>>>> 
> > >>>>>>>> This still uses page fault intercepts, which are orders of
> > >>>>>>>> magnitude slower than hypercalls. Why don't you just create a PV
> > >>>>>>>> MMIO hypercall that the guest can use to invoke MMIO accesses
> > >>>>>>>> towards the host based on physical addresses with explicit length
> > >>>>>>>> encodings?
> > >>>>>>>> 
> > >>>>>>> It is slower, but not an order of magnitude slower. It becomes faster
> > >>>>>>> with newer HW.
> > >>>>>>> 
> > >>>>>>>> That way you simplify and speed up all code paths, exceeding the
> > >>>>>>>> speed of PIO exits even. It should also be quite easily portable,
> > >>>>>>>> as all other platforms have hypercalls available as well.
> > >>>>>>>> 
> > >>>>>>> We are trying to avoid PV as much as possible (well, this is also PV,
> > >>>>>>> but not guest visible).
> > >>>>>> 
> > >>>>>> Also, how is this not guest visible? Who sets
> > >>>>>> KVM_IOEVENTFD_FLAG_PV_MMIO? The comment above its definition
> > >>>>>> indicates that the guest does so, so it is guest visible.
> > >>>>>> 
> > >>>>> QEMU sets it.
> > >>>> 
> > >>>> How does QEMU know?
> > >>>> 
> > >>> Knows what? When to create such an eventfd? The virtio device knows.
> > >> 
> > >> Where does it know from?
> > >> 
> > > It does it always.
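For concreteness, a rough sketch of how userspace (QEMU here) could register
such an eventfd through the existing KVM_IOEVENTFD ioctl. Note that
KVM_IOEVENTFD_FLAG_PV_MMIO exists only in the patch under discussion, so the
bit value below is a placeholder, and the kick address and length are made-up
illustrative values, not the actual QEMU code:

#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#ifndef KVM_IOEVENTFD_FLAG_PV_MMIO
#define KVM_IOEVENTFD_FLAG_PV_MMIO (1 << 3)	/* placeholder for the patch's flag */
#endif

/* Register an eventfd that is signalled on writes to the kick address. */
static int assign_pv_mmio_ioeventfd(int vm_fd, uint64_t kick_gpa)
{
	int efd = eventfd(0, 0);
	struct kvm_ioeventfd args = {
		.addr  = kick_gpa,			/* guest physical address of the kick register */
		.len   = 2,				/* guest promises fixed-length writes (illustrative) */
		.fd    = efd,
		.flags = KVM_IOEVENTFD_FLAG_PV_MMIO,	/* no DATAMATCH, no PIO */
	};

	if (efd < 0)
		return -1;
	if (ioctl(vm_fd, KVM_IOEVENTFD, &args) < 0)
		return -1;
	return efd;
}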
> > > 
> > >>> 
> > >>>>> 
> > >>>>>> +/*
> > >>>>>> + * PV_MMIO - Guest can promise us that all accesses touching this address
> > >>>>>> + * are writes of specified length, starting at the specified address.
> > >>>>>> + * If not - it's a Guest bug.
> > >>>>>> + * Can not be used together with either PIO or DATAMATCH.
> > >>>>>> + */
> > >>>>>> 
> > >>>>> The virtio spec will state that access to a kick register needs to be of
> > >>>>> a specific length. This is a reasonable thing for HW to ask.
> > >>>> 
> > >>>> This is a spec change. So the guest would have to indicate that it
> > >>>> adheres to a newer spec. Thus it's a guest visible change.
> > >>>> 
> > >>> There is no virtio spec that has a kick register in MMIO. The spec is in
> > >>> the works AFAIK. Actually, PIO will not be deprecated and my suggestion
> > >> 
> > >> So the guest would indicate that it supports a newer revision of the
> > >> spec (in your case, that it supports MMIO). How is that any different
> > >> from exposing that it supports a PV MMIO hcall?
> > >> 
> > > The guest will indicate nothing. A new driver will use MMIO if the PIO bar
> > > is not configured. An old driver will not work for virtio devices with an
> > > MMIO bar but no PIO bar.
> > 
> > I can't parse that, sorry :).
> > 
> I am sure MST can explain it better, but I'll try one more time.
> The device will have two BARs with a kick register: one is PIO, the other is MMIO.
> An old driver works only with PIO; a new one supports both. MMIO is used only
> when PIO space is exhausted. So an old driver will not be able to drive a new
> virtio device that has no PIO bar configured.
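To make the two-BAR scheme concrete, here is a small sketch of the driver-side
notify path it implies. The structure and helper below are hypothetical (this
is not the actual virtio_pci code): an old driver only knows the PIO path,
while a new driver falls back to an MMIO kick when no PIO BAR was assigned.

#include <stdint.h>

/* Hypothetical view of the two notification BARs described above. */
struct vp_notify_regions {
	uint16_t pio_base;		/* I/O port base; 0 if the PIO BAR is absent */
	volatile uint16_t *mmio_kick;	/* mapped MMIO kick register; NULL if absent */
};

/* Placeholder for the architecture's port-write primitive. */
static inline void outw_stub(uint16_t val, uint16_t port)
{
	(void)val;
	(void)port;
}

/* Notify the device that descriptors were added to virtqueue vq_index. */
static void vp_notify_queue(struct vp_notify_regions *r, uint16_t vq_index)
{
	if (r->pio_base) {
		/* Legacy path: PIO kick, the only one old drivers know about. */
		outw_stub(vq_index, r->pio_base);
	} else {
		/* New path: a fixed-width MMIO write, matching the PV_MMIO
		 * promise that every access is a write of a known length. */
		*r->mmio_kick = vq_index;
	}
}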
Right, I think this was the latest proposal by Rusty. The discussion about the
new layout is taking place on the virtio mailing list. See the thread
'virtio_pci: use separate notification offsets for each vq' started by Rusty.

> > 
> > >>> is to move to MMIO only when PIO address space is exhausted. For PCI it
> > >>> will be never; for PCI-e it will be after ~16 devices.
> > >> 
> > >> Ok, let's go back a step here. Are you actually able to measure any
> > >> speedup in performance with this patch applied versus without, when
> > >> going through MMIO kicks?
> > >> 
> > >> 
> > > That's a question for MST. I think he did only microbenchmarks till now,
> > > and he already posted his results here:
> > > 
> > > mmio-wildcard-eventfd:pci-mem 3529
> > > mmio-pv-eventfd:pci-mem 1878
> > > portio-wildcard-eventfd:pci-io 1846
> > > 
> > > So the patch speeds up MMIO by almost 100%, and it is almost the same as PIO.
> > 
> > Those numbers don't align at all with what I measured.
> I am trying to run the vmexit test on AMD now, but something does not work
> there. Next week I'll fix it and see how AMD differs, but on Intel those are
> the numbers.

Right. Also, next week I need to implement the optimization for NPT.

> > 
> > MST, could you please do a real world latency benchmark with virtio-net and
> > 
> > * normal ioeventfd
> > * mmio-pv eventfd
> > * hcall eventfd
> > 
> > to give us some idea how much performance we would gain from each approach?
> > Throughput should be completely unaffected anyway, since virtio just
> > coalesces kicks internally.
> > 
> > I'm also slightly puzzled why the wildcard eventfd mechanism is so
> > significantly slower, while it was only a few percent on my test system.
> > What are the numbers you're listing above? Cycles? How many cycles do you
> > execute in a second?
> > 
> > 
> > Alex
> 
> --
> Gleb.
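For what it's worth, taking the quoted figures at face value and assuming they
are average cycles per kick exit (as reported by the vmexit microbenchmark
mentioned above), the relative numbers work out as follows; the snippet below
just does that arithmetic:

#include <stdio.h>

/*
 * Arithmetic on the cycle counts quoted above, assuming they are average
 * cycles per exit as reported by the vmexit microbenchmark.
 */
int main(void)
{
	const double mmio_wildcard = 3529.0;	/* mmio-wildcard-eventfd:pci-mem */
	const double mmio_pv       = 1878.0;	/* mmio-pv-eventfd:pci-mem */
	const double pio_wildcard  = 1846.0;	/* portio-wildcard-eventfd:pci-io */

	printf("PV MMIO vs wildcard MMIO: %.0f%% faster\n",
	       (mmio_wildcard / mmio_pv - 1.0) * 100.0);	/* ~88%, i.e. "almost 100%" */
	printf("PV MMIO vs wildcard PIO:  %.1f%% slower\n",
	       (mmio_pv / pio_wildcard - 1.0) * 100.0);		/* ~1.7%, basically on par */
	return 0;
}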