RE: [RFC 0/6] KVM: arm/arm64: gsi routing support
Hello!

> The series therefore allows and mandates the usage of KVM_SET_GSI_ROUTING
> ioctl along with KVM_IRQFD. If the userspace does not define any routing
> table, no irqfd injection can happen. The user-space can use
> KVM_CAP_IRQ_ROUTING to detect whether a routing table is needed.

Yesterday, half-asleep on the train back home, I got a simple idea for how
to resolve conflicts with the existing static GSI->SPI routing without
bringing in any more inconsistencies.

So far, in the current implementation, a GSI is an SPI index (leaving aside
KVM_IRQ_LINE, which is already another story on ARM). In order to maintain
this convention we could simply implement a default routing which maps every
GSI to the corresponding SPI pin. So, if userland never calls
KVM_SET_GSI_ROUTING, everything works as before. But it will still be
possible to re-route GSIs to MSI. This will work perfectly because SPI
signaling is used with GICv2m, and MSI with GICv3(+), and the two cannot be
used at the same time.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
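
[Editor's sketch] The default-routing idea in the message above could look roughly like the following. This is only an illustration under stated assumptions: `struct route_entry`, `init_default_routing()` and `reroute_to_msi()` are hypothetical names invented here, not KVM's uapi types or functions, and the real routing entry layout is richer than this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the KVM routing types. */
enum route_type { ROUTE_IRQCHIP = 1, ROUTE_MSI = 2 };

struct route_entry {
    uint32_t gsi;
    enum route_type type;
    uint32_t pin;       /* SPI index, meaningful for ROUTE_IRQCHIP */
    uint32_t msi_data;  /* meaningful for ROUTE_MSI */
};

/* Default table: GSI i is routed to SPI pin i, i.e. the old static behaviour. */
static void init_default_routing(struct route_entry *table, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        table[i].gsi = (uint32_t)i;
        table[i].type = ROUTE_IRQCHIP;
        table[i].pin = (uint32_t)i;
        table[i].msi_data = 0;
    }
}

/*
 * Optional userspace override: re-route one GSI to an MSI.
 * Returns 0 on success, -1 if the GSI is outside the table.
 */
static int reroute_to_msi(struct route_entry *table, size_t n,
                          uint32_t gsi, uint32_t data)
{
    if (gsi >= n)
        return -1;
    table[gsi].type = ROUTE_MSI;
    table[gsi].pin = 0;
    table[gsi].msi_data = data;
    return 0;
}
```

A userland that never touches the table keeps the identity mapping; one that does only changes the entries it cares about.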
Re: Nested EPT Write Protection
On 19/06/2015 03:52, Hu Yaohui wrote:
> Hi All,
> In kernel 3.14.2, the kvm uses shadow EPT(EPT02) to implement the
> nested EPT. The shadow EPT(EPT02) is a shadow of guest EPT (EPT12). If
> the L1 guest writes to the guest EPT(EPT12). How can the shadow
> EPT(EPT02) be modified according?

Because the EPT02 is write protected, writes to the EPT12 will trap to
the hypervisor. The hypervisor will execute the write instruction
before reentering the guest and invalidate the modified parts of the
EPT02. When the invalidated part of the EPT02 is accessed, the
hypervisor will rebuild it according to the EPT12 and the KVM memslots.

Paolo
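
[Editor's sketch] The trap, invalidate and lazy-rebuild cycle described in the reply above can be modelled with a toy one-level table. This is not KVM's shadow MMU code; `struct toy_mmu`, its fields, and the constant offset standing in for the memslot translation are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define NENTRIES 4

/* Toy model: single-level tables, purely illustrative. */
struct toy_mmu {
    uint64_t ept12[NENTRIES];       /* guest EPT, written by L1 (writes trap) */
    uint64_t ept02[NENTRIES];       /* shadow EPT, maintained by L0 */
    bool     ept02_valid[NENTRIES];
    uint64_t l1_to_host_offset;     /* stand-in for the memslot translation */
};

/* L1 wrote to EPT12: L0 performs the write and drops the stale shadow entry. */
static void on_ept12_write_trap(struct toy_mmu *m, unsigned idx, uint64_t val)
{
    m->ept12[idx] = val;
    m->ept02_valid[idx] = false;
}

/* Access through EPT02: rebuild the entry on demand from EPT12 + memslots. */
static uint64_t ept02_lookup(struct toy_mmu *m, unsigned idx)
{
    if (!m->ept02_valid[idx]) {
        m->ept02[idx] = m->ept12[idx] + m->l1_to_host_offset;
        m->ept02_valid[idx] = true;
    }
    return m->ept02[idx];
}
```

The point of the sketch is that the shadow is never patched eagerly: a trapped write only invalidates, and the next access pays for the rebuild.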
Nested EPT Write Protection
Hi All,

In kernel 3.14.2, KVM uses a shadow EPT (EPT02) to implement nested EPT.
The shadow EPT (EPT02) is a shadow of the guest EPT (EPT12). If the L1
guest writes to the guest EPT (EPT12), how can the shadow EPT (EPT02) be
modified accordingly?

Thanks,
Yaohui
Re: [PATCH] kvmtool: Makefile: allow overriding CC and LD
On Thu, 2015-06-18 at 16:50 +0100, Andre Przywara wrote:
> Currently we set CC unconditionally to ${CROSS_COMPILE}gcc, the same
> for LD.
> Allow people to override the compiler name by specifying it explicitly
> on the command line or via the environment.
> Beside calling a certain compiler binary this allows to pass in
> options to the compiler, which lets us get rid of the PowerPC
> overrides in the Makefile. Possible uses:
> $ make CC="gcc -m64" LD="ld -melf64ppc"
> (build kvmtool on a PowerPC toolchain defaulting to 32-bit)
> $ make CC="gcc -m32" LD="ld -melf_i386"
> (build a 32-bit binary on a multilib-enabled x86-64 compiler)

I'm not a big fan of that.

Your examples are all about overriding CFLAGS and LDFLAGS, not CC and LD.
So if anything you should be allowing that.

Adding flags to CC and LD is asking for trouble.

cheers
Re: [PATCH 2/3] powerpc: use default endianness for converting guest/init
On Thu, 2015-06-18 at 15:52 +0100, Andre Przywara wrote:
> Hi,
>
> On 06/17/2015 10:43 AM, Andre Przywara wrote:
> > For converting the guest/init binary into an object file, we call
> > the linker binary, setting the endianness to big endian explicitly
> > when compiling kvmtool for powerpc.
> > This breaks if the compiler is actually targetting little endian
> > (which is true for the Debian port, for instance).
> > Remove the explicit big endianness switch from the linker call to
> > allow linking on little endian PowerPC builds again.
> >
> > Signed-off-by: Andre Przywara
> > ---
> > Hi,
> >
> > this fixed the powerpc64le build for me, while still compiling fine
> > for big endian. Admittedly this whole init->guest_init.o conversion
> > has its issues (with MIPS, for instance), which deserve proper fixing,
> > but lets just fix that build for now.
>
> Will was concerned about breaking toolchains where the linker does not
> default to 64-bit. Is that an issue we care about?

Yeah, that would be Debian & Ubuntu BE at least, and maybe Fedora too?

I'm not sure how you compiled it big endian?

> AFAICT LDFLAGS is only used in this dodgy binary-to-object-file
> conversion of guest/init. For this we rely on the resulting .o file to
> have the same ELF target as the other object files to be finally linked
> into the lkvm binary. As we don't compile guest/init with CFLAGS, there
> is a possible mismatch.
>
> I am looking into a proper fix for this now (compiling guest/init with
> CFLAGS, calling $CC with linker options instead of $LD and allowing CC
> and LD override). Still struggling with MIPS, though :-(

Yeah that's obviously a better solution medium term.

Can you do something like this? Sorry untested:

diff --git a/Makefile b/Makefile
index 6110b8e..8663d67 100644
--- a/Makefile
+++ b/Makefile
@@ -149,7 +149,11 @@ ifeq ($(ARCH), powerpc)
 	OBJS += powerpc/xics.o
 	ARCH_INCLUDE := powerpc/include
 	CFLAGS += -m64
-	LDFLAGS += -m elf64ppc
+	ifeq ($(call try-build,$(SOURCE_HELLO),$(CFLAGS),-m elf64ppc),y)
+		LDFLAGS += -m elf64ppc
+	else
+		LDFLAGS += -m elf64leppc
+	endif
 	ARCH_WANT_LIBFDT := y
 endif

cheers
Re: [PATCH v2 2/2] arm: KVM: keep arm vfp/simd exit handling consistent with arm64
On 06/18/2015 10:27 AM, Marc Zyngier wrote:
> On 16/06/15 22:50, Mario Smarduch wrote:
>> After enhancing arm64 FP/SIMD exit handling, FP/SIMD exit branch is moved
>> to guest trap handling. This keeps exiting handling flow between both
>> architectures consistent.
>>
>> Signed-off-by: Mario Smarduch
>> ---
>>  arch/arm/kvm/interrupts.S | 12 +++-
>>  1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index 79caf79..fca2c56 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -363,10 +363,6 @@ hyp_hvc:
>>  	@ Check syndrome register
>>  	mrc	p15, 4, r1, c5, c2, 0	@ HSR
>>  	lsr	r0, r1, #HSR_EC_SHIFT
>> -#ifdef CONFIG_VFPv3
>> -	cmp	r0, #HSR_EC_CP_0_13
>> -	beq	switch_to_guest_vfp
>> -#endif
>>  	cmp	r0, #HSR_EC_HVC
>>  	bne	guest_trap		@ Not HVC instr.
>>
>> @@ -406,6 +402,12 @@ THUMB( orr lr, #1)
>>  1:	eret
>>
>>  guest_trap:
>> +#ifdef CONFIG_VFPv3
>> +	/* Guest accessed VFP/SIMD registers, save host, restore Guest */
>> +	cmp	r0, #HSR_EC_CP_0_13
>> +	beq	switch_to_guest_fpsimd
>> +#endif
>> +
>>  	load_vcpu			@ Load VCPU pointer to r0
>>  	str	r1, [vcpu, #VCPU_HSR]
>>
>> @@ -478,7 +480,7 @@ guest_trap:
>>   * inject an undefined exception to the guest.
>>   */
>>  #ifdef CONFIG_VFPv3
>> -switch_to_guest_vfp:
>> +switch_to_guest_fpsimd:
>
> Ah, I think I managed to confuse you in my previous comment.
> On ARMv7, we call the floating point stuff VFP.
> On ARMv8, we call it FP/SIMD.

Ah I see, I'll update.

>
> Not very consistent, I know...
>
>>  	load_vcpu			@ Load VCPU pointer to r0

How about move it here - then it does not stick out like before.

guest_trap:
	load_vcpu			@ Load VCPU pointer to r0
	str	r1, [vcpu, #VCPU_HSR]

	@ Check if we need the fault information
	lsr	r1, r1, #HSR_EC_SHIFT
#ifdef CONFIG_VFPv3
	/* Guest accessed VFP/SIMD registers, save host, restore Guest */
	cmp	r1, #HSR_EC_CP_0_13
	beq	switch_to_guest_vfp
#endif

Regarding "host_switch_to_hyp:" it has no reference but appears
like a clean separator, that's on purpose?

Thanks

>
> It would be interesting to find out if we can make this load_vcpu part
> of the common sequence (without spilling another register, of course).
> Probably involves moving the exception class to r2.
>
> Thanks,
>
> 	M.
>
Re: [PATCH v5] i386: Introduce ARAT CPU feature
On Sun, Jun 07, 2015 at 11:15:08AM +0200, Jan Kiszka wrote:
> From: Jan Kiszka
>
> ARAT signals that the APIC timer does not stop in power saving states.
> As our APICs are emulated, it's fine to expose this feature to guests,
> at least when asking for KVM host features or with CPU types that
> include the flag. The exact model number that introduced the feature is
> not known, but reports can be found that it's at least available since
> Sandy Bridge.
>
> Signed-off-by: Jan Kiszka

The code looks good now, but: what are the real consequences of
enabling/disabling the flag? What exactly do guests use it for?

Isn't this going to make guests have additional expectations about the
APIC timer that may be broken when live-migrating or pausing the VM?

--
Eduardo
Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
[Adding Joerg since he was part of this original idea] On Thu, 2015-06-18 at 09:16 +, Wu, Feng wrote: > > > > -Original Message- > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > Sent: Tuesday, June 16, 2015 12:45 AM > > To: Eric Auger > > Cc: Avi Kivity; Wu, Feng; kvm@vger.kernel.org; linux-ker...@vger.kernel.org; > > pbonz...@redhat.com; mtosa...@redhat.com > > Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding > > > > On Mon, 2015-06-15 at 18:17 +0200, Eric Auger wrote: > > > Hi Alex, all, > > > On 06/12/2015 09:03 PM, Alex Williamson wrote: > > > > On Fri, 2015-06-12 at 21:48 +0300, Avi Kivity wrote: > > > >> On 06/12/2015 06:41 PM, Alex Williamson wrote: > > > >>> On Fri, 2015-06-12 at 00:23 +, Wu, Feng wrote: > > > > -Original Message- > > > > From: Avi Kivity [mailto:avi.kiv...@gmail.com] > > > > Sent: Friday, June 12, 2015 3:59 AM > > > > To: Wu, Feng; kvm@vger.kernel.org; linux-ker...@vger.kernel.org > > > > Cc: pbonz...@redhat.com; mtosa...@redhat.com; > > > > alex.william...@redhat.com; eric.au...@linaro.org > > > > Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding > > > > > > > > On 06/11/2015 01:51 PM, Feng Wu wrote: > > > >> From: Eric Auger > > > >> > > > >> This patch adds and documents a new KVM_DEV_VFIO_DEVICE > > group > > > >> and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, > > > >> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be > > able > > > >> to set a VFIO device IRQ as forwarded or not forwarded. > > > >> the command takes as argument a handle to a new struct named > > > >> kvm_vfio_dev_irq. > > > > Is there no way to do this automatically? After all, vfio knows > > > > that a > > > > device interrupt is forwarded to some eventfd, and kvm knows that > > some > > > > eventfd is forwarded to a guest interrupt. If they compare notes > > > > through a central registry, they can figure out that the interrupt > > > > needs > > > > to be forwarded. 
> > > Oh, just like Eric mentioned in his reply, this description is out > > > of context > > of > > > this series, I will remove them in the next version. > > > >>> > > > >>> I suspect Avi's question was more general. While forward/unforward is > > > >>> out of context for this series, it's very similar in nature to > > > >>> enabling/disabling posted interrupts. So I think the question remains > > > >>> whether we really need userspace to participate in creating this > > > >>> shortcut or if kvm and vfio can some how orchestrate figuring it out > > > >>> automatically. > > > >>> > > > >>> Personally I don't know how we could do it automatically. We've > > > >>> always > > > >>> relied on userspace to independently setup vfio and kvm such that > > > >>> neither have any idea that the other is there and update each side > > > >>> independently when anything changes. So it seems consistent to > > continue > > > >>> that here. It doesn't seem like there's much to gain performance-wise > > > >>> either, updates should be a relatively rare event I'd expect. > > > >>> > > > >>> There's really no metadata associated with an eventfd, so "comparing > > > >>> notes" automatically might imply some central registration entity. > > > >>> That > > > >>> immediately sounds like a much more complex solution, but maybe Avi > > has > > > >>> some ideas to manage it. Thanks, > > > >>> > > > >> > > > >> The idea is to have a central registry maintained by a posted > > > >> interrupts > > > >> manager. Both vfio and kvm pass the filp (along with extra > > > >> information) > > > >> to the posted interrupts manager, which, when it detects a filp match, > > > >> tells each of them what to do. 
> > > >> > > > >> The advantages are: > > > >> - old userspace gains the optimization without change > > > >> - a userspace API is more expensive to maintain than internal kernel > > > >> interfaces (CVEs, documentation, maintaining backwards compatibility) > > > >> - if you can do it without a new interface, this indicates that all the > > > >> information in the new interface is redundant. That means you have to > > > >> check it for consistency with the existing information, so it's extra > > > >> work (likely, it's exactly what the posted interrupt manager would be > > > >> doing anyway). > > > > > > > > Yep, those all sound like good things and I believe that's similar in > > > > design to the way we had originally discussed this interaction at > > > > LPC/KVM Forum several years ago. I'd be in favor of that approach. > > > > > > I guess this discussion also is relevant wrt "[RFC v6 00/16] KVM-VFIO > > > IRQ forward control" series? Or is that "central registry maintained by > > > a posted interrupts manager" something more specific to x86? > > > > I'd think we'd want it for any sort of offload and supporting both > > posted-interrupts and irq-forwa
Re: [PATCH 13/13] KVM: arm64: enable ITS emulation as a virtual MSI controller
On 06/18/2015 04:03 PM, Pavel Fedin wrote:
> Hello!
>
>> But that fails compilation on ARM (which uses this file as well),
>> because we have a dummy fail function in the header if
>> CONFIG_HAVE_KVM_MSI is not defined.
>
> May be then remove that fail function too? Too many #ifdef's are not good...

Yes, that seems to work - now. I think I had more code in there before
that prevented exposure without #ifdef guarding.

Cheers,
Andre.

>
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
>
>
Re: [RFC 4/6] KVM: arm/arm64: enable irqchip routing
On 18/06/15 19:00, Eric Auger wrote:
> Hi Marc,
> On 06/18/2015 07:53 PM, Marc Zyngier wrote:
>> Hi Eric,
>>
>> On 18/06/15 18:40, Eric Auger wrote:
>>> This patch adds compilation and link against irqchip.
>>>
>>> On ARM, irqchip routing is not really useful since there is
>>> a single irqchip. However main motivation behind using irqchip
>>> code is to enable MSI routing code. With the support of in-kernel
>>> GICv3 ITS emulation, it now seems to be a MUST HAVE requirement.
>>>
>>> Functions previously implemented in vgic.c and substitute
>>> to more complex irqchip implementation are removed:
>>>
>>> - kvm_send_userspace_msi
>>> - kvm_irq_map_chip_pin
>>> - kvm_set_irq
>>> - kvm_irq_map_gsi.
>>>
>>> They implemented a kernel default identity GSI routing. This is now
>>> replaced by user-side provided routing.
>>>
>>> Routing standard hooks are now implemented in vgic:
>>> - kvm_set_routing_entry
>>> - kvm_set_irq
>>> - kvm_set_msi
>>>
>>> Both HAVE_KVM_IRQCHIP and HAVE_KVM_IRQ_ROUTING are defined.
>>> KVM_CAP_IRQ_ROUTING is advertised and KVM_SET_GSI_ROUTING is allowed.
>>>
>>> MSI routing is not yet allowed.
>>>
>>> Signed-off-by: Eric Auger
>>> ---
>>>  Documentation/virtual/kvm/api.txt | 11 --
>>>  arch/arm/include/asm/kvm_host.h   |  2 +
>>>  arch/arm/kvm/Kconfig              |  2 +
>>>  arch/arm/kvm/Makefile             |  2 +-
>>>  arch/arm64/include/asm/kvm_host.h |  1 +
>>>  arch/arm64/kvm/Kconfig            |  2 +
>>>  arch/arm64/kvm/Makefile           |  2 +-
>>>  include/kvm/arm_vgic.h            |  9 -
>>>  virt/kvm/arm/vgic.c               | 78 ---
>>>  virt/kvm/irqchip.c                |  2 +
>>>  10 files changed, 67 insertions(+), 44 deletions(-)
>>>
>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>> index bcec91e..2bc96e1 100644
>>> --- a/Documentation/virtual/kvm/api.txt
>>> +++ b/Documentation/virtual/kvm/api.txt
>>> @@ -1395,7 +1395,7 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
>>>  4.52 KVM_SET_GSI_ROUTING
>>>
>>>  Capability: KVM_CAP_IRQ_ROUTING
>>> -Architectures: x86 s390
>>> +Architectures: x86 s390 arm arm64
>>>  Type: vm ioctl
>>>  Parameters: struct kvm_irq_routing (in)
>>>  Returns: 0 on success, -1 on error
>>> @@ -2310,9 +2310,12 @@ Note that closing the resamplefd is not sufficient to disable the
>>>  irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment
>>>  and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.
>>>
>>> -On ARM/ARM64, the gsi field in the kvm_irqfd struct specifies the Shared
>>> -Peripheral Interrupt (SPI) index, such that the GIC interrupt ID is
>>> -given by gsi + 32.
>>> +On ARM/ARM64, when GSI routing is not used, the gsi field in the
>>> +kvm_irqfd struct specifies the Shared Peripheral Interrupt (SPI) index,
>>> +such that the GIC interrupt ID is given by gsi + 32. When GSI routing is
>>> +setup:
>>> +- if irqchip routing: irqchip.pin + 32 is the SPI ID that is injected
>>> +- if MSI routing: the MSI data is used as interrupt ID (SPI or LPI).
>>
>> This feels just wrong. With GICv3, the MSI data is not the LPI at all.
>> It is an opaque value that gets translated into an LPI when combined
>> with the DeviceID.
> I agree with you. I need to rephrase that. In practice this is what
> should happen in the code since I use Andre's MSI injection routine
> which does the translation; except for GICv2 where last patch attempts
> to do direct gsi mapping from msi msg data!

Agreed. The code seems to do the right thing, only the documentation is
misleading.

Thanks,

	M.
--
Jazz is not dead. It just smells funny...
[PATCH] MAINTAINERS: Add vfio-platform sub-maintainer
Add Baptiste Reynal as the VFIO platform driver sub-maintainer.

Signed-off-by: Alex Williamson
Cc: Baptiste Reynal
---
 MAINTAINERS |    6 ++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index d8afd29..c6bf7f6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10545,6 +10545,12 @@ F: drivers/vfio/
 F:	include/linux/vfio.h
 F:	include/uapi/linux/vfio.h
 
+VFIO PLATFORM DRIVER
+M:	Baptiste Reynal
+L:	kvm@vger.kernel.org
+S:	Maintained
+F:	drivers/vfio/platform/
+
 VIDEOBUF2 FRAMEWORK
 M:	Pawel Osciak
 M:	Marek Szyprowski
Re: [RFC 4/6] KVM: arm/arm64: enable irqchip routing
Hi Marc,
On 06/18/2015 07:53 PM, Marc Zyngier wrote:
> Hi Eric,
>
> On 18/06/15 18:40, Eric Auger wrote:
>> This patch adds compilation and link against irqchip.
>>
>> On ARM, irqchip routing is not really useful since there is
>> a single irqchip. However main motivation behind using irqchip
>> code is to enable MSI routing code. With the support of in-kernel
>> GICv3 ITS emulation, it now seems to be a MUST HAVE requirement.
>>
>> Functions previously implemented in vgic.c and substitute
>> to more complex irqchip implementation are removed:
>>
>> - kvm_send_userspace_msi
>> - kvm_irq_map_chip_pin
>> - kvm_set_irq
>> - kvm_irq_map_gsi.
>>
>> They implemented a kernel default identity GSI routing. This is now
>> replaced by user-side provided routing.
>>
>> Routing standard hooks are now implemented in vgic:
>> - kvm_set_routing_entry
>> - kvm_set_irq
>> - kvm_set_msi
>>
>> Both HAVE_KVM_IRQCHIP and HAVE_KVM_IRQ_ROUTING are defined.
>> KVM_CAP_IRQ_ROUTING is advertised and KVM_SET_GSI_ROUTING is allowed.
>>
>> MSI routing is not yet allowed.
>>
>> Signed-off-by: Eric Auger
>> ---
>>  Documentation/virtual/kvm/api.txt | 11 --
>>  arch/arm/include/asm/kvm_host.h   |  2 +
>>  arch/arm/kvm/Kconfig              |  2 +
>>  arch/arm/kvm/Makefile             |  2 +-
>>  arch/arm64/include/asm/kvm_host.h |  1 +
>>  arch/arm64/kvm/Kconfig            |  2 +
>>  arch/arm64/kvm/Makefile           |  2 +-
>>  include/kvm/arm_vgic.h            |  9 -
>>  virt/kvm/arm/vgic.c               | 78 ---
>>  virt/kvm/irqchip.c                |  2 +
>>  10 files changed, 67 insertions(+), 44 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index bcec91e..2bc96e1 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -1395,7 +1395,7 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
>>  4.52 KVM_SET_GSI_ROUTING
>>
>>  Capability: KVM_CAP_IRQ_ROUTING
>> -Architectures: x86 s390
>> +Architectures: x86 s390 arm arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_irq_routing (in)
>>  Returns: 0 on success, -1 on error
>> @@ -2310,9 +2310,12 @@ Note that closing the resamplefd is not sufficient to disable the
>>  irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment
>>  and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.
>>
>> -On ARM/ARM64, the gsi field in the kvm_irqfd struct specifies the Shared
>> -Peripheral Interrupt (SPI) index, such that the GIC interrupt ID is
>> -given by gsi + 32.
>> +On ARM/ARM64, when GSI routing is not used, the gsi field in the
>> +kvm_irqfd struct specifies the Shared Peripheral Interrupt (SPI) index,
>> +such that the GIC interrupt ID is given by gsi + 32. When GSI routing is
>> +setup:
>> +- if irqchip routing: irqchip.pin + 32 is the SPI ID that is injected
>> +- if MSI routing: the MSI data is used as interrupt ID (SPI or LPI).
>
> This feels just wrong. With GICv3, the MSI data is not the LPI at all.
> It is an opaque value that gets translated into an LPI when combined
> with the DeviceID.

I agree with you. I need to rephrase that. In practice this is what
should happen in the code since I use Andre's MSI injection routine
which does the translation; except for GICv2 where last patch attempts
to do direct gsi mapping from msi msg data!

Thanks

Eric
>
> Thanks,
>
> 	M.
>
Re: [RFC 4/6] KVM: arm/arm64: enable irqchip routing
Hi Eric,

On 18/06/15 18:40, Eric Auger wrote:
> This patch adds compilation and link against irqchip.
>
> On ARM, irqchip routing is not really useful since there is
> a single irqchip. However main motivation behind using irqchip
> code is to enable MSI routing code. With the support of in-kernel
> GICv3 ITS emulation, it now seems to be a MUST HAVE requirement.
>
> Functions previously implemented in vgic.c and substitute
> to more complex irqchip implementation are removed:
>
> - kvm_send_userspace_msi
> - kvm_irq_map_chip_pin
> - kvm_set_irq
> - kvm_irq_map_gsi.
>
> They implemented a kernel default identity GSI routing. This is now
> replaced by user-side provided routing.
>
> Routing standard hooks are now implemented in vgic:
> - kvm_set_routing_entry
> - kvm_set_irq
> - kvm_set_msi
>
> Both HAVE_KVM_IRQCHIP and HAVE_KVM_IRQ_ROUTING are defined.
> KVM_CAP_IRQ_ROUTING is advertised and KVM_SET_GSI_ROUTING is allowed.
>
> MSI routing is not yet allowed.
>
> Signed-off-by: Eric Auger
> ---
>  Documentation/virtual/kvm/api.txt | 11 --
>  arch/arm/include/asm/kvm_host.h   |  2 +
>  arch/arm/kvm/Kconfig              |  2 +
>  arch/arm/kvm/Makefile             |  2 +-
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  arch/arm64/kvm/Kconfig            |  2 +
>  arch/arm64/kvm/Makefile           |  2 +-
>  include/kvm/arm_vgic.h            |  9 -
>  virt/kvm/arm/vgic.c               | 78 ---
>  virt/kvm/irqchip.c                |  2 +
>  10 files changed, 67 insertions(+), 44 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index bcec91e..2bc96e1 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1395,7 +1395,7 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
>  4.52 KVM_SET_GSI_ROUTING
>
>  Capability: KVM_CAP_IRQ_ROUTING
> -Architectures: x86 s390
> +Architectures: x86 s390 arm arm64
>  Type: vm ioctl
>  Parameters: struct kvm_irq_routing (in)
>  Returns: 0 on success, -1 on error
> @@ -2310,9 +2310,12 @@ Note that closing the resamplefd is not sufficient to disable the
>  irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment
>  and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.
>
> -On ARM/ARM64, the gsi field in the kvm_irqfd struct specifies the Shared
> -Peripheral Interrupt (SPI) index, such that the GIC interrupt ID is
> -given by gsi + 32.
> +On ARM/ARM64, when GSI routing is not used, the gsi field in the
> +kvm_irqfd struct specifies the Shared Peripheral Interrupt (SPI) index,
> +such that the GIC interrupt ID is given by gsi + 32. When GSI routing is
> +setup:
> +- if irqchip routing: irqchip.pin + 32 is the SPI ID that is injected
> +- if MSI routing: the MSI data is used as interrupt ID (SPI or LPI).

This feels just wrong. With GICv3, the MSI data is not the LPI at all.
It is an opaque value that gets translated into an LPI when combined
with the DeviceID.

Thanks,

	M.
--
Jazz is not dead. It just smells funny...
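
[Editor's sketch] To make the distinction in the reply above concrete: for an SPI the interrupt ID is simple arithmetic (pin + 32), whereas with an ITS the MSI data is only an event ID that must be looked up together with the device ID. The flat table below is a toy stand-in for that lookup, not the vgic or ITS emulation code; `struct toy_its_entry` and both function names are hypothetical. Returning 0 for "no mapping" is an arbitrary choice that works here because LPI interrupt IDs start at 8192.

```c
#include <stddef.h>
#include <stdint.h>

#define GIC_SPI_BASE 32u    /* SPI interrupt IDs start at 32 */

/* irqfd without routing, or irqchip routing: SPI index/pin -> interrupt ID. */
static uint32_t spi_pin_to_intid(uint32_t pin)
{
    return pin + GIC_SPI_BASE;
}

/* Hypothetical flat stand-in for the ITS device and translation tables. */
struct toy_its_entry {
    uint32_t devid;    /* out-of-band device ID */
    uint32_t eventid;  /* the MSI data payload */
    uint32_t lpi;      /* resulting LPI interrupt ID */
};

/* Translate (devid, eventid) into an LPI; returns 0 when nothing is mapped. */
static uint32_t toy_its_translate(const struct toy_its_entry *tbl, size_t n,
                                  uint32_t devid, uint32_t eventid)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].devid == devid && tbl[i].eventid == eventid)
            return tbl[i].lpi;
    }
    return 0;
}
```

Two devices can therefore send the same MSI data and still end up with different LPIs, which is why the data alone cannot be documented as the interrupt ID.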
Re: [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts
On 06/18/2015 10:37 AM, Marc Zyngier wrote: > On 17/06/15 16:50, Eric Auger wrote: >> On 06/17/2015 05:37 PM, Marc Zyngier wrote: >>> On 17/06/15 16:11, Eric Auger wrote: Hi Marc, On 06/08/2015 07:04 PM, Marc Zyngier wrote: > So far, the only use of the HW interrupt facility is the timer, > implying that the active state is context-switched for each vcpu, > as the device is is shared across all vcpus. s/is// > > This does not work for a device that has been assigned to a VM, > as the guest is entierely in control of that device (the HW is entirely? > not shared). In that case, it makes sense to bypass the whole > active state srtwitchint, and only track the deactivation of the switching >>> >>> Congratulations, I think you're now ready to try deciphering my >>> handwriting... ;-) >> good to see you're not a machine or maybe you do it on purpose some >> times ;-) >>> > interrupt. > > Signed-off-by: Marc Zyngier > --- > include/kvm/arm_vgic.h| 5 +++-- > virt/kvm/arm/arch_timer.c | 2 +- > virt/kvm/arm/vgic.c | 37 - > 3 files changed, 28 insertions(+), 16 deletions(-) > > diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h > index 1c653c1..5d47d60 100644 > --- a/include/kvm/arm_vgic.h > +++ b/include/kvm/arm_vgic.h > @@ -164,7 +164,8 @@ struct irq_phys_map { > u32 virt_irq; > u32 phys_irq; > u32 irq; > - boolactive; > + boolshared; > + boolactive; /* Only valid if shared */ > }; > > struct vgic_dist { > @@ -347,7 +348,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 > reg); > int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu); > int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu); > struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu, > -int virt_irq, int irq); > +int virt_irq, int irq, bool shared); > int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map); > bool vgic_get_phys_irq_active(struct irq_phys_map *map); > void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active); > diff --git 
a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c > index b9fff78..9544d79 100644 > --- a/virt/kvm/arm/arch_timer.c > +++ b/virt/kvm/arm/arch_timer.c > @@ -202,7 +202,7 @@ void kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu, >* Tell the VGIC that the virtual interrupt is tied to a >* physical interrupt. We do that once per VCPU. >*/ > - timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq); > + timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq, true); > WARN_ON(!timer->map); > } > > diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c > index f376b56..4223166 100644 > --- a/virt/kvm/arm/vgic.c > +++ b/virt/kvm/arm/vgic.c > @@ -1125,18 +1125,21 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu > *vcpu, int irq, > map = vgic_irq_map_search(vcpu, irq); > > if (map) { > - int ret; > - > - BUG_ON(!map->active); > vlr.hwirq = map->phys_irq; > vlr.state |= LR_HW; > vlr.state &= ~LR_EOI_INT; > > - ret = irq_set_irqchip_state(map->irq, > - IRQCHIP_STATE_ACTIVE, > - true); > vgic_irq_set_queued(vcpu, irq); the queued state is set again in vgic_queue_hwirq for level_sensitive IRQs although not harmful. >>> >>> Indeed. We still need it for edge interrupts though. I'll try to find a >>> nicer way... >>> > - WARN_ON(ret); > + > + if (map->shared) { > + int ret; > + > + BUG_ON(!map->active); > + ret = irq_set_irqchip_state(map->irq, > + > IRQCHIP_STATE_ACTIVE, > + true); > + WARN_ON(ret); > + } > } > } > > @@ -1368,21 +1371,28 @@ static bool vgic_process_maintenance(struct > kvm_vcpu *vcpu) > static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr) > { > struct irq_phys_map *map; > + bool active; > int ret; > > if (!(vlr.state & LR_HW)) > return 0; > > map = vgic_irq_map_search(vcpu, vlr.irq); > - BUG_ON(!map || !map->active);
[RFC 1/6] KVM: api: add kvm_irq_routing_extended_msi
On ARM, the MSI msg (address and data) comes along with out-of-band
device ID information. The device ID encodes the device that
composes the MSI msg. Let's create a new routing entry structure
that makes it possible to encode that information on top of the
standard MSI message.

Signed-off-by: Eric Auger
---
 Documentation/virtual/kvm/api.txt | 9 +
 include/uapi/linux/kvm.h          | 9 +
 2 files changed, 18 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index d20fd94..bcec91e 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1419,6 +1419,7 @@ struct kvm_irq_routing_entry {
 		struct kvm_irq_routing_irqchip irqchip;
 		struct kvm_irq_routing_msi msi;
 		struct kvm_irq_routing_s390_adapter adapter;
+		struct kvm_irq_routing_extended_msi ext_msi;
 		__u32 pad[8];
 	} u;
 };
@@ -1427,6 +1428,7 @@ struct kvm_irq_routing_entry {
 #define KVM_IRQ_ROUTING_IRQCHIP 1
 #define KVM_IRQ_ROUTING_MSI 2
 #define KVM_IRQ_ROUTING_S390_ADAPTER 3
+#define KVM_IRQ_ROUTING_EXTENDED_MSI 4
 
 No flags are specified so far, the corresponding field must be set to zero.
 
@@ -1442,6 +1444,13 @@ struct kvm_irq_routing_msi {
 	__u32 pad;
 };
 
+struct kvm_irq_routing_extended_msi {
+	__u32 address_lo;
+	__u32 address_hi;
+	__u32 data;
+	__u32 devid;
+};
+
 struct kvm_irq_routing_s390_adapter {
 	__u64 ind_addr;
 	__u64 summary_addr;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2a23705..e3f65a0 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -829,6 +829,13 @@ struct kvm_irq_routing_msi {
 	__u32 pad;
 };
 
+struct kvm_irq_routing_extended_msi {
+	__u32 address_lo;
+	__u32 address_hi;
+	__u32 data;
+	__u32 devid;
+};
+
 struct kvm_irq_routing_s390_adapter {
 	__u64 ind_addr;
 	__u64 summary_addr;
@@ -841,6 +848,7 @@ struct kvm_irq_routing_s390_adapter {
 #define KVM_IRQ_ROUTING_IRQCHIP 1
 #define KVM_IRQ_ROUTING_MSI 2
 #define KVM_IRQ_ROUTING_S390_ADAPTER 3
+#define KVM_IRQ_ROUTING_EXTENDED_MSI 4
 
 struct kvm_irq_routing_entry {
 	__u32 gsi;
@@ -851,6 +859,7 @@ struct kvm_irq_routing_entry {
 		struct kvm_irq_routing_irqchip irqchip;
 		struct kvm_irq_routing_msi msi;
 		struct kvm_irq_routing_s390_adapter adapter;
+		struct kvm_irq_routing_extended_msi ext_msi;
 		__u32 pad[8];
 	} u;
 };
--
1.9.1
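
[Editor's sketch] A userspace caller might fill the proposed entry roughly as follows. The structures below are local mirrors written for this example so that it compiles on its own; they follow the field order shown in the patch but are not the kernel headers, and `make_ext_msi_route()` is a hypothetical helper. The type value 4 is the one proposed by this RFC and may not match any released kernel.

```c
#include <stdint.h>
#include <string.h>

#define ROUTING_EXTENDED_MSI 4  /* value proposed in the RFC above */

/* Local mirror of the proposed kvm_irq_routing_extended_msi. */
struct ext_msi {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
    uint32_t devid;
};

/* Local, reduced mirror of kvm_irq_routing_entry (only the member used here). */
struct routing_entry {
    uint32_t gsi;
    uint32_t type;
    uint32_t flags;  /* no flags defined so far: must stay zero */
    uint32_t pad;
    union {
        struct ext_msi ext_msi;
        uint32_t pad[8];
    } u;
};

/* Build one routing entry that ties a GSI to an MSI plus its device ID. */
static struct routing_entry make_ext_msi_route(uint32_t gsi, uint64_t doorbell,
                                               uint32_t data, uint32_t devid)
{
    struct routing_entry e;

    memset(&e, 0, sizeof(e));  /* zero flags and padding */
    e.gsi = gsi;
    e.type = ROUTING_EXTENDED_MSI;
    e.u.ext_msi.address_lo = (uint32_t)(doorbell & 0xffffffffu);
    e.u.ext_msi.address_hi = (uint32_t)(doorbell >> 32);
    e.u.ext_msi.data = data;
    e.u.ext_msi.devid = devid;
    return e;
}
```

The new member is 16 bytes, so it fits inside the existing 32-byte union without changing the size of the entry.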
[RFC 0/6] KVM: arm/arm64: gsi routing support
With the advent of in-kernel GICv3 ITS emulation, KVM GSI routing appears
to be required. More specifically, MSI routing is needed. irqchip routing
does not seem really useful on ARM, but using MSI routing also mandates
integrating irqchip routing. The initial implementation of irqfd on ARM
must be upgraded by integrating the KVM irqchip.c code and implementing
its standard hooks in the architecture-specific part.

The series therefore allows and mandates the usage of the
KVM_SET_GSI_ROUTING ioctl along with KVM_IRQFD. If userspace does not
define any routing table, no irqfd injection can happen. Userspace can
use KVM_CAP_IRQ_ROUTING to detect whether a routing table is needed.

For irqchip routing, the convention is that only SPIs can be injected and
the SPI ID corresponds to irqchip.pin + 32. For MSI routing, the interrupt
ID matches the MSI msg data. The API evolves to support associating a
device ID with a routing entry.

Known issues of this RFC:

- One of the biggest is the API inconsistencies on ARM. Blame me. Routing
  should apply to the KVM_IRQ_LINE ioctl, which is not the case yet in this
  series. It only applies to irqfd.
  x On x86, KVM_IRQ_LINE is typically plugged onto irqchip.c kvm_set_irq,
    whereas on ARM we inject directly through kvm_vgic_inject_irq.
  x On arm/arm64, gsi has a specific structure:
      bits:  | 31 ... 24 | 23 ... 16  | 15 ... 0 |
      field: | irq_type  | vcpu_index | irq_id   |
    where irq_id matches the Interrupt ID.
    - For KVM_IRQFD without routing (current implementation), the gsi
      field corresponds to an SPI index = irq_id (above) - 32.
    - As far as I understand the QEMU integration, gsi is supposed to be
      within [0, KVM_MAX_IRQ_ROUTES]. Difficult to use the KVM_IRQ_LINE gsi.
    - To be defined: what we choose as a convention when irqchip routing
      is applied: gsi -> irqchip input pin.
    - Or shouldn't we simply rule out any userspace irqchip routing and
      stick to MSI routing? We could define a fixed identity in-kernel
      irqchip mapping and only offer MSI routing.
- Static allocation of chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
  KVM_IRQCHIP_NUM_PINS is arbitrarily set to 1020 - 32 (SPI count). On s390
  this is even bigger.

Currently tested on irqchip routing only (Calxeda midway only), i.e. NOT
TESTED on MSI routing yet.

This is a very preliminary RFC to ease the discussion.

Code can be found at
https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.1-rc8-gsi-routing-rfc

It applies on Andre's [PATCH 00/13] arm64: KVM: GICv3 ITS emulation
(http://www.spinics.net/lists/kvm/msg117402.html)

Eric Auger (6):
  KVM: api: add kvm_irq_routing_extended_msi
  KVM: kvm_host: add kvm_extended_msi
  KVM: irqchip: convey devid to kvm_set_msi
  KVM: arm/arm64: enable irqchip routing
  KVM: arm/arm64: enable MSI routing
  KVM: arm: implement kvm_set_msi by gsi direct mapping

 Documentation/virtual/kvm/api.txt | 20 ++--
 arch/arm/include/asm/kvm_host.h | 2 +
 arch/arm/kvm/Kconfig | 3 ++
 arch/arm/kvm/Makefile | 2 +-
 arch/arm64/include/asm/kvm_host.h | 1 +
 arch/arm64/kvm/Kconfig | 2 +
 arch/arm64/kvm/Makefile | 2 +-
 include/kvm/arm_vgic.h | 9
 include/linux/kvm_host.h | 10
 include/uapi/linux/kvm.h | 9
 virt/kvm/arm/vgic.c | 96 +++
 virt/kvm/irqchip.c | 20 ++--
 12 files changed, 128 insertions(+), 48 deletions(-)

--
1.9.1
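The numbering conventions discussed in the cover letter can be written
down as a small C sketch. It only restates the bit layout of the
KVM_IRQ_LINE gsi and the "+ 32" SPI offset described above; the struct
and function names are made up for this example and are not part of the
series.

```c
#include <stdint.h>

/* Field layout of the KVM_IRQ_LINE gsi on arm/arm64, as described above. */
struct arm_irq_line {
	uint8_t  irq_type;	/* bits 31..24 */
	uint8_t  vcpu_index;	/* bits 23..16 */
	uint16_t irq_id;	/* bits 15..0, the interrupt ID */
};

static struct arm_irq_line decode_irq_line(uint32_t gsi)
{
	struct arm_irq_line f;

	f.irq_type   = (uint8_t)(gsi >> 24);
	f.vcpu_index = (uint8_t)((gsi >> 16) & 0xff);
	f.irq_id     = (uint16_t)(gsi & 0xffff);
	return f;
}

#define SPI_BASE 32u	/* interrupt IDs below 32 are SGIs and PPIs */

/* irqchip routing convention of the series: SPI ID = irqchip.pin + 32. */
static uint32_t spi_id_from_pin(uint32_t pin)
{
	return pin + SPI_BASE;
}

/* KVM_IRQFD without routing: gsi is the SPI index, i.e. irq_id - 32. */
static uint32_t irqfd_gsi_from_irq_id(uint32_t irq_id)
{
	return irq_id - SPI_BASE;
}
```

With KVM_IRQCHIP_NUM_PINS set to 1020 - 32 = 988, the highest pin (987)
maps to interrupt ID 1019, the last SPI.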
[RFC 5/6] KVM: arm/arm64: enable MSI routing
Up to now, only irqchip routing entries could be set. This patch adds the
capability to insert MSI routing entries, extended or standard ones.
Although standard MSI entries can be set, their injection is still not
supported. For ARM64, let's also increase KVM_MAX_IRQ_ROUTES to 4096.

Signed-off-by: Eric Auger
---
 include/linux/kvm_host.h | 2 ++
 virt/kvm/arm/vgic.c | 13 +
 2 files changed, 15 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e1c1c0d..6cacf11 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -927,6 +927,8 @@ static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
 #ifdef CONFIG_S390
 #define KVM_MAX_IRQ_ROUTES 4096 //FIXME: we can have more than that...
+#elif defined(CONFIG_ARM64)
+#define KVM_MAX_IRQ_ROUTES 4096 //FIXME: we can have more than that too...
 #else
 #define KVM_MAX_IRQ_ROUTES 1024
 #endif
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 212a5ff..16d232f 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -2256,6 +2256,19 @@ int kvm_set_routing_entry(struct kvm_kernel_irq_routing_entry *e,
 		    (e->irqchip.irqchip >= KVM_NR_IRQCHIPS))
 			goto out;
 		break;
+	case KVM_IRQ_ROUTING_MSI:
+		e->set = kvm_set_msi;
+		e->msi.address_lo = ue->u.msi.address_lo;
+		e->msi.address_hi = ue->u.msi.address_hi;
+		e->msi.data = ue->u.msi.data;
+		break;
+	case KVM_IRQ_ROUTING_EXTENDED_MSI:
+		e->set = kvm_set_msi;
+		e->ext_msi.address_lo = ue->u.ext_msi.address_lo;
+		e->ext_msi.address_hi = ue->u.ext_msi.address_hi;
+		e->ext_msi.data = ue->u.ext_msi.data;
+		e->ext_msi.devid = ue->u.ext_msi.devid;
+		break;
 	default:
 		goto out;
 	}
--
1.9.1
[RFC 3/6] KVM: irqchip: convey devid to kvm_set_msi
On ARM, a devid field is conveyed in the kvm_msi struct. Let's choose the
routing type and struct according to its availability and fill the
corresponding struct. Also remove the flags check, now that the flags
field can be non-zero.

Signed-off-by: Eric Auger
---
 virt/kvm/irqchip.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 1d56a90..e76c7d2 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -73,12 +73,22 @@ int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi)
 {
 	struct kvm_kernel_irq_routing_entry route;

-	if (!irqchip_in_kernel(kvm) || msi->flags != 0)
+	if (!irqchip_in_kernel(kvm))
 		return -EINVAL;

-	route.msi.address_lo = msi->address_lo;
-	route.msi.address_hi = msi->address_hi;
-	route.msi.data = msi->data;
+	if (msi->flags & KVM_MSI_VALID_DEVID) {
+		route.type = KVM_IRQ_ROUTING_EXTENDED_MSI;
+		route.ext_msi.address_lo = msi->address_lo;
+		route.ext_msi.address_hi = msi->address_hi;
+		route.ext_msi.data = msi->data;
+		route.ext_msi.devid= msi->devid;
+	}
+	else {
+		route.type = KVM_IRQ_ROUTING_MSI;
+		route.msi.address_lo = msi->address_lo;
+		route.msi.address_hi = msi->address_hi;
+		route.msi.data = msi->data;
+	}

 	return kvm_set_msi(&route, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1, false);
 }
--
1.9.1
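The selection logic of this patch can be modelled outside the kernel as
follows. This is a simplified sketch, not the kernel code: the model_
types are invented for the example, the route is a flat struct instead
of the kernel's union, and the bit position chosen for the "valid devid"
flag is an assumption.

```c
#include <stdint.h>

#define MODEL_MSI_VALID_DEVID (1u << 0)	/* assumed bit position */

enum model_route_type {
	MODEL_ROUTE_MSI = 2,		/* value of KVM_IRQ_ROUTING_MSI */
	MODEL_ROUTE_EXTENDED_MSI = 4,	/* value proposed in patch 1/6 */
};

struct model_msi {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
	uint32_t flags;
	uint32_t devid;
};

struct model_route {
	enum model_route_type type;
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
	uint32_t devid;		/* only meaningful for the extended type */
};

/* Pick the route type from the flags, as the patch does. */
static struct model_route route_from_msi(const struct model_msi *msi)
{
	struct model_route r = {
		.type = MODEL_ROUTE_MSI,
		.address_lo = msi->address_lo,
		.address_hi = msi->address_hi,
		.data = msi->data,
		.devid = 0,
	};

	if (msi->flags & MODEL_MSI_VALID_DEVID) {
		r.type = MODEL_ROUTE_EXTENDED_MSI;
		r.devid = msi->devid;
	}
	return r;
}
```

A message without the flag keeps the standard MSI type and its devid
field is ignored, which matches the else branch in the diff.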
[RFC 6/6] KVM: arm: implement kvm_set_msi by gsi direct mapping
If the ITS modality is not available, let's simply support MSI injection
by transforming the MSI data into an SPI ID. It thus becomes possible to
use the KVM_SIGNAL_MSI ioctl on arm too.

Signed-off-by: Eric Auger
---
 arch/arm/kvm/Kconfig | 1 +
 virt/kvm/arm/vgic.c | 5 +
 2 files changed, 6 insertions(+)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 151e710..0f58baf 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -31,6 +31,7 @@ config KVM
 	select KVM_VFIO
 	select HAVE_KVM_EVENTFD
 	select HAVE_KVM_IRQFD
+	select HAVE_KVM_MSI
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_IRQ_ROUTING
 	depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 16d232f..40e96f9 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -2293,6 +2293,11 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
 			return kvm->arch.vgic.vm_ops.inject_msi(kvm, &msi);
 		else
 			return -ENODEV;
+	case KVM_IRQ_ROUTING_MSI:
+		if (kvm->arch.vgic.vm_ops.inject_msi)
+			return -EINVAL;
+		else
+			return kvm_vgic_inject_irq(kvm, 0, e->msi.data, level);
 	default:
 		return -EINVAL;
 	}
--
1.9.1
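As a rough user-space model of the new case (not the vgic code itself):
a standard MSI is rejected when an ITS injection hook is registered, and
otherwise its data field is used directly as the interrupt ID. All model_
names below are hypothetical stand-ins for the kernel objects in the
diff.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*model_inject_msi_fn)(uint32_t devid, uint32_t data);

struct model_vgic {
	model_inject_msi_fn inject_msi;	/* NULL when no ITS is emulated */
	uint32_t last_irq;		/* last interrupt ID injected directly */
};

/* Stand-in for an ITS injection hook; it does nothing in this model. */
static int model_its_inject(uint32_t devid, uint32_t data)
{
	(void)devid;
	(void)data;
	return 0;
}

/* Stand-in for kvm_vgic_inject_irq(): just record the interrupt ID. */
static int model_inject_irq(struct model_vgic *v, uint32_t irq)
{
	v->last_irq = irq;
	return 0;
}

/* Mirrors the KVM_IRQ_ROUTING_MSI case added by this patch. */
static int model_set_standard_msi(struct model_vgic *v, uint32_t data)
{
	if (v->inject_msi)
		return -EINVAL;		/* the patch rejects this combination */
	return model_inject_irq(v, data);	/* MSI data is the interrupt ID */
}
```

Note that in this model, as in the diff, the data value is passed through
without a range check.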
[RFC 4/6] KVM: arm/arm64: enable irqchip routing
This patch adds compilation and link against irqchip. On ARM, irqchip routing is not really useful since there is a single irqchip. However main motivation behind using irqchip code is to enable MSI routing code. With the support of in-kernel GICv3 ITS emulation, it now seems to be a MUST HAVE requirement. Functions previously implemented in vgic.c and substitute to more complex irqchip implementation are removed: - kvm_send_userspace_msi - kvm_irq_map_chip_pin - kvm_set_irq - kvm_irq_map_gsi. They implemented a kernel default identity GSI routing. This is now replaced by user-side provided routing. Routing standard hooks are now implemented in vgic: - kvm_set_routing_entry - kvm_set_irq - kvm_set_msi Both HAVE_KVM_IRQCHIP and HAVE_KVM_IRQ_ROUTING are defined. KVM_CAP_IRQ_ROUTING is advertised and KVM_SET_GSI_ROUTING is allowed. MSI routing is not yet allowed. Signed-off-by: Eric Auger --- Documentation/virtual/kvm/api.txt | 11 -- arch/arm/include/asm/kvm_host.h | 2 + arch/arm/kvm/Kconfig | 2 + arch/arm/kvm/Makefile | 2 +- arch/arm64/include/asm/kvm_host.h | 1 + arch/arm64/kvm/Kconfig| 2 + arch/arm64/kvm/Makefile | 2 +- include/kvm/arm_vgic.h| 9 - virt/kvm/arm/vgic.c | 78 --- virt/kvm/irqchip.c| 2 + 10 files changed, 67 insertions(+), 44 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index bcec91e..2bc96e1 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1395,7 +1395,7 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed. 4.52 KVM_SET_GSI_ROUTING Capability: KVM_CAP_IRQ_ROUTING -Architectures: x86 s390 +Architectures: x86 s390 arm arm64 Type: vm ioctl Parameters: struct kvm_irq_routing (in) Returns: 0 on success, -1 on error @@ -2310,9 +2310,12 @@ Note that closing the resamplefd is not sufficient to disable the irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment and need not be specified with KVM_IRQFD_FLAG_DEASSIGN. 
-On ARM/ARM64, the gsi field in the kvm_irqfd struct specifies the Shared -Peripheral Interrupt (SPI) index, such that the GIC interrupt ID is -given by gsi + 32. +On ARM/ARM64, when GSI routing is not used, the gsi field in the +kvm_irqfd struct specifies the Shared Peripheral Interrupt (SPI) index, +such that the GIC interrupt ID is given by gsi + 32. When GSI routing is +setup: +- if irqchip routing: irqchip.pin + 32 is the SPI ID that is injected +- if MSI routing: the MSI data is used as interrupt ID (SPI or LPI). 4.76 KVM_PPC_ALLOCATE_HTAB diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index d71607c..452697e 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -42,6 +42,8 @@ #define KVM_VCPU_MAX_FEATURES 2 +#define KVM_IRQCHIP_NUM_PINS 988 /* 1020 -32 is the number of SPI */ + #include u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode); diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index bfb915d..151e710 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -31,6 +31,8 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD + select HAVE_KVM_IRQCHIP + select HAVE_KVM_IRQ_ROUTING depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER ---help--- Support hosting virtualized guest machines. 
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile index c5eef02c..1a8f48a 100644 --- a/arch/arm/kvm/Makefile +++ b/arch/arm/kvm/Makefile @@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt) AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt) KVM := ../../../virt/kvm -kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o +kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o $(KVM)/irqchip.o obj-y += kvm-arm.o init.o interrupts.o obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index f0f58c9..751210a 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -44,6 +44,7 @@ #include #define KVM_VCPU_MAX_FEATURES 3 +#define KVM_IRQCHIP_NUM_PINS 988 /* 1020 -32 is the number of SPI */ int __attribute_const__ kvm_target_cpu(void); int kvm_reset_vcpu(struct kvm_vcpu *vcpu); diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index ff9722f..1a9900d 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -32,6 +32,8 @@ config KVM select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD select HAVE_KVM_MSI + select HAVE_KVM_IRQCHIP + select HAVE_KVM_IRQ_ROUTING ---help--- Support hosting virtualized guest machines. diff --git a/
[RFC 2/6] KVM: kvm_host: add kvm_extended_msi
As a follow-up to the user API extension, let's create a corresponding
kernel-side structure.

Signed-off-by: Eric Auger
---
 include/linux/kvm_host.h | 8
 1 file changed, 8 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ad45054..e1c1c0d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -304,6 +304,13 @@ struct kvm_s390_adapter_int {
 	u32 adapter_id;
 };

+struct kvm_extended_msi {
+	u32 address_lo; /* low 32 bits of msi message address */
+	u32 address_hi; /* high 32 bits of msi message address */
+	u32 data; /* 16 bits of msi message data */
+	u32 devid; /* out-of-band device ID */
+};
+
 struct kvm_kernel_irq_routing_entry {
 	u32 gsi;
 	u32 type;
@@ -317,6 +324,7 @@ struct kvm_kernel_irq_routing_entry {
 		} irqchip;
 		struct msi_msg msi;
 		struct kvm_s390_adapter_int adapter;
+		struct kvm_extended_msi ext_msi;
 	};
 	struct hlist_node link;
 };
--
1.9.1
Re: [PATCH v2 2/2] arm: KVM: keep arm vfp/simd exit handling consistent with arm64
On 16/06/15 22:50, Mario Smarduch wrote: > After enhancing arm64 FP/SIMD exit handling, FP/SIMD exit branch is moved > to guest trap handling. This keeps exiting handling flow between both > architectures consistent. > > Signed-off-by: Mario Smarduch > --- > arch/arm/kvm/interrupts.S | 12 +++- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S > index 79caf79..fca2c56 100644 > --- a/arch/arm/kvm/interrupts.S > +++ b/arch/arm/kvm/interrupts.S > @@ -363,10 +363,6 @@ hyp_hvc: > @ Check syndrome register > mrc p15, 4, r1, c5, c2, 0 @ HSR > lsr r0, r1, #HSR_EC_SHIFT > -#ifdef CONFIG_VFPv3 > - cmp r0, #HSR_EC_CP_0_13 > - beq switch_to_guest_vfp > -#endif > cmp r0, #HSR_EC_HVC > bne guest_trap @ Not HVC instr. > > @@ -406,6 +402,12 @@ THUMB( orr lr, #1) > 1: eret > > guest_trap: > +#ifdef CONFIG_VFPv3 > + /* Guest accessed VFP/SIMD registers, save host, restore Guest */ > + cmp r0, #HSR_EC_CP_0_13 > + beq switch_to_guest_fpsimd > +#endif > + > load_vcpu @ Load VCPU pointer to r0 > str r1, [vcpu, #VCPU_HSR] > > @@ -478,7 +480,7 @@ guest_trap: > * inject an undefined exception to the guest. > */ > #ifdef CONFIG_VFPv3 > -switch_to_guest_vfp: > +switch_to_guest_fpsimd: Ah, I think I managed to confuse you in my previous comment. On ARMv7, we call the floating point stuff VFP. On ARMv8, we call it FP/SIMD. Not very consistent, I know... > load_vcpu @ Load VCPU pointer to r0 It would be interesting to find out if we can make this load_vcpu part of the common sequence (without spilling another register, of course). Probably involves moving the exception class to r2. Thanks, M. -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM slow LAMP guest
On Thu, Jun 18, 2015 at 1:25 AM, Hansa wrote: > Hi, > > I have a LAMP server as guest in KVM. Whenever the server is idle for some > time it takes about 30 seconds to load a Wordpress site. > If the server is not idle the site shows up in max 5 seconds. I've already > turned of power management in the guest by passing > > GRUB_CMDLINE_LINUX_DEFAULT="apm=off" > > in /etc/default/grub. This has no effect. > Does KVM do some power management on guests? If so, how do I turn this off > for my LAMP guest? KVM doesn't do any power management of guests. But if everything is idle on the host (including your guest), then host power management could kick in. Have you tried playing with host pm? Could you try running your workload with the guest kernel parameter "idle=poll" and let me know the performance? Also, if you are running Linux 4.0 or later on the host, could you try running your workload with the KVM module parameter "halt_poll_ns=50"? > > Best, Hansa > > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvmtool: don't use PCI config space IRQ line field
Hi Will, On 06/16/2015 06:06 PM, Will Deacon wrote: > On Mon, Jun 15, 2015 at 11:45:38AM +0100, Andre Przywara wrote: >> On 06/05/2015 05:41 PM, Will Deacon wrote: >>> On Thu, Jun 04, 2015 at 04:20:45PM +0100, Andre Przywara wrote: In PCI config space there is an interrupt line field (offset 0x3f), which is used to initially communicate the IRQ line number from firmware to the OS. _Hardware_ should never use this information, as the OS is free to write any information in there. But kvmtool uses this number when it triggers IRQs in the guest, which fails starting with Linux 3.19-rc1, where the PCI layer starts writing the virtual IRQ number in there. Fix that by storing the IRQ number in a separate field in struct virtio_pci, which is independent from the PCI config space and cannot be influenced by the guest. This fixes ARM/ARM64 guests using PCI with newer kernels. Signed-off-by: Andre Przywara --- include/kvm/virtio-pci.h | 8 virtio/pci.c | 9 ++--- 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/include/kvm/virtio-pci.h b/include/kvm/virtio-pci.h index c795ce7..b70cadd 100644 --- a/include/kvm/virtio-pci.h +++ b/include/kvm/virtio-pci.h @@ -30,6 +30,14 @@ struct virtio_pci { u8 isr; u32 features; + /* + * We cannot rely on the INTERRUPT_LINE byte in the config space once + * we have run guest code, as the OS is allowed to use that field + * as a scratch pad to communicate between driver and PCI layer. + * So store our legacy interrupt line number in here for internal use. 
+ */ + u8 legacy_irq_line; + /* MSI-X */ u16 config_vector; u32 config_gsi; diff --git a/virtio/pci.c b/virtio/pci.c index 7556239..e17e5a9 100644 --- a/virtio/pci.c +++ b/virtio/pci.c @@ -141,7 +141,7 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 p break; case VIRTIO_PCI_ISR: ioport__write8(data, vpci->isr); - kvm__irq_line(kvm, vpci->pci_hdr.irq_line, VIRTIO_IRQ_LOW); + kvm__irq_line(kvm, vpci->legacy_irq_line, VIRTIO_IRQ_LOW); vpci->isr = VIRTIO_IRQ_LOW; break; default: @@ -299,7 +299,7 @@ int virtio_pci__signal_vq(struct kvm *kvm, struct virtio_device *vdev, u32 vq) kvm__irq_trigger(kvm, vpci->gsis[vq]); } else { vpci->isr = VIRTIO_IRQ_HIGH; - kvm__irq_trigger(kvm, vpci->pci_hdr.irq_line); + kvm__irq_trigger(kvm, vpci->legacy_irq_line); } return 0; } @@ -323,7 +323,7 @@ int virtio_pci__signal_config(struct kvm *kvm, struct virtio_device *vdev) kvm__irq_trigger(kvm, vpci->config_gsi); } else { vpci->isr = VIRTIO_PCI_ISR_CONFIG; - kvm__irq_trigger(kvm, vpci->pci_hdr.irq_line); + kvm__irq_trigger(kvm, vpci->legacy_irq_line); } return 0; @@ -422,6 +422,9 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev, if (r < 0) goto free_msix_mmio; + /* save the IRQ that device__register() has allocated */ + vpci->legacy_irq_line = vpci->pci_hdr.irq_line; >>> >>> I'd rather we used the container_of trick that we do for virtio-mmio >>> devices when assigning the irq in device__register. Then we can avoid >>> this line completely. >> >> Not completely sure I get what you mean, I take it you want to assign >> legacy_irq_line in pci__assign_irq() directly (where the IRQ number is >> allocated). >> But this function is PCI generic code and is used by the VESA >> framebuffer and the shmem device on x86 as well. For those devices >> dev_hdr is not part of a struct virtio_pci, so we can't do container_of >> to assign the legacy_irq_line here directly. 
>> Admittedly this fix should apply to the other two users as well, but >> VESA does not use interrupts and pci-shmem is completely broken anyway, >> so I didn't bother to fix it in this regard. >> Would it be justified to provide an IRQ number field in struct >> device_header to address all users? >> >> Or what am I missing here? > > If VESA and shmem are broken, they should either be fixed or removed. I am tempted to remove shmem, since it's broken: a) there is no upstream driver, only some out-of-tree uio driver module in some Github repo b) the PCI device BARs do not match what QEMU implements and what the uio driver expects (IO BAR vs. MMIO BAR) c) there is (at least one) bug in kvmtool (easily fixed, though) I haven't completely given up yet fixing it, but that's for another
Re: [PATCH v2 1/2] arm64: KVM: Optimize arm64 fp/simd save/restore
On 16/06/15 22:50, Mario Smarduch wrote:
> This patch only saves and restores FP/SIMD registers on Guest access. To do
> this cptr_el2 FP/SIMD trap is set on Guest entry and later checked on exit.
> lmbench, hackbench show significant improvements, for 30-50% exits FP/SIMD
> context is not saved/restored
>
> Signed-off-by: Mario Smarduch

Looks nice and clean.

Reviewed-by: Marc Zyngier

Thanks,

	M.
--
Jazz is not dead. It just smells funny...
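For readers who have not looked at the patch, the lazy scheme described
in the quoted commit message can be modelled in a few lines of C. This
is only a toy state machine built from that description: the trap bit is
a plain boolean standing in for the cptr_el2 FP/SIMD trap, register
save/restore is reduced to a counter, and all names are invented.

```c
#include <stdbool.h>

struct model_vcpu {
	bool fp_trap_enabled;		/* stands in for the cptr_el2 trap bit */
	unsigned int fp_switches;	/* full FP/SIMD context switches done */
};

/* Guest entry: arm the trap instead of switching FP/SIMD state eagerly. */
static void model_guest_entry(struct model_vcpu *v)
{
	v->fp_trap_enabled = true;
}

/* First guest FP/SIMD access traps: save host state, restore guest state. */
static void model_guest_fp_access(struct model_vcpu *v)
{
	if (!v->fp_trap_enabled)
		return;			/* already switched, no trap taken */
	v->fp_trap_enabled = false;
	v->fp_switches++;
}

/* Guest exit: switch back only if the guest actually used FP/SIMD. */
static void model_guest_exit(struct model_vcpu *v)
{
	if (!v->fp_trap_enabled) {
		v->fp_switches++;	/* save guest state, restore host state */
		v->fp_trap_enabled = true;
	}
}
```

An entry/exit pair without any FP/SIMD access performs no switch at all
in this model, which is the case the commit message reports for 30-50%
of exits.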
Re: [PATCH 3/5] vhost: support upto 509 memory regions
On 18/06/2015 16:47, Michael S. Tsirkin wrote:
>> However, with Igor's patches a memory_region_del_subregion will cause a
>> mmap(MAP_NORESERVE), which _does_ have the effect of making the hva go away.
>>
>> I guess one way to do it would be to alias the same page in two places,
>> one for use by vhost and one for use by everything else. However, the
>> kernel does not provide the means to do this kind of aliasing for
>> anonymous mmaps.
>
> Basically pages go away on munmap, so won't simple
> 	lock
> 	munmap
> 	mmap(MAP_NORESERVE)
> 	unlock
> do the trick?

Not sure I follow. Here we have this:

VCPU 1                             VCPU 2                       I/O worker

take big QEMU lock
p = address_space_map(hva, len)
pass I/O request to worker thread
                                                                read(fd, p, len)
release big QEMU lock
                                   memory_region_del_subregion
                                   mmap(MAP_NORESERVE)
                                                                read returns EFAULT
                                                                wake up VCPU 1
take big QEMU lock
EFAULT? What's that?

In another scenario you are less lucky: the memory accesses between
address_space_map/unmap aren't done in the kernel and you get a plain old
SIGSEGV.

This is not something that you can fix with a lock. The very purpose of
the map/unmap API is to do stuff asynchronously while the lock is
released.

Thanks,

Paolo
Re: [PATCH 3/5] vhost: support upto 509 memory regions
On Thu, 18 Jun 2015 16:47:33 +0200 "Michael S. Tsirkin" wrote: > On Thu, Jun 18, 2015 at 03:46:14PM +0200, Paolo Bonzini wrote: > > > > > > On 18/06/2015 15:19, Michael S. Tsirkin wrote: > > > On Thu, Jun 18, 2015 at 01:50:32PM +0200, Paolo Bonzini wrote: > > >> > > >> > > >> On 18/06/2015 13:41, Michael S. Tsirkin wrote: > > >>> On Thu, Jun 18, 2015 at 01:39:12PM +0200, Igor Mammedov wrote: > > Lets leave decision upto users instead of making them live with > > crashing guests. > > >>> > > >>> Come on, let's fix it in userspace. > > >> > > >> It's not trivial to fix it in userspace. Since QEMU uses RCU there > > >> isn't a single memory map to use for a linear gpa->hva map. > > > > > > Could you elaborate? > > > > > > I'm confused by this mention of RCU. > > > You use RCU for accesses to the memory map, correct? > > > So memory map itself is a write side operation, as such all you need to > > > do is take some kind of lock to prevent conflicting with other memory > > > maps, do rcu sync under this lock. > > > > You're right, the problem isn't directly related to RCU. RCU would be > > easy to handle by using synchronize_rcu instead of call_rcu. While I > > identified an RCU-related problem with Igor's patches, it's much more > > entrenched. > > > > RAM can be used by asynchronous operations while the VM runs, between > > address_space_map and address_space_unmap. It is possible and common to > > have a quiescent state between the map and unmap, and a memory map > > change can happen in the middle of this. Normally this is not a > > problem, because changes to the memory map do not make the hva go away > > (memory regions are reference counted). > > Right, so you want mmap(MAP_NORESERVE) when that reference > count becomes 0. > > > However, with Igor's patches a memory_region_del_subregion will cause a > > mmap(MAP_NORESERVE), which _does_ have the effect of making the hva go away. 
> > > > I guess one way to do it would be to alias the same page in two places, > > one for use by vhost and one for use by everything else. However, the > > kernel does not provide the means to do this kind of aliasing for > > anonymous mmaps. > > > > Paolo > > Basically pages go away on munmap, so won't simple > lock > munmap > mmap(MAP_NORESERVE) > unlock > do the trick? at what time are you suggesting to do this? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvmtool: Makefile: allow overriding CC and LD
Currently we set CC unconditionally to ${CROSS_COMPILE}gcc, and the same
for LD. Allow people to override the compiler name by specifying it
explicitly on the command line or via the environment. Besides selecting
a particular compiler binary, this also allows passing options to the
compiler, which lets us get rid of the PowerPC overrides in the Makefile.

Possible uses:
$ make CC="gcc -m64" LD="ld -melf64ppc"
(build kvmtool on a PowerPC toolchain defaulting to 32-bit)
$ make CC="gcc -m32" LD="ld -melf_i386"
(build a 32-bit binary on a multilib-enabled x86-64 compiler)

Signed-off-by: Andre Przywara
---
 Makefile | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index 6110b8e..888bee5 100644
--- a/Makefile
+++ b/Makefile
@@ -14,9 +14,13 @@ export E Q
 include config/utilities.mak
 include config/feature-tests.mak

-CC := $(CROSS_COMPILE)gcc
+ifeq ($(origin CC), default)
+	CC := $(CROSS_COMPILE)gcc
+endif
 CFLAGS :=
-LD := $(CROSS_COMPILE)ld
+ifeq ($(origin LD), default)
+	LD := $(CROSS_COMPILE)ld
+endif
 LDFLAGS:=

 FIND := find
@@ -148,8 +152,6 @@ ifeq ($(ARCH), powerpc)
 	OBJS+= powerpc/spapr_pci.o
 	OBJS+= powerpc/xics.o
 	ARCH_INCLUDE := powerpc/include
-	CFLAGS += -m64
-	LDFLAGS += -m elf64ppc

 	ARCH_WANT_LIBFDT := y
 endif
--
2.3.5
RE: [PATCH 13/13] KVM: arm64: enable ITS emulation as a virtual MSI controller
Hello!

> But that fails compilation on ARM (which uses this file as well),
> because we have a dummy fail function in the header if
> CONFIG_HAVE_KVM_MSI is not defined.

Maybe then remove that fail function too? Too many #ifdef's are not
good...

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
Re: [PATCH 2/3] powerpc: use default endianness for converting guest/init
Hi, On 06/17/2015 10:43 AM, Andre Przywara wrote: > For converting the guest/init binary into an object file, we call > the linker binary, setting the endianness to big endian explicitly > when compiling kvmtool for powerpc. > This breaks if the compiler is actually targetting little endian > (which is true for the Debian port, for instance). > Remove the explicit big endianness switch from the linker call to > allow linking on little endian PowerPC builds again. > > Signed-off-by: Andre Przywara > --- > Hi, > > this fixed the powerpc64le build for me, while still compiling fine > for big endian. Admittedly this whole init->guest_init.o conversion > has its issues (with MIPS, for instance), which deserve proper fixing, > but lets just fix that build for now. > Will was concerned about breaking toolchains where the linker does not default to 64-bit. Is that an issue we care about? AFAICT LDFLAGS is only used in this dodgy binary-to-object-file conversion of guest/init. For this we rely on the resulting .o file to have the same ELF target as the other object files to be finally linked into the lkvm binary. As we don't compile guest/init with CFLAGS, there is a possible mismatch. I am looking into a proper fix for this now (compiling guest/init with CFLAGS, calling $CC with linker options instead of $LD and allowing CC and LD override). Still struggling with MIPS, though :-( If someone is eager to fix compilation on PowerPC meanwhile, feel free to use this fix for the time being. Cheers, Andre. 
> > Makefile | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/Makefile b/Makefile > index 6110b8e..c118e1a 100644 > --- a/Makefile > +++ b/Makefile > @@ -149,7 +149,6 @@ ifeq ($(ARCH), powerpc) > OBJS+= powerpc/xics.o > ARCH_INCLUDE := powerpc/include > CFLAGS += -m64 > - LDFLAGS += -m elf64ppc > > ARCH_WANT_LIBFDT := y > endif > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/5] vhost: support upto 509 memory regions
On Thu, Jun 18, 2015 at 03:46:14PM +0200, Paolo Bonzini wrote: > > > On 18/06/2015 15:19, Michael S. Tsirkin wrote: > > On Thu, Jun 18, 2015 at 01:50:32PM +0200, Paolo Bonzini wrote: > >> > >> > >> On 18/06/2015 13:41, Michael S. Tsirkin wrote: > >>> On Thu, Jun 18, 2015 at 01:39:12PM +0200, Igor Mammedov wrote: > Lets leave decision upto users instead of making them live with > crashing guests. > >>> > >>> Come on, let's fix it in userspace. > >> > >> It's not trivial to fix it in userspace. Since QEMU uses RCU there > >> isn't a single memory map to use for a linear gpa->hva map. > > > > Could you elaborate? > > > > I'm confused by this mention of RCU. > > You use RCU for accesses to the memory map, correct? > > So memory map itself is a write side operation, as such all you need to > > do is take some kind of lock to prevent conflicting with other memory > > maps, do rcu sync under this lock. > > You're right, the problem isn't directly related to RCU. RCU would be > easy to handle by using synchronize_rcu instead of call_rcu. While I > identified an RCU-related problem with Igor's patches, it's much more > entrenched. > > RAM can be used by asynchronous operations while the VM runs, between > address_space_map and address_space_unmap. It is possible and common to > have a quiescent state between the map and unmap, and a memory map > change can happen in the middle of this. Normally this is not a > problem, because changes to the memory map do not make the hva go away > (memory regions are reference counted). Right, so you want mmap(MAP_NORESERVE) when that reference count becomes 0. > However, with Igor's patches a memory_region_del_subregion will cause a > mmap(MAP_NORESERVE), which _does_ have the effect of making the hva go away. > > I guess one way to do it would be to alias the same page in two places, > one for use by vhost and one for use by everything else. 
However, the > kernel does not provide the means to do this kind of aliasing for > anonymous mmaps. > > Paolo Basically pages go away on munmap, so won't simple lock munmap mmap(MAP_NORESERVE) unlock do the trick? -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 13/13] KVM: arm64: enable ITS emulation as a virtual MSI controller
Hi Eric,

On 06/18/2015 09:43 AM, Eric Auger wrote:
> On 05/29/2015 11:53 AM, Andre Przywara wrote:
>> If userspace has provided a base address for the ITS register frame,
>> we enable the bits that advertise LPIs in the GICv3.
>> When the guest has enabled LPIs and the ITS, we enable the emulation
>> part by initializing the ITS data structures and trapping on ITS
>> register frame accesses by the guest.
>> Also we enable the KVM_SIGNAL_MSI feature to allow userland to inject
>> MSIs into the guest. Not having enabled the ITS emulation will lead
>> to a -ENODEV when trying to inject a MSI.
>>
>> Signed-off-by: Andre Przywara
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index 9f7b05f..09b1f46 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -2254,3 +2254,13 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
>> {
>> 	return 0;
>> }
>> +
>> +#ifdef CONFIG_HAVE_KVM_MSI
> I don't think the if#def is requested since the entry is already
> prevented in kvm_main.c in, case KVM_SIGNAL_MSI.

But that fails compilation on ARM (which uses this file as well),
because we have a dummy fail function in the header if
CONFIG_HAVE_KVM_MSI is not defined. So you get:

error: redefinition of 'kvm_send_userspace_msi'

Cheers,
Andre.

>> +int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi)
>> +{
>> +	if (kvm->arch.vgic.vm_ops.inject_msi)
>> +		return kvm->arch.vgic.vm_ops.inject_msi(kvm, msi);
>> +	else
>> +		return -ENODEV;
>> +}
>> +#endif
>>
>
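[Editor's sketch] The redefinition error Andre mentions follows from the usual config-stub pattern: the header supplies a static inline fallback when the option is off, so the definition in the .c file has to sit under the same guard. Below is a minimal userspace sketch of that pattern. It assumes placeholder types: struct kvm, struct kvm_msi and the flattened inject_msi hook are reduced stand-ins, not the kernel definitions, and demo_inject_msi is a hypothetical hook that exists only to exercise the code.

```c
#include <errno.h>
#include <stddef.h>

/* Placeholder types (assumption); the real kernel structures are larger. */
struct kvm_msi { unsigned int data; };
struct kvm {
	int (*inject_msi)(struct kvm *kvm, struct kvm_msi *msi);
};

/* Stands in for "select HAVE_KVM_MSI" in Kconfig; remove to get the stub. */
#define CONFIG_HAVE_KVM_MSI 1

/* Header part: real prototype, or a stub that always fails. */
#ifdef CONFIG_HAVE_KVM_MSI
int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi);
#else
static inline int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi)
{
	return -ENODEV;
}
#endif

/*
 * Source-file part: without this guard, a build with the option off
 * would see both the stub above and this definition.
 */
#ifdef CONFIG_HAVE_KVM_MSI
int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi)
{
	if (kvm->inject_msi)
		return kvm->inject_msi(kvm, msi);
	return -ENODEV;
}
#endif

/* Hypothetical hook, used only to exercise the sketch. */
static int demo_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
{
	(void)kvm;
	return msi->data ? 1 : 0;
}
```

With the macro defined, a VM without an ITS hook still gets -ENODEV, which matches the behaviour described in the commit message.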
RE: IRQFD support with GICv3 ITS (WAS: RE: [PATCH 00/13] arm64: KVM: GICv3 ITS emulation)
Hello!

> I also have an implementation of GSI routing on ARM, basically a rebase
> of my old/first implementation of irqfd
> (https://patches.linaro.org/32261/) based on irqchip gsi routing & qemu
> part (https://lists.gnu.org/archive/html/qemu-devel/2014-07/msg01090.html).

I took a glance at it, and it looks like it's already obsolete. We already have a convention of GSI number == SPI number. This is a kind of hardcoded default routing table which cannot be changed. It is used at least by GICv2m emulation. I think we should maintain backwards compatibility with it. I thought about something like:

a) GSI < 8192 - corresponds to SPIs and cannot be re-routed.
b) GSI >= 8192 - corresponds to MSI and needs to be routed before use.

During routing setup we could use either a GSI with an offset (starting from 8192), or a raw number (starting from 0). In the case of a raw number we would have a more complex structure for the GSI field in the KVM_IRQFD ioctl, similar to KVM_IRQ_LINE. Something like:

bits:  | 31 ... 24 | 23 ... 0 |
field: | irq_type  | irq_id   |

irq_type[0]: irq_id = SPI
irq_type[3]: irq_id = GSI number routed to MSI

Consequently, we have to implement only the KVM_IRQ_ROUTING_MSI type and completely ignore KVM_IRQ_ROUTING_IRQCHIP.

I hope I am clear enough...

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
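[Editor's sketch] The proposed split of the GSI field into an 8-bit irq_type and a 24-bit irq_id can be written down as a few helpers. This is only an illustration of the layout in the message above. All macro and function names are hypothetical; the thread proposes the bit positions and the type values 0 and 3, nothing more.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only the bit layout comes from the proposal. */
#define GSI_TYPE_SHIFT	24
#define GSI_ID_MASK	0x00ffffffu
#define GSI_TYPE_SPI	0u	/* irq_id is an SPI number */
#define GSI_TYPE_MSI	3u	/* irq_id is a GSI number routed to an MSI */

/* Pack irq_type into bits 31..24 and irq_id into bits 23..0. */
static inline uint32_t gsi_encode(uint32_t irq_type, uint32_t irq_id)
{
	assert(irq_type <= 0xffu && irq_id <= GSI_ID_MASK);
	return (irq_type << GSI_TYPE_SHIFT) | irq_id;
}

/* Extract the irq_type field (bits 31..24). */
static inline uint32_t gsi_type(uint32_t gsi)
{
	return gsi >> GSI_TYPE_SHIFT;
}

/* Extract the irq_id field (bits 23..0). */
static inline uint32_t gsi_id(uint32_t gsi)
{
	return gsi & GSI_ID_MASK;
}
```

One property worth noting: because the SPI type is 0, an encoded SPI value is numerically equal to the SPI number, so the existing GSI == SPI convention would keep working unchanged under this layout.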
Re: [PATCH 3/5] vhost: support upto 509 memory regions
On 18/06/2015 15:19, Michael S. Tsirkin wrote:
> On Thu, Jun 18, 2015 at 01:50:32PM +0200, Paolo Bonzini wrote:
>>
>>
>> On 18/06/2015 13:41, Michael S. Tsirkin wrote:
>>> On Thu, Jun 18, 2015 at 01:39:12PM +0200, Igor Mammedov wrote:
>>>> Lets leave decision upto users instead of making them live with
>>>> crashing guests.
>>>
>>> Come on, let's fix it in userspace.
>>
>> It's not trivial to fix it in userspace. Since QEMU uses RCU there
>> isn't a single memory map to use for a linear gpa->hva map.
>
> Could you elaborate?
>
> I'm confused by this mention of RCU.
> You use RCU for accesses to the memory map, correct?
> So memory map itself is a write side operation, as such all you need to
> do is take some kind of lock to prevent conflicting with other memory
> maps, do rcu sync under this lock.

You're right, the problem isn't directly related to RCU. RCU would be
easy to handle by using synchronize_rcu instead of call_rcu. While I
identified an RCU-related problem with Igor's patches, it's much more
entrenched.

RAM can be used by asynchronous operations while the VM runs, between
address_space_map and address_space_unmap. It is possible and common to
have a quiescent state between the map and unmap, and a memory map
change can happen in the middle of this. Normally this is not a
problem, because changes to the memory map do not make the hva go away
(memory regions are reference counted).
However, with Igor's patches a memory_region_del_subregion will cause a
mmap(MAP_NORESERVE), which _does_ have the effect of making the hva go away.

I guess one way to do it would be to alias the same page in two places,
one for use by vhost and one for use by everything else. However, the
kernel does not provide the means to do this kind of aliasing for
anonymous mmaps.

Paolo
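[Editor's sketch] The effect Paolo describes can be seen in isolation: mapping fresh anonymous memory over a range that is still in use discards whatever was stored there. The sketch below is a standalone demonstration, not the code from Igor's patches. It assumes the remap is done with MAP_FIXED over the same address, which the thread does not spell out, and remap_drops_contents is a hypothetical helper name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write 0x5a into an anonymous page, replace the page in place, and
 * return the first byte seen afterwards, or -1 if an mmap call fails.
 */
static int remap_drops_contents(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *p, *q;
	int value;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	p[0] = 0x5a;

	/* Assumed flags: map new anonymous memory over the same hva range. */
	q = mmap(p, len, PROT_READ,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_NORESERVE,
		 -1, 0);
	if (q == MAP_FAILED) {
		munmap(p, len);
		return -1;
	}

	value = q[0];	/* the old data is gone; the new page reads as zero */
	munmap(q, len);
	return value;
}
```

Anything that obtained the address between address_space_map and address_space_unmap would, in the same way, find different memory behind it after the remap.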
Re: [PATCH 3/5] vhost: support upto 509 memory regions
On Thu, Jun 18, 2015 at 01:50:32PM +0200, Paolo Bonzini wrote:
>
>
> On 18/06/2015 13:41, Michael S. Tsirkin wrote:
> > On Thu, Jun 18, 2015 at 01:39:12PM +0200, Igor Mammedov wrote:
> >> Lets leave decision upto users instead of making them live with
> >> crashing guests.
> >
> > Come on, let's fix it in userspace.
>
> It's not trivial to fix it in userspace. Since QEMU uses RCU there
> isn't a single memory map to use for a linear gpa->hva map.

Could you elaborate?

I'm confused by this mention of RCU.
You use RCU for accesses to the memory map, correct?
So memory map itself is a write side operation, as such all you need to
do is take some kind of lock to prevent conflicting with other memory
maps, do rcu sync under this lock.

> I find it absurd that we're fighting over 12K of memory.
>
> Paolo

I wouldn't worry so much if it didn't affect kernel/userspace API.
Need to be careful there.

--
MST
Re: [PATCH 3/5] vhost: support upto 509 memory regions
On Thu, 18 Jun 2015 13:41:22 +0200 "Michael S. Tsirkin" wrote:
> On Thu, Jun 18, 2015 at 01:39:12PM +0200, Igor Mammedov wrote:
> > Lets leave decision upto users instead of making them live with
> > crashing guests.
>
> Come on, let's fix it in userspace.

I'm not abandoning the userspace approach either, but it might take time to implement in a robust manner, as it's much more complex and has many more places to backfire than a straightforward kernel fix, which will work for both old userspace and a new one.
Re: [PATCH 3/5] vhost: support upto 509 memory regions
On 18/06/2015 13:41, Michael S. Tsirkin wrote:
> On Thu, Jun 18, 2015 at 01:39:12PM +0200, Igor Mammedov wrote:
>> Lets leave decision upto users instead of making them live with
>> crashing guests.
>
> Come on, let's fix it in userspace.

It's not trivial to fix it in userspace. Since QEMU uses RCU there
isn't a single memory map to use for a linear gpa->hva map.

I find it absurd that we're fighting over 12K of memory.

Paolo
Re: [PATCH 3/5] vhost: support upto 509 memory regions
On Thu, Jun 18, 2015 at 01:39:12PM +0200, Igor Mammedov wrote:
> Lets leave decision upto users instead of making them live with
> crashing guests.

Come on, let's fix it in userspace.

--
MST
Re: [PATCH 3/5] vhost: support upto 509 memory regions
On Thu, 18 Jun 2015 11:50:22 +0200 "Michael S. Tsirkin" wrote: > On Thu, Jun 18, 2015 at 11:12:24AM +0200, Igor Mammedov wrote: > > On Wed, 17 Jun 2015 18:30:02 +0200 > > "Michael S. Tsirkin" wrote: > > > > > On Wed, Jun 17, 2015 at 06:09:21PM +0200, Igor Mammedov wrote: > > > > On Wed, 17 Jun 2015 17:38:40 +0200 > > > > "Michael S. Tsirkin" wrote: > > > > > > > > > On Wed, Jun 17, 2015 at 05:12:57PM +0200, Igor Mammedov wrote: > > > > > > On Wed, 17 Jun 2015 16:32:02 +0200 > > > > > > "Michael S. Tsirkin" wrote: > > > > > > > > > > > > > On Wed, Jun 17, 2015 at 03:20:44PM +0200, Paolo Bonzini wrote: > > > > > > > > > > > > > > > > > > > > > > > > On 17/06/2015 15:13, Michael S. Tsirkin wrote: > > > > > > > > > > > Considering userspace can be malicious, I guess yes. > > > > > > > > > > I don't think it's a valid concern in this case, > > > > > > > > > > setting limit back from 509 to 64 will not help here in any > > > > > > > > > > way, userspace still can create as many vhost instances as > > > > > > > > > > it needs to consume memory it desires. > > > > > > > > > > > > > > > > > > Not really since vhost char device isn't world-accessible. > > > > > > > > > It's typically opened by a priveledged tool, the fd is > > > > > > > > > then passed to an unpriveledged userspace, or permissions > > > > > > > > > dropped. > > > > > > > > > > > > > > > > Then what's the concern anyway? > > > > > > > > > > > > > > > > Paolo > > > > > > > > > > > > > > Each fd now ties up 16K of kernel memory. It didn't use to, so > > > > > > > priveledged tool could safely give the unpriveledged userspace > > > > > > > a ton of these fds. > > > > > > if privileged tool gives out unlimited amount of fds then it > > > > > > doesn't matter whether fd ties 4K or 16K, host still could be DoSed. > > > > > > > > > > > > > > > > Of course it does not give out unlimited fds, there's a way > > > > > for the sysadmin to specify the number of fds. 
Look at how libvirt > > > > > uses vhost, it should become clear I think. > > > > then it just means that tool has to take into account a new limits > > > > to partition host in sensible manner. > > > > > > Meanwhile old tools are vulnerable to OOM attacks. > > I've chatted with libvirt folks, it doesn't care about how much memory > > vhost would consume nor do any host capacity planning in that regard. > > Exactly, it's up to host admin. > > > But lets assume that there are tools that do this so > > how about instead of hardcoding limit make it a module parameter > > with default set to 64. That would allow users to set higher limit > > if they need it and nor regress old tools. it will also give tools > > interface for reading limit from vhost module. > > And now you need to choose between security and functionality :( There is no conflict here and it's not about choosing. If admin has a method to estimate guest memory footprint to do capacity partitioning then he would need to redo partitioning taking in account new footprint when he/she rises limit manually. (BTW libvirt has tried and reverted patches that were trying to predict required memory, admin might be able to do it manually better but it's another topic how to do it ans it's not related to this thread) Lets leave decision upto users instead of making them live with crashing guests. > > > > > > > > Exposing limit as module parameter might be of help to tool for > > > > getting/setting it in a way it needs. > > > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] powerpc: add hvcall.h header from Linux
The powerpc code uses some PAPR hypercalls, of which we need the hypercall number. Copy just the needed macro definitions from the kernel's (private) hvcall.h file and remove the extra tricks formerly used to be able to include this header file directly. Signed-off-by: Andre Przywara --- Hi, this version of the header file just contains the definitions we need, while still being easily diff-able against the original file. Please consider applying this one. Cheers, Andre. powerpc/include/asm/hvcall.h | 33 + powerpc/spapr.h | 3 --- 2 files changed, 33 insertions(+), 3 deletions(-) create mode 100644 powerpc/include/asm/hvcall.h diff --git a/powerpc/include/asm/hvcall.h b/powerpc/include/asm/hvcall.h new file mode 100644 index 000..9d58f9b --- /dev/null +++ b/powerpc/include/asm/hvcall.h @@ -0,0 +1,33 @@ +#ifndef _ASM_POWERPC_HVCALL_H +#define _ASM_POWERPC_HVCALL_H + +/* This file is a trimmed-down version of arch/powerpc/include/asm/hvcall.h. */ + +#define H_SUCCESS 0 + +#define H_HARDWARE -1 /* Hardware error */ +#define H_FUNCTION -2 /* Function not supported */ +#define H_PRIVILEGE-3 /* Caller not privileged */ +#define H_PARAMETER-4 /* Parameter invalid, out-of-range or conflicting */ + +#define H_SET_DABR 0x28 +#define H_LOGICAL_CI_LOAD 0x3c +#define H_LOGICAL_CI_STORE 0x40 +#define H_LOGICAL_CACHE_LOAD 0x44 +#define H_LOGICAL_CACHE_STORE 0x48 +#define H_LOGICAL_ICBI 0x4c +#define H_LOGICAL_DCBF 0x50 + +#define H_GET_TERM_CHAR0x54 +#define H_PUT_TERM_CHAR0x58 + +#define H_EOI 0x64 +#define H_CPPR 0x68 +#define H_IPI 0x6c +#define H_IPOLL0x70 +#define H_XIRR 0x74 + +#define H_SET_MODE 0x31C +#define MAX_HCALL_OPCODE H_SET_MODE + +#endif /* _ASM_POWERPC_HVCALL_H */ diff --git a/powerpc/spapr.h b/powerpc/spapr.h index 0537f88..4c6e349 100644 --- a/powerpc/spapr.h +++ b/powerpc/spapr.h @@ -16,10 +16,7 @@ #include -/* We need some of the H_ hcall defs, but they're __KERNEL__ only. 
*/ -#define __KERNEL__ #include -#undef __KERNEL__ #include "kvm/kvm.h" #include "kvm/kvm-cpu.h" -- 2.3.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/5] vhost: support upto 509 memory regions
On 18/06/2015 11:50, Michael S. Tsirkin wrote:
> > But lets assume that there are tools that do this so
> > how about instead of hardcoding limit make it a module parameter
> > with default set to 64. That would allow users to set higher limit
> > if they need it and nor regress old tools. it will also give tools
> > interface for reading limit from vhost module.
>
> And now you need to choose between security and functionality :(

Don't call "security" a 16K allocation that can fall back to vmalloc
please. That's an insult to actual security problems...

Paolo
Re: [PATCH 3/5] vhost: support upto 509 memory regions
On Thu, Jun 18, 2015 at 11:12:24AM +0200, Igor Mammedov wrote: > On Wed, 17 Jun 2015 18:30:02 +0200 > "Michael S. Tsirkin" wrote: > > > On Wed, Jun 17, 2015 at 06:09:21PM +0200, Igor Mammedov wrote: > > > On Wed, 17 Jun 2015 17:38:40 +0200 > > > "Michael S. Tsirkin" wrote: > > > > > > > On Wed, Jun 17, 2015 at 05:12:57PM +0200, Igor Mammedov wrote: > > > > > On Wed, 17 Jun 2015 16:32:02 +0200 > > > > > "Michael S. Tsirkin" wrote: > > > > > > > > > > > On Wed, Jun 17, 2015 at 03:20:44PM +0200, Paolo Bonzini wrote: > > > > > > > > > > > > > > > > > > > > > On 17/06/2015 15:13, Michael S. Tsirkin wrote: > > > > > > > > > > Considering userspace can be malicious, I guess yes. > > > > > > > > > I don't think it's a valid concern in this case, > > > > > > > > > setting limit back from 509 to 64 will not help here in any > > > > > > > > > way, userspace still can create as many vhost instances as > > > > > > > > > it needs to consume memory it desires. > > > > > > > > > > > > > > > > Not really since vhost char device isn't world-accessible. > > > > > > > > It's typically opened by a priveledged tool, the fd is > > > > > > > > then passed to an unpriveledged userspace, or permissions > > > > > > > > dropped. > > > > > > > > > > > > > > Then what's the concern anyway? > > > > > > > > > > > > > > Paolo > > > > > > > > > > > > Each fd now ties up 16K of kernel memory. It didn't use to, so > > > > > > priveledged tool could safely give the unpriveledged userspace > > > > > > a ton of these fds. > > > > > if privileged tool gives out unlimited amount of fds then it > > > > > doesn't matter whether fd ties 4K or 16K, host still could be DoSed. > > > > > > > > > > > > > Of course it does not give out unlimited fds, there's a way > > > > for the sysadmin to specify the number of fds. Look at how libvirt > > > > uses vhost, it should become clear I think. > > > then it just means that tool has to take into account a new limits > > > to partition host in sensible manner. 
> > > > Meanwhile old tools are vulnerable to OOM attacks. > I've chatted with libvirt folks, it doesn't care about how much memory > vhost would consume nor do any host capacity planning in that regard. Exactly, it's up to host admin. > But lets assume that there are tools that do this so > how about instead of hardcoding limit make it a module parameter > with default set to 64. That would allow users to set higher limit > if they need it and nor regress old tools. it will also give tools > interface for reading limit from vhost module. And now you need to choose between security and functionality :( > > > > > Exposing limit as module parameter might be of help to tool for > > > getting/setting it in a way it needs. > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] powerpc: implement barrier primitives
On Thu, Jun 18, 2015 at 10:11:58AM +0100, Michael Ellerman wrote:
> On Wed, 2015-06-17 at 11:15 +0100, Will Deacon wrote:
> > On Wed, Jun 17, 2015 at 10:43:48AM +0100, Andre Przywara wrote:
> > > Instead of referring to the Linux header including the barrier
> > > macros, copy over the rather simple implementation for the PowerPC
> > > barrier instructions kvmtool uses. This fixes build for powerpc.
> > >
> > > Signed-off-by: Andre Przywara
> > > ---
> > > Hi,
> > >
> > > I just took what kvmtool seems to have used before, I actually have
> > > no idea if "sync" is the right instruction or "lwsync" would do.
> > > Would be nice if some people with PowerPC knowledge could comment.
> >
> > I *think* we can use lwsync for rmb and wmb, but would want confirmation
> > from a ppc guy before making that change!
>
> Ugh, memory barriers :)

I prefer to call them "Job Security" :)

> You probably can use lwsync, assuming you're only ordering cacheable vs
> cacheable.
>
> But, lwsync has given us pain in the past[1], so I'd be happier if you just
> used sync.

No probs. I pushed Andre's original patch.

Will
RE: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> -Original Message- > From: Alex Williamson [mailto:alex.william...@redhat.com] > Sent: Tuesday, June 16, 2015 12:45 AM > To: Eric Auger > Cc: Avi Kivity; Wu, Feng; kvm@vger.kernel.org; linux-ker...@vger.kernel.org; > pbonz...@redhat.com; mtosa...@redhat.com > Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding > > On Mon, 2015-06-15 at 18:17 +0200, Eric Auger wrote: > > Hi Alex, all, > > On 06/12/2015 09:03 PM, Alex Williamson wrote: > > > On Fri, 2015-06-12 at 21:48 +0300, Avi Kivity wrote: > > >> On 06/12/2015 06:41 PM, Alex Williamson wrote: > > >>> On Fri, 2015-06-12 at 00:23 +, Wu, Feng wrote: > > > -Original Message- > > > From: Avi Kivity [mailto:avi.kiv...@gmail.com] > > > Sent: Friday, June 12, 2015 3:59 AM > > > To: Wu, Feng; kvm@vger.kernel.org; linux-ker...@vger.kernel.org > > > Cc: pbonz...@redhat.com; mtosa...@redhat.com; > > > alex.william...@redhat.com; eric.au...@linaro.org > > > Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding > > > > > > On 06/11/2015 01:51 PM, Feng Wu wrote: > > >> From: Eric Auger > > >> > > >> This patch adds and documents a new KVM_DEV_VFIO_DEVICE > group > > >> and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, > > >> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be > able > > >> to set a VFIO device IRQ as forwarded or not forwarded. > > >> the command takes as argument a handle to a new struct named > > >> kvm_vfio_dev_irq. > > > Is there no way to do this automatically? After all, vfio knows that > > > a > > > device interrupt is forwarded to some eventfd, and kvm knows that > some > > > eventfd is forwarded to a guest interrupt. If they compare notes > > > through a central registry, they can figure out that the interrupt > > > needs > > > to be forwarded. > > Oh, just like Eric mentioned in his reply, this description is out of > > context > of > > this series, I will remove them in the next version. > > >>> > > >>> I suspect Avi's question was more general. 
While forward/unforward is > > >>> out of context for this series, it's very similar in nature to > > >>> enabling/disabling posted interrupts. So I think the question remains > > >>> whether we really need userspace to participate in creating this > > >>> shortcut or if kvm and vfio can some how orchestrate figuring it out > > >>> automatically. > > >>> > > >>> Personally I don't know how we could do it automatically. We've always > > >>> relied on userspace to independently setup vfio and kvm such that > > >>> neither have any idea that the other is there and update each side > > >>> independently when anything changes. So it seems consistent to > continue > > >>> that here. It doesn't seem like there's much to gain performance-wise > > >>> either, updates should be a relatively rare event I'd expect. > > >>> > > >>> There's really no metadata associated with an eventfd, so "comparing > > >>> notes" automatically might imply some central registration entity. That > > >>> immediately sounds like a much more complex solution, but maybe Avi > has > > >>> some ideas to manage it. Thanks, > > >>> > > >> > > >> The idea is to have a central registry maintained by a posted interrupts > > >> manager. Both vfio and kvm pass the filp (along with extra information) > > >> to the posted interrupts manager, which, when it detects a filp match, > > >> tells each of them what to do. > > >> > > >> The advantages are: > > >> - old userspace gains the optimization without change > > >> - a userspace API is more expensive to maintain than internal kernel > > >> interfaces (CVEs, documentation, maintaining backwards compatibility) > > >> - if you can do it without a new interface, this indicates that all the > > >> information in the new interface is redundant. That means you have to > > >> check it for consistency with the existing information, so it's extra > > >> work (likely, it's exactly what the posted interrupt manager would be > > >> doing anyway). 
> > > > > > Yep, those all sound like good things and I believe that's similar in > > > design to the way we had originally discussed this interaction at > > > LPC/KVM Forum several years ago. I'd be in favor of that approach. > > > > I guess this discussion also is relevant wrt "[RFC v6 00/16] KVM-VFIO > > IRQ forward control" series? Or is that "central registry maintained by > > a posted interrupts manager" something more specific to x86? > > I'd think we'd want it for any sort of offload and supporting both > posted-interrupts and irq-forwarding would be a good validation. I > imagine there would be registration/de-registration callbacks separate > for interrupt producers vs interrupt consumers. Each registration > function would likely provide a struct of callbacks, probably similar to > the get_symbol callbacks proposed for the kvm-vfio device on the IRQ > producer sid
Re: [PATCH 3/3] powerpc: add hvcall.h header from Linux
On Wed, 2015-06-17 at 11:13 +0100, Will Deacon wrote:
> On Wed, Jun 17, 2015 at 10:43:50AM +0100, Andre Przywara wrote:
> > The powerpc code uses some PAPR hypercalls, of which we need the
> > hypercall number. Copy the macro definition parts from the kernel's
> > (private) hvcall.h file and remove the extra tricks formerly used
> > to be able to include this header file directly.
> >
> > Signed-off-by: Andre Przywara
> > ---
> > Hi,
> >
> > I copied most of the Linux header, without removing
> > definitions that kvmtool doesn't use. That should make updates
> > easier. If people would prefer a bespoke header, let me know.
>
> I'd rather just #define the stuff we need now that we're outside of the
> kernel source tree.

Yeah that's probably cleaner. I think you only need:

  H_CPPR
  H_EOI
  H_FUNCTION
  H_GET_TERM_CHAR
  H_HARDWARE
  H_IPI
  H_LOGICAL_CACHE_LOAD
  H_LOGICAL_CACHE_STORE
  H_LOGICAL_CI_LOAD
  H_LOGICAL_CI_STORE
  H_LOGICAL_DCBF
  H_LOGICAL_ICBI
  H_PARAMETER
  H_PUT_TERM_CHAR
  H_SET_DABR
  H_SUCCESS
  H_XIRR
  KVMPPC_H_RTAS

cheers
Re: [PATCH 3/5] vhost: support upto 509 memory regions
On Wed, 17 Jun 2015 18:30:02 +0200 "Michael S. Tsirkin" wrote: > On Wed, Jun 17, 2015 at 06:09:21PM +0200, Igor Mammedov wrote: > > On Wed, 17 Jun 2015 17:38:40 +0200 > > "Michael S. Tsirkin" wrote: > > > > > On Wed, Jun 17, 2015 at 05:12:57PM +0200, Igor Mammedov wrote: > > > > On Wed, 17 Jun 2015 16:32:02 +0200 > > > > "Michael S. Tsirkin" wrote: > > > > > > > > > On Wed, Jun 17, 2015 at 03:20:44PM +0200, Paolo Bonzini wrote: > > > > > > > > > > > > > > > > > > On 17/06/2015 15:13, Michael S. Tsirkin wrote: > > > > > > > > > Considering userspace can be malicious, I guess yes. > > > > > > > > I don't think it's a valid concern in this case, > > > > > > > > setting limit back from 509 to 64 will not help here in any > > > > > > > > way, userspace still can create as many vhost instances as > > > > > > > > it needs to consume memory it desires. > > > > > > > > > > > > > > Not really since vhost char device isn't world-accessible. > > > > > > > It's typically opened by a priveledged tool, the fd is > > > > > > > then passed to an unpriveledged userspace, or permissions > > > > > > > dropped. > > > > > > > > > > > > Then what's the concern anyway? > > > > > > > > > > > > Paolo > > > > > > > > > > Each fd now ties up 16K of kernel memory. It didn't use to, so > > > > > priveledged tool could safely give the unpriveledged userspace > > > > > a ton of these fds. > > > > if privileged tool gives out unlimited amount of fds then it > > > > doesn't matter whether fd ties 4K or 16K, host still could be DoSed. > > > > > > > > > > Of course it does not give out unlimited fds, there's a way > > > for the sysadmin to specify the number of fds. Look at how libvirt > > > uses vhost, it should become clear I think. > > then it just means that tool has to take into account a new limits > > to partition host in sensible manner. > > Meanwhile old tools are vulnerable to OOM attacks. 
I've chatted with libvirt folks, it doesn't care about how much memory vhost would consume nor do any host capacity planning in that regard. But lets assume that there are tools that do this so how about instead of hardcoding limit make it a module parameter with default set to 64. That would allow users to set higher limit if they need it and nor regress old tools. it will also give tools interface for reading limit from vhost module. > > > Exposing limit as module parameter might be of help to tool for > > getting/setting it in a way it needs. > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
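[Editor's sketch] Igor's point that a module parameter would "give tools interface for reading limit" can be illustrated from the tool side. Readable module parameters show up as small text files under /sys/module/<module>/parameters/, so a management tool could read the limit and fall back to the historical value of 64 when the file is absent (an old kernel). The sketch below assumes exactly that; the thread does not name the parameter, so the caller passes the path, and read_region_limit is a hypothetical helper.

```c
#include <stdio.h>

/* Default proposed in the thread for kernels without the parameter. */
#define DEFAULT_MAX_MEM_REGIONS 64

/*
 * Read a positive integer limit from a parameter file. Falls back to
 * the default when the file is missing or its content is malformed.
 */
static int read_region_limit(const char *path)
{
	FILE *f = fopen(path, "r");
	int limit;

	if (!f)
		return DEFAULT_MAX_MEM_REGIONS;
	if (fscanf(f, "%d", &limit) != 1 || limit <= 0)
		limit = DEFAULT_MAX_MEM_REGIONS;
	fclose(f);
	return limit;
}
```

A tool that does capacity planning could then size its per-fd kernel memory estimate from the returned value instead of assuming a fixed limit.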
Re: [PATCH 1/3] powerpc: implement barrier primitives
On Wed, 2015-06-17 at 11:15 +0100, Will Deacon wrote:
> On Wed, Jun 17, 2015 at 10:43:48AM +0100, Andre Przywara wrote:
> > Instead of referring to the Linux header including the barrier
> > macros, copy over the rather simple implementation for the PowerPC
> > barrier instructions kvmtool uses. This fixes build for powerpc.
> >
> > Signed-off-by: Andre Przywara
> > ---
> > Hi,
> >
> > I just took what kvmtool seems to have used before, I actually have
> > no idea if "sync" is the right instruction or "lwsync" would do.
> > Would be nice if some people with PowerPC knowledge could comment.
>
> I *think* we can use lwsync for rmb and wmb, but would want confirmation
> from a ppc guy before making that change!

Ugh, memory barriers :)

You probably can use lwsync, assuming you're only ordering cacheable vs
cacheable.

But, lwsync has given us pain in the past[1], so I'd be happier if you just
used sync.

cheers

[1]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=51d7d5205d3389a32859f9939f1093f267409929
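[Editor's sketch] The conservative choice discussed above, using the heavyweight "sync" for every barrier rather than "lwsync", might look roughly like the following. This is not the patch itself: the macro bodies are an assumption based on the thread, the non-PowerPC fallback exists only so the sketch builds elsewhere, and struct mailbox with its helpers is a hypothetical example of where the barriers would sit.

```c
/* Full barrier on PowerPC; a compiler builtin elsewhere (assumption). */
#if defined(__powerpc__) || defined(__powerpc64__)
#define mb()	__asm__ __volatile__("sync" : : : "memory")
#else
#define mb()	__sync_synchronize()
#endif
/* Conservative mapping: read and write barriers are full barriers too. */
#define rmb()	mb()
#define wmb()	mb()

/* Hypothetical producer/consumer slot that needs ordered accesses. */
struct mailbox {
	int data;
	volatile int ready;
};

/* Publish the payload first, then the flag. */
static inline void mailbox_post(struct mailbox *m, int v)
{
	m->data = v;
	wmb();		/* data must be visible before ready */
	m->ready = 1;
}

/* Returns 1 and fills *out if a value has been posted, else 0. */
static inline int mailbox_poll(struct mailbox *m, int *out)
{
	if (!m->ready)
		return 0;
	rmb();		/* read ready before reading data */
	*out = m->data;
	return 1;
}
```

Switching rmb() and wmb() to lwsync would only change the two macro lines, which is why the thread treats it as a later optimisation that needs confirmation from PowerPC maintainers.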
Re: [PATCH 13/13] KVM: arm64: enable ITS emulation as a virtual MSI controller
On 05/29/2015 11:53 AM, Andre Przywara wrote: > If userspace has provided a base address for the ITS register frame, > we enable the bits that advertise LPIs in the GICv3. > When the guest has enabled LPIs and the ITS, we enable the emulation > part by initializing the ITS data structures and trapping on ITS > register frame accesses by the guest. > Also we enable the KVM_SIGNAL_MSI feature to allow userland to inject > MSIs into the guest. Not having enabled the ITS emulation will lead > to a -ENODEV when trying to inject a MSI. > > Signed-off-by: Andre Przywara > --- > Documentation/virtual/kvm/api.txt | 2 +- > arch/arm64/kvm/Kconfig| 1 + > include/kvm/arm_vgic.h| 10 ++ > virt/kvm/arm/its-emul.c | 9 - > virt/kvm/arm/vgic-v3-emul.c | 20 +++- > virt/kvm/arm/vgic.c | 10 ++ > 6 files changed, 45 insertions(+), 7 deletions(-) > > diff --git a/Documentation/virtual/kvm/api.txt > b/Documentation/virtual/kvm/api.txt > index 891d64a..d20fd94 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -2108,7 +2108,7 @@ after pausing the vcpu, but before it is resumed. > 4.71 KVM_SIGNAL_MSI > > Capability: KVM_CAP_SIGNAL_MSI > -Architectures: x86 > +Architectures: x86 arm64 > Type: vm ioctl > Parameters: struct kvm_msi (in) > Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig > index 5105e29..6c432c0 100644 > --- a/arch/arm64/kvm/Kconfig > +++ b/arch/arm64/kvm/Kconfig > @@ -30,6 +30,7 @@ config KVM > select SRCU > select HAVE_KVM_EVENTFD > select HAVE_KVM_IRQFD > + select HAVE_KVM_MSI > ---help--- > Support hosting virtualized guest machines. 
> > diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h > index 6bb138d..8f1be6a 100644 > --- a/include/kvm/arm_vgic.h > +++ b/include/kvm/arm_vgic.h > @@ -162,6 +162,7 @@ struct vgic_io_device { > > struct vgic_its { > boolenabled; > + struct vgic_io_device iodev; > spinlock_t lock; > u64 cbaser; > int creadr; > @@ -365,4 +366,13 @@ static inline int vgic_v3_probe(struct device_node > *vgic_node, > } > #endif > > +#ifdef CONFIG_HAVE_KVM_MSI > +int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi); > +#else > +static inline int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi > *msi) > +{ > + return -ENODEV; > +} > +#endif > + > #endif > diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c > index 35e886c..864de19 100644 > --- a/virt/kvm/arm/its-emul.c > +++ b/virt/kvm/arm/its-emul.c > @@ -964,6 +964,7 @@ int vits_init(struct kvm *kvm) > { > struct vgic_dist *dist = &kvm->arch.vgic; > struct vgic_its *its = &dist->its; > + int ret; > > if (IS_VGIC_ADDR_UNDEF(dist->vgic_its_base)) > return -ENXIO; > @@ -977,9 +978,15 @@ int vits_init(struct kvm *kvm) > INIT_LIST_HEAD(&its->device_list); > INIT_LIST_HEAD(&its->collection_list); > > + ret = vgic_register_kvm_io_dev(kvm, dist->vgic_its_base, > +KVM_VGIC_V3_ITS_SIZE, vgicv3_its_ranges, > +-1, &its->iodev); > + if (ret) > + return ret; > + > its->enabled = false; > > - return -ENXIO; > + return 0; > } > > void vits_destroy(struct kvm *kvm) > diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c > index 4513551..71d0bcf 100644 > --- a/virt/kvm/arm/vgic-v3-emul.c > +++ b/virt/kvm/arm/vgic-v3-emul.c > @@ -89,10 +89,11 @@ static bool handle_mmio_ctlr(struct kvm_vcpu *vcpu, > /* > * As this implementation does not provide compatibility > * with GICv2 (ARE==1), we report zero CPUs in bits [5..7]. > - * Also LPIs and MBIs are not supported, so we set the respective bits to 0. > - * Also we report at most 2**10=1024 interrupt IDs (to match 1024 SPIs). 
> + * Also we report at most 2**10=1024 interrupt IDs (to match 1024 SPIs)
> + * and provide 16 bits worth of LPI number space (to give 8192 LPIs).
>   */
> -#define INTERRUPT_ID_BITS 10
> +#define INTERRUPT_ID_BITS_SPIS 10
> +#define INTERRUPT_ID_BITS_ITS 16
>  static bool handle_mmio_typer(struct kvm_vcpu *vcpu,
>  			      struct kvm_exit_mmio *mmio, phys_addr_t offset)
>  {
> @@ -100,7 +101,12 @@ static bool handle_mmio_typer(struct kvm_vcpu *vcpu,
>
>  	reg = (min(vcpu->kvm->arch.vgic.nr_irqs, 1024) >> 5) - 1;
>
> -	reg |= (INTERRUPT_ID_BITS - 1) << 19;
> +	if (vgic_has_its(vcpu->kvm)) {
> +		reg |= GICD_TYPER_LPIS;
> +		reg |= (INTERRUPT_ID_BITS_ITS - 1) << 19;
> +	} else {
> +		reg |= (INTERRUPT_ID_BITS_SPIS - 1) << 19;
> +	}
>
>  	vgic_reg_access(mmio, &reg, offset,
>  			ACCESS_READ_VALUE |
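
The GICD_TYPER encoding in the handle_mmio_typer() hunk can be checked in isolation. What follows is only a sketch under stated assumptions: typer_value() and lpi_count() are names made up for this illustration, not kernel functions, and the bit positions (LPIS at bit 17, IDbits in bits [23:19], LPIs starting at INTID 8192) follow the GICv3 register layout as I understand it. One thing the arithmetic shows: 16 ID bits cover INTIDs 0..65535, so the LPI range 8192..65535 holds 65536 - 8192 = 57344 LPIs, not the 8192 mentioned in the patch comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed GICD_TYPER layout: LPIS is bit 17, IDbits is bits [23:19]. */
#define TYPER_LPIS		(1u << 17)
#define TYPER_IDBITS_SHIFT	19

#define ID_BITS_SPIS		10	/* INTERRUPT_ID_BITS_SPIS in the patch */
#define ID_BITS_ITS		16	/* INTERRUPT_ID_BITS_ITS in the patch */
#define FIRST_LPI		8192u	/* lowest INTID in the LPI range */

/* Hypothetical helper mirroring the register value built in the hunk. */
static uint32_t typer_value(unsigned int nr_irqs, bool has_its)
{
	unsigned int lines = nr_irqs < 1024 ? nr_irqs : 1024;
	unsigned int id_bits = has_its ? ID_BITS_ITS : ID_BITS_SPIS;
	uint32_t reg;

	assert(nr_irqs >= 32);

	/* ITLinesNumber: number of 32-interrupt blocks, minus one. */
	reg = (lines >> 5) - 1;

	if (has_its)
		reg |= TYPER_LPIS;

	/* IDbits holds the number of supported ID bits, minus one. */
	reg |= (uint32_t)(id_bits - 1) << TYPER_IDBITS_SHIFT;

	return reg;
}

/* Number of LPI INTIDs available with a given ID width. */
static uint32_t lpi_count(unsigned int id_bits)
{
	assert(id_bits >= 14 && id_bits < 32);

	return (1u << id_bits) - FIRST_LPI;
}
```

With 256 interrupts configured, typer_value(256, false) gives 0x480007 and typer_value(256, true) gives 0x7A0007; lpi_count(16) gives 57344.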
Re: [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts
On 17/06/15 16:50, Eric Auger wrote:
> On 06/17/2015 05:37 PM, Marc Zyngier wrote:
>> On 17/06/15 16:11, Eric Auger wrote:
>>> Hi Marc,
>>> On 06/08/2015 07:04 PM, Marc Zyngier wrote:
>>>> So far, the only use of the HW interrupt facility is the timer,
>>>> implying that the active state is context-switched for each vcpu,
>>>> as the device is is shared across all vcpus.
>>> s/is//
>>>> This does not work for a device that has been assigned to a VM,
>>>> as the guest is entierely in control of that device (the HW is
>>> entirely?
>>>> not shared). In that case, it makes sense to bypass the whole
>>>> active state srtwitchint, and only track the deactivation of the
>>> switching
>>
>> Congratulations, I think you're now ready to try deciphering my
>> handwriting... ;-)
> good to see you're not a machine or maybe you do it on purpose some
> times ;-)
>>
>>>> interrupt.
>>>>
>>>> Signed-off-by: Marc Zyngier
>>>> ---
>>>>  include/kvm/arm_vgic.h    |  5 +++--
>>>>  virt/kvm/arm/arch_timer.c |  2 +-
>>>>  virt/kvm/arm/vgic.c       | 37 -
>>>>  3 files changed, 28 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>>>> index 1c653c1..5d47d60 100644
>>>> --- a/include/kvm/arm_vgic.h
>>>> +++ b/include/kvm/arm_vgic.h
>>>> @@ -164,7 +164,8 @@ struct irq_phys_map {
>>>>  	u32			virt_irq;
>>>>  	u32			phys_irq;
>>>>  	u32			irq;
>>>> -	bool			active;
>>>> +	bool			shared;
>>>> +	bool			active; /* Only valid if shared */
>>>>  };
>>>>  struct vgic_dist {
>>>> @@ -347,7 +348,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>>>>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>>>>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>>>>  struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>>>> -				       int virt_irq, int irq);
>>>> +				       int virt_irq, int irq, bool shared);
>>>>  int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>>>>  bool vgic_get_phys_irq_active(struct irq_phys_map *map);
>>>>  void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
>>>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>>>> index b9fff78..9544d79 100644
>>>> --- a/virt/kvm/arm/arch_timer.c
>>>> +++ b/virt/kvm/arm/arch_timer.c
>>>> @@ -202,7 +202,7 @@ void kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>>>>  	 * Tell the VGIC that the virtual interrupt is tied to a
>>>>  	 * physical interrupt. We do that once per VCPU.
>>>>  	 */
>>>> -	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq);
>>>> +	timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq, true);
>>>>  	WARN_ON(!timer->map);
>>>>  }
>>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>>> index f376b56..4223166 100644
>>>> --- a/virt/kvm/arm/vgic.c
>>>> +++ b/virt/kvm/arm/vgic.c
>>>> @@ -1125,18 +1125,21 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
>>>>  	map = vgic_irq_map_search(vcpu, irq);
>>>>  	if (map) {
>>>> -		int ret;
>>>> -
>>>> -		BUG_ON(!map->active);
>>>>  		vlr.hwirq = map->phys_irq;
>>>>  		vlr.state |= LR_HW;
>>>>  		vlr.state &= ~LR_EOI_INT;
>>>> -		ret = irq_set_irqchip_state(map->irq,
>>>> -					    IRQCHIP_STATE_ACTIVE,
>>>> -					    true);
>>>>  		vgic_irq_set_queued(vcpu, irq);
>>>
>>> the queued state is set again in vgic_queue_hwirq for level_sensitive
>>> IRQs although not harmful.
>>
>> Indeed. We still need it for edge interrupts though. I'll try to find a
>> nicer way...
>>
>>>> -		WARN_ON(ret);
>>>> +
>>>> +		if (map->shared) {
>>>> +			int ret;
>>>> +
>>>> +			BUG_ON(!map->active);
>>>> +			ret = irq_set_irqchip_state(map->irq,
>>>> +						    IRQCHIP_STATE_ACTIVE,
>>>> +						    true);
>>>> +			WARN_ON(ret);
>>>> +		}
>>>>  	}
>>>>  }
>>>> @@ -1368,21 +1371,28 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>>>  static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>>>>  {
>>>>  	struct irq_phys_map *map;
>>>> +	bool active;
>>>>  	int ret;
>>>>  	if (!(vlr.state & LR_HW))
>>>>  		return 0;
>>>>  	map = vgic_irq_map_search(vcpu, vlr.irq);
>>>> -	BUG_ON(!map || !map->active);
>>>> +	BUG_ON(!map);
>>>> +	BUG_ON(map->shared && !map->active);
>>>>  	ret = irq_get_irqchip_state(map->irq,
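
The shared/non-shared split in the vgic_queue_irq_to_lr() hunk can be modeled outside the kernel. This is only a sketch, assuming a much simplified view of the data involved: struct phys_map, struct list_reg, queue_mapped_irq() and the forced_active_calls counter are hypothetical stand-ins and not kernel code. The counter takes the place of the irq_set_irqchip_state(..., IRQCHIP_STATE_ACTIVE, true) call, and assert() takes the place of BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct irq_phys_map from the patch. */
struct phys_map {
	unsigned int phys_irq;
	bool shared;	/* device shared across vcpus, e.g. the timer */
	bool active;	/* only meaningful when shared is true */
};

/* Hypothetical model of the list register fields the hunk touches. */
struct list_reg {
	unsigned int hwirq;
	bool hw;	/* models LR_HW */
	bool eoi_int;	/* models LR_EOI_INT */
};

/* Counts how often the physical active state would have been forced. */
static unsigned int forced_active_calls;

static void queue_mapped_irq(const struct phys_map *map, struct list_reg *lr)
{
	assert(map != NULL && lr != NULL);

	/* Common part: tie the virtual interrupt to the physical one. */
	lr->hwirq = map->phys_irq;
	lr->hw = true;
	lr->eoi_int = false;

	/*
	 * Only a shared device needs its active state context-switched.
	 * For an assigned (non-shared) device this step is skipped and
	 * only the deactivation is tracked.
	 */
	if (map->shared) {
		assert(map->active);
		forced_active_calls++;
	}
}
```

Queueing a shared mapping such as the timer bumps the counter, while queueing a non-shared mapping leaves it untouched and still fills in the list register.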
KVM slow LAMP guest
Hi,

I have a LAMP server as a guest in KVM. Whenever the server has been idle for some time, it takes about 30 seconds to load a WordPress site. If the server is not idle, the site shows up within 5 seconds at most.

I've already turned off power management in the guest by passing GRUB_CMDLINE_LINUX_DEFAULT="apm=off" in /etc/default/grub. This has no effect.

Does KVM do some power management on guests? If so, how do I turn this off for my LAMP guest?

Best,
Hansa
Re: [PATCH v2 12/13] KVM: x86: add SMM to the MMU role, support SMRAM address space
On 18/06/2015 07:02, Xiao Guangrong wrote:
> However, role->level is hotter than role->smm, so it's also a good
> candidate for this kind of trick.

Right, we could give the first 8 bits to role->level, so it can be accessed with a single memory load and extracted with a single AND. Those two are definitely the hottest fields.

> And this is only 32 bits which can be operated in a CPU register by a
> single memory load, that is why I was worried if it is really needed.

However, an 8-bit field can be loaded from memory with a single movz instruction.

Paolo
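
The two access patterns discussed here can be sketched as follows. This is an illustration only, under assumptions: the layout (level in bits 0-7, an SMM flag in bit 8) and the names role_pack(), role_level() and role_level_byte() are hypothetical and do not describe the actual kvm_mmu_page_role definition. The byte-sized read relies on a little-endian host, where the lowest-addressed byte of the word holds bits 0-7.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: level in bits 0-7, SMM flag in bit 8. */
#define ROLE_LEVEL_MASK	0xffu
#define ROLE_SMM_BIT	(1u << 8)

static uint32_t role_pack(unsigned int level, int smm)
{
	assert(level <= ROLE_LEVEL_MASK);

	return (uint32_t)level | (smm ? ROLE_SMM_BIT : 0u);
}

/* Word already in a register: one AND extracts the level. */
static unsigned int role_level(uint32_t word)
{
	return word & ROLE_LEVEL_MASK;
}

/*
 * Word still in memory: read only its first byte. On a little-endian
 * host that byte is bits 0-7, so a compiler can emit a single
 * zero-extending byte load instead of a load followed by an AND.
 */
static unsigned int role_level_byte(const uint32_t *word)
{
	const unsigned char *bytes = (const unsigned char *)word;

	return bytes[0];
}
```

Putting the hottest field in the lowest byte keeps both paths cheap: the mask is a small immediate, and the in-memory case needs no shift.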