Re: [PATCH] powerpc: book3s: kvm: Use the saved dsisr and dar values
Alexander Graf writes: > On 11.11.2013, at 15:02, Aneesh Kumar K.V > wrote: > >> From: "Aneesh Kumar K.V" >> >> Don't try to compute these values. >> >> Signed-off-by: Aneesh Kumar K.V >> --- >> >> NOTE: I am not sure why we were originally computing dsisr and dar. So may be >> we need a variant of this patch. But with this and the additional patch >> "powerpc: book3s: PR: Enable Little Endian PR guest" I am able to get a >> Little Endian >> PR guest to boot. > > It's quite easy to find out - git blame tells you all the history and points > you to commit ca7f4203b. > > commit ca7f4203b9b66e12d0d9968ff7dfe781f3a9695a > Author: Alexander Graf > Date: Wed Mar 24 21:48:28 2010 +0100 > > KVM: PPC: Implement alignment interrupt > > Mac OS X has some applications - namely the Finder - that require > alignment > interrupts to work properly. So we need to implement them. > > But the spec for 970 and 750 also looks different. While 750 requires the > DSISR and DAR fields to reflect some instruction bits (DSISR) and the > fault > address (DAR), the 970 declares this as an optional feature. So we need > to reconstruct DSISR and DAR manually. > > Signed-off-by: Alexander Graf > Signed-off-by: Avi Kivity > > Read this as "on 970, alignment interrupts don't give us DSISR and DAR of the > faulting instruction" as otherwise I wouldn't have implemented it. > > So this is clearly a nack on this patch :). I can possibly do a if (cpu_has_feature(CPU_FTR_ARCH_201)). But do we need to do that ? According to Paul we should always find DAR. -aneesh -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Updated Elvis Upstreaming Roadmap
On Wed, Dec 18, 2013 at 12:43 PM, Michael S. Tsirkin wrote: > On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote: >> Hi, >> >> Thank you all for your comments. >> I'm sorry for taking this long to reply, I was away on vacation.. >> >> It was a good, long discussion, many issues were raised, which we'd like >> to address with the following proposed roadmap for Elvis patches. >> In general, we believe it would be best to start with patches that are >> as simple as possible, providing the basic Elvis functionality, >> and attend to the more complicated issues in subsequent patches. >> >> Here's the road map for Elvis patches: > > Thanks for the follow up. Some suggestions below. > Please note they suggestions below merely represent > thoughts on merging upstream. > If as the first step you are content with keeping this > work as out of tree patches, in order to have > the freedom to experiment with interfaces and > performance, please feel free to ignore them. > >> 1. Shared vhost thread for multiple devices. >> >> The way to go here, we believe, is to start with a patch having a shared >> vhost thread for multiple devices of the SAME vm. >> The next step/patch may be handling vms belonging to the same cgroup. >> >> Finally, we need to extend the functionality so that the shared vhost >> thread >> serves multiple vms (not necessarily belonging to the same cgroup). >> >> There was a lot of discussion about the way to address the enforcement >> of cgroup policies, and we will consider the various solutions with a >> future >> patch. > > With respect to the upstream kernel, > I'm not sure a bunch of changes just for the sake of guests with > multiple virtual NIC cards makes sense. > And I wonder how this step, in isolation, will affect e.g. > multiqueue workloads. > But I guess if the numbers are convincing, this can be mergeable. 
Even if you have a single multiqueue device this change allows to create one vhost thread for all the queues, one vhost thread per queue or any other combination. I guess that depending on the workload and depending on the system utilization (free cycles/cores, density) you would prefer to use one or more vhost threads. > >> >> 2. Creation of vhost threads >> >> We suggested two ways of controlling the creation and removal of vhost >> threads: >> - statically determining the maximum number of virtio devices per worker >> via a kernel module parameter >> - dynamically: Sysfs mechanism to add and remove vhost threads >> >> It seems that it would be simplest to take the static approach as >> a first stage. At a second stage (next patch), we'll advance to >> dynamically >> changing the number of vhost threads, using the static module parameter >> only as a default value. > > I'm not sure how independent this is from 1. > With respect to the upstream kernel, > Introducing interfaces (which we'll have to maintain > forever) just for the sake of guests with > multiple virtual NIC cards does not look like a good tradeoff. > > So I'm unlikely to merge this upstream without making it useful cross-VM, > and yes this means isolation and accounting with cgroups need to > work properly. Agree, but even if you use a single multiqueue device having the ability to use 1 thread to serve all the queues or multiple threads to serve all the queues looks like a useful feature. > >> Regarding cwmq, it is an interesting mechanism, which we need to explore >> further. >> At the moment we prefer not to change the vhost model to use cwmq, as some >> of the issues that were discussed, such as cgroups, are not supported by >> cwmq, and this is adding more complexity. >> However, we'll look further into it, and consider it at a later stage. > > Hmm that's still assuming some smart management tool configuring > this correctly. Can't this be determined automatically depending > on the workload? 
> This is what the cwmq suggestion was really about: detect > that we need more threads and spawn them. > It's less about sharing the implementation with workqueues - > would be very nice but not a must. But how cwmq can consider cgroup accounting ? In any case, IMHO, the kernel should first provide the "mechanism" so later on a user-space management application (the "policy") can orchestrate it. > > > >> 3. Adding polling mode to vhost >> >> It is a good idea making polling adaptive based on various factors such as >> the I/O rate, the guest kick overhead(which is the tradeoff of polling), >> or the amount of wasted cycles (cycles we kept polling but no new work was >> added). >> However, as a beginning polling patch, we would prefer having a naive >> polling approach, which could be tuned with later patches. >> > > While any polling approach would still need a lot of testing to prove we > don't for example steal CPU from guest which could be doing other useful > work, given that an exit is at least 1.5K cycles at least in theory it > seems like something that can improve performance. I'm not sure how > na
Re: [REPOST][PATCH 1/2] driver core: Add new device_driver flag to allow binding via sysfs only
On Tue, Dec 03, 2013 at 04:34:33PM +0100, Jan Kiszka wrote: > On 2013-12-03 13:34, Kim Phillips wrote: > > VFIO supports pass-through of devices to user space - for sake > > of illustration, say a PCI e1000 device: > > > > - the e1000 is first unbound from the PCI e1000 driver via sysfs > > - the vfio-pci driver is told via new_id that it now handles e1000 devices > > - the e1000 is explicitly bound to vfio-pci through sysfs > > > > However, now we have two drivers in the system that both handle e1000 > > devices. A hotplug event could then occur and it is ambiguous as to which > > driver will claim the device. The desired semantics is that vfio-pci is > > only bound to devices by explicit request in sysfs. This patch makes this > > possible by introducing a sysfs_bind_only flag in struct device_driver. > > > > Signed-off-by: Stuart Yoder > > Signed-off-by: Kim Phillips > > --- > > rebased onto 3.13-rc2, and reposted from first submission which > > recieved no comments: > > > > https://lkml.org/lkml/2013/10/11/53 > > > > drivers/base/dd.c | 5 - > > include/linux/device.h | 2 ++ > > 2 files changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c > > index 0605176..b83b16d 100644 > > --- a/drivers/base/dd.c > > +++ b/drivers/base/dd.c > > @@ -389,7 +389,7 @@ static int __device_attach(struct device_driver *drv, > > void *data) > > { > > struct device *dev = data; > > > > - if (!driver_match_device(drv, dev)) > > + if (drv->sysfs_bind_only || !driver_match_device(drv, dev)) > > return 0; > > > > return driver_probe_device(drv, dev); > > @@ -476,6 +476,9 @@ static int __driver_attach(struct device *dev, void > > *data) > > */ > > int driver_attach(struct device_driver *drv) > > { > > + if (drv->sysfs_bind_only) > > + return 0; > > + > > return bus_for_each_dev(drv->bus, NULL, drv, __driver_attach); > > } > > EXPORT_SYMBOL_GPL(driver_attach); > > diff --git a/include/linux/device.h b/include/linux/device.h > > index 
952b010..ed441d1 100644 > > --- a/include/linux/device.h > > +++ b/include/linux/device.h > > @@ -200,6 +200,7 @@ extern struct klist *bus_get_device_klist(struct > > bus_type *bus); > > * @owner: The module owner. > > * @mod_name: Used for built-in modules. > > * @suppress_bind_attrs: Disables bind/unbind via sysfs. > > + * @sysfs_bind_only: Only allow bind/unbind via sysfs. > > * @of_match_table: The open firmware table. > > * @acpi_match_table: The ACPI match table. > > * @probe: Called to query the existence of a specific device, > > @@ -233,6 +234,7 @@ struct device_driver { > > const char *mod_name; /* used for built-in modules */ > > > > bool suppress_bind_attrs; /* disables bind/unbind via sysfs */ > > + bool sysfs_bind_only; /* only allow bind/unbind via sysfs */ > > > > const struct of_device_id *of_match_table; > > const struct acpi_device_id *acpi_match_table; > > > > I think I only discussed this with Stuart in person at the KVM Forum: > Why not deriving the property "sysfs bind only" from the fact that a > device does wild-card binding? Are there use cases that benefit from > decoupling both features? The driver core does not know if a bus, or a device, handles "wild card" binding, so you can't do it at this level, sorry. greg k-h -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REPOST][PATCH 1/2] driver core: Add new device_driver flag to allow binding via sysfs only
On Tue, Dec 03, 2013 at 12:34:46PM +, Kim Phillips wrote:
> VFIO supports pass-through of devices to user space - for sake
> of illustration, say a PCI e1000 device:
>
> - the e1000 is first unbound from the PCI e1000 driver via sysfs
> - the vfio-pci driver is told via new_id that it now handles e1000 devices
> - the e1000 is explicitly bound to vfio-pci through sysfs
>
> However, now we have two drivers in the system that both handle e1000
> devices. A hotplug event could then occur and it is ambiguous as to which
> driver will claim the device. The desired semantics is that vfio-pci is
> only bound to devices by explicit request in sysfs. This patch makes this
> possible by introducing a sysfs_bind_only flag in struct device_driver.

Why deal with this at all and not just deal with the "bind" sysfs file
instead? That way no driver core logic needs to be changed at all, and
your userspace tools know _exactly_ which device is being bound to the
new device.

Don't mess with the "new_id" file for stuff like this, as you point out,
it's "tricky"...

thanks,

greg k-h
Re: [REPOST][PATCH 2/2] driver core: platform: allow platform drivers to bind to any device
On Tue, Dec 03, 2013 at 12:34:54PM +, Kim Phillips wrote:
> Platform drivers such as the vfio-platform "meta-" driver [1]
> should be allowed to specify that they can bind to any device,
> much like PCI drivers can with PCI_ANY_ID.
>
> Currently, binding platform drivers to devices depends on:
>
> - a string match in the device node's compatible entry (OF)
> - a string match in the ACPI id list (ACPI)
> - a string match in the id_table (platform data)
> - a string match on the driver name (fall-back)
>
> none of which allow for the notion of "match any."
>
> This patch adds the notion by adding a "match any device" boolean to
> struct platform_driver, for drivers to be able to set and thus not cause
> platform_match() to fail when a bind is requested.
>
> [1] http://www.spinics.net/lists/kvm/msg96701.html
>
> Signed-off-by: Kim Phillips
> ---
> rebased onto 3.13-rc2, and reposted from first submission which
> received no comments:
>
> https://lkml.org/lkml/2013/10/11/48
>
>  drivers/base/platform.c         | 4
>  include/linux/platform_device.h | 1 +
>  2 files changed, 5 insertions(+)
>
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 3a94b79..78a5b62 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -736,6 +736,10 @@ static int platform_match(struct device *dev, struct device_driver *drv)
>  	struct platform_device *pdev = to_platform_device(dev);
>  	struct platform_driver *pdrv = to_platform_driver(drv);
>
> +	/* the driver matches any device */
> +	if (pdrv->match_any_dev)
> +		return 1;

This breaks userspace in that it will never know to load the module that
can "bind to anything". You need a way to encode this in the platform
device id that can be a wildcard type of thing, so that userspace can
know about this.

thanks,

greg k-h
Re: [PATCH] powerpc: book3s: kvm: Use the saved dsisr and dar values
On Wed, Dec 18, 2013 at 10:44:08PM +0100, Alexander Graf wrote: > > On 11.11.2013, at 15:02, Aneesh Kumar K.V > wrote: > > > From: "Aneesh Kumar K.V" > > > > Don't try to compute these values. > > > > Signed-off-by: Aneesh Kumar K.V > > --- > > > > NOTE: I am not sure why we were originally computing dsisr and dar. So may > > be > > we need a variant of this patch. But with this and the additional patch > > "powerpc: book3s: PR: Enable Little Endian PR guest" I am able to get a > > Little Endian > > PR guest to boot. > > It's quite easy to find out - git blame tells you all the history and points > you to commit ca7f4203b. > > commit ca7f4203b9b66e12d0d9968ff7dfe781f3a9695a > Author: Alexander Graf > Date: Wed Mar 24 21:48:28 2010 +0100 > > KVM: PPC: Implement alignment interrupt > > Mac OS X has some applications - namely the Finder - that require > alignment > interrupts to work properly. So we need to implement them. > > But the spec for 970 and 750 also looks different. While 750 requires the > DSISR and DAR fields to reflect some instruction bits (DSISR) and the > fault > address (DAR), the 970 declares this as an optional feature. So we need > to reconstruct DSISR and DAR manually. > > Signed-off-by: Alexander Graf > Signed-off-by: Avi Kivity > > Read this as "on 970, alignment interrupts don't give us DSISR and DAR of the > faulting instruction" as otherwise I wouldn't have implemented it. Although it's optional, all IBM POWER cpus, and as far as I know all PowerPC cpus, set DAR on an alignment interrupt to the effective address being accessed. You have a valid point regarding DSISR, but it would be nice to skip the computations where either the host CPU provides the bits, or the virtual CPU doesn't. Paul. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v5][PATCH] KVM: PPC: Book3E HV: call RECONCILE_IRQ_STATE to sync the software state
On 29.11.2013, at 03:01, "tiejun.chen" wrote:

> Alex,
>
> I suppose Scott already elaborate anything you want to know.

Thanks, applied with this minor patch on top:

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 191c32b..be3de1d 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -470,7 +470,7 @@ _GLOBAL(kvmppc_resume_host)
 #ifdef CONFIG_64BIT
 	/*
 	 * We enter with interrupts disabled in hardware, but
-	 * we need to call RECONCILE_IRQ_STATE anyway to ensure
+	 * we need to call RECONCILE_IRQ_STATE to ensure
 	 * that the software state is kept in sync.
 	 */
 	RECONCILE_IRQ_STATE(r3,r5)

I've also taken the liberty of making the patch description more verbose:

    KVM: PPC: Book3E HV: call RECONCILE_IRQ_STATE to sync the software state

    Rather than calling hard_irq_disable() when we're back in C code we can
    just call RECONCILE_IRQ_STATE to soft disable IRQs while we're already
    in hard disabled state. This should be functionally equivalent to the
    code before, but cleaner and faster.

    Signed-off-by: Tiejun Chen
    [agraf: fix comment, commit message]
    Signed-off-by: Alexander Graf

Thanks, applied to kvm-ppc-queue.

Alex
Re: [PATCH] powerpc: book3s: kvm: Use the saved dsisr and dar values
On 11.11.2013, at 15:02, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" > > Don't try to compute these values. > > Signed-off-by: Aneesh Kumar K.V > --- > > NOTE: I am not sure why we were originally computing dsisr and dar. So may be > we need a variant of this patch. But with this and the additional patch > "powerpc: book3s: PR: Enable Little Endian PR guest" I am able to get a > Little Endian > PR guest to boot. It's quite easy to find out - git blame tells you all the history and points you to commit ca7f4203b. commit ca7f4203b9b66e12d0d9968ff7dfe781f3a9695a Author: Alexander Graf Date: Wed Mar 24 21:48:28 2010 +0100 KVM: PPC: Implement alignment interrupt Mac OS X has some applications - namely the Finder - that require alignment interrupts to work properly. So we need to implement them. But the spec for 970 and 750 also looks different. While 750 requires the DSISR and DAR fields to reflect some instruction bits (DSISR) and the fault address (DAR), the 970 declares this as an optional feature. So we need to reconstruct DSISR and DAR manually. Signed-off-by: Alexander Graf Signed-off-by: Avi Kivity Read this as "on 970, alignment interrupts don't give us DSISR and DAR of the faulting instruction" as otherwise I wouldn't have implemented it. So this is clearly a nack on this patch :). Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
mempath prefault: fix off-by-one error
Fix off-by-one error (noticed by Andrea Arcangeli).

Signed-off-by: Marcelo Tosatti

diff --git a/exec.c b/exec.c
index f4b9ef2..1be5252 100644
--- a/exec.c
+++ b/exec.c
@@ -1000,7 +1000,7 @@ static void *file_ram_alloc(RAMBlock *block,
     }
 
     /* MAP_POPULATE silently ignores failures */
-    for (i = 0; i < (memory/hpagesize)-1; i++) {
+    for (i = 0; i < (memory/hpagesize); i++) {
         memset(area + (hpagesize*i), 0, 1);
     }
[PATCH] KVM: VMX: Do not skip the instruction if handle_dr injects a fault
If kvm_get_dr or kvm_set_dr reports that it raised a fault, we must not advance the instruction pointer. Otherwise the exception will hit the wrong instruction.

Signed-off-by: Jan Kiszka
---
 arch/x86/kvm/vmx.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 56c654c..ee3bf54 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5141,10 +5141,14 @@ static int handle_dr(struct kvm_vcpu *vcpu)
 	reg = DEBUG_REG_ACCESS_REG(exit_qualification);
 	if (exit_qualification & TYPE_MOV_FROM_DR) {
 		unsigned long val;
+
-		if (!kvm_get_dr(vcpu, dr, &val))
-			kvm_register_write(vcpu, reg, val);
+		if (kvm_get_dr(vcpu, dr, &val))
+			return 1;
+		kvm_register_write(vcpu, reg, val);
 	} else
-		kvm_set_dr(vcpu, dr, vcpu->arch.regs[reg]);
+		if (kvm_set_dr(vcpu, dr, vcpu->arch.regs[reg]))
+			return 1;
+
 	skip_emulated_instruction(vcpu);
 	return 1;
 }
-- 
1.8.1.1.298.ge7eed54
Re: [PATCH backport hints for <3.10] KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368)
On Mon, Dec 16, 2013 at 12:38:17PM +0100, Paolo Bonzini wrote:
> The KVM patch "fix vapic memory corruption" applies to most kernels that
> have KVM, but the fix does not apply on many older branches. The APIs it
> uses are available in 3.1, but until 3.9 kvm_gfn_to_hva_cache_init had
> one fewer parameter.
>
> The comments in this patch should help fixing kvm_lapic_set_vapic_addr
> in older kernels. I will review the backports as they are posted to
> LKML (if I am CCed...).

This fails to build on 3.4-stable, could you provide a working backport
for that kernel?

thanks,

greg k-h
Re: [PATCH 2/2] kvmtool/arm: Add option to override Generic Timer frequency
On 18/12/13 14:07, Alexander Graf wrote: [...] How does it encourage a vendor to properly implement their firmware if there's a workaround? Alex Hi Alex, In short, by enabling the users to create the demand. Yes, like any workaround there's potential for abuse, but having *something* that makes it work is the difference between "I want virtualisation"[1] and "Dear vendor, I've tried virtualisation on your chip/board and it's great, but it tells me I need new firmware, where do I get that?" Having the specs tell them what to do clearly isn't sufficient, so let's give the integrators and consumers incentive to shout at them too. The sooner proper support is commonplace and we can deprecate clock-frequency hacks altogether, the better. Oh, I'm all for hacks. But please don't fall under the illusion that this will push vendors to fix their firmware. It will have the opposite effect - vendors will just point to the workaround and say "but it works" :). If vendors already aren't bothering to support functionality available in their flagship hardware, workarounds hardly make that worse, and are a win for the user. If it can drive adoption enough to get vendors to see the value in at least fixing future products, that's only good. Either way, this hack is basically required because you can't program CNTFRQ because it's controlled by secure firmware, right? So the host os already needs to know about this and probably does have a "clock-frequency" value in its device tree entry already to know how fast its clock ticks. In some cases, yes. In others they don't explicitly use the arch timer at all thus have no frequency set anywhere. In the case of the board I have on my desk, it took hacking the non-secure part of the bootloader, writing a shim to throw away the securely-booted non-hyp cpu0 and fire up a secondary, and hacking a timer node into the host DT to even get as far as having an issue with kvmtool. 
Couldn't we search for that host entry and automatically pass it into the guest if it's there? That way the whole thing becomes seamless and even less of an issue. In theory that would be an ideal solution, yes. In practice it means either making KVM dependent on PROC_DEVICETREE (yuck) or cooking up some kernel interface to expose the system timer frequency to userspace (double yuck). Not just "global solution to local problem", but "global solution to local problem-that-shouldn't-even-exist-and-you-want-to-go-away-as-soon-as-possible-without-leaving-a-legacy". Besides, that would probably just reinforce the equally wrong behaviour of putting the frequency in the host DT instead of fixing the firmware ;) Robin. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
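For concreteness, the kind of host device-tree override being debated above amounts to a single property on the timer node. This fragment is purely illustrative — the compatible string and the 24 MHz value are placeholders, not taken from any real board:

```dts
/* Illustrative only: override the timer frequency when firmware fails
 * to program CNTFRQ. The compatible string and the 24 MHz value are
 * placeholders for whatever the board actually uses. */
timer {
	compatible = "arm,armv7-timer";
	clock-frequency = <24000000>;	/* Hz -- firmware workaround only */
};
```

As the thread notes, this is exactly the legacy everyone wants to deprecate: the architecturally correct path is for secure firmware to program CNTFRQ so no `clock-frequency` property is needed at all.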
[PULL 02/12] KVM: PPC: Book3S HV: Refine barriers in guest entry/exit
From: Paul Mackerras Some users have reported instances of the host hanging with secondary threads of a core waiting for the primary thread to exit the guest, and the primary thread stuck in nap mode. This prompted a review of the memory barriers in the guest entry/exit code, and this is the result. Most of these changes are the suggestions of Dean Burdick . The barriers between updating napping_threads and reading the entry_exit_count on the one hand, and updating entry_exit_count and reading napping_threads on the other, need to be isync not lwsync, since we need to ensure that either the napping_threads update or the entry_exit_count update get seen. It is not sufficient to order the load vs. lwarx, as lwsync does; we need to order the load vs. the stwcx., so we need isync. In addition, we need a full sync before sending IPIs to wake other threads from nap, to ensure that the write to the entry_exit_count is visible before the IPI occurs. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index bc8de75..bde28da 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -153,7 +153,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206) 13:b machine_check_fwnmi - /* * We come in here when wakened from nap mode on a secondary hw thread. * Relocation is off and most register values are lost. @@ -224,6 +223,11 @@ kvm_start_guest: /* Clear our vcpu pointer so we don't come back in early */ li r0, 0 std r0, HSTATE_KVM_VCPU(r13) + /* +* Make sure we clear HSTATE_KVM_VCPU(r13) before incrementing +* the nap_count, because once the increment to nap_count is +* visible we could be given another vcpu. 
+*/ lwsync /* Clear any pending IPI - we're an offline thread */ ld r5, HSTATE_XICS_PHYS(r13) @@ -241,7 +245,6 @@ kvm_start_guest: /* increment the nap count and then go to nap mode */ ld r4, HSTATE_KVM_VCORE(r13) addir4, r4, VCORE_NAP_COUNT - lwsync /* make previous updates visible */ 51:lwarx r3, 0, r4 addir3, r3, 1 stwcx. r3, 0, r4 @@ -990,14 +993,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201) */ /* Increment the threads-exiting-guest count in the 0xff00 bits of vcore->entry_exit_count */ - lwsync ld r5,HSTATE_KVM_VCORE(r13) addir6,r5,VCORE_ENTRY_EXIT 41:lwarx r3,0,r6 addir0,r3,0x100 stwcx. r0,0,r6 bne 41b - lwsync + isync /* order stwcx. vs. reading napping_threads */ /* * At this point we have an interrupt that we have to pass @@ -1030,6 +1032,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201) sld r0,r0,r4 andc. r3,r3,r0/* no sense IPI'ing ourselves */ beq 43f + /* Order entry/exit update vs. IPIs */ + sync mulli r4,r4,PACA_SIZE /* get paca for thread 0 */ subfr6,r4,r13 42:andi. r0,r3,1 @@ -1638,10 +1642,10 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_206) bge kvm_cede_exit stwcx. r4,0,r6 bne 31b + /* order napping_threads update vs testing entry_exit_count */ + isync li r0,1 stb r0,HSTATE_NAPPING(r13) - /* order napping_threads update vs testing entry_exit_count */ - lwsync mr r4,r3 lwz r7,VCORE_ENTRY_EXIT(r5) cmpwi r7,0x100 -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 03/12] KVM: PPC: Book3S HV: Make tbacct_lock irq-safe
From: Paul Mackerras Lockdep reported that there is a potential for deadlock because vcpu->arch.tbacct_lock is not irq-safe, and is sometimes taken inside the rq_lock (run-queue lock) in the scheduler, which is taken within interrupts. The lockdep splat looks like: == [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ] 3.12.0-rc5-kvm+ #8 Not tainted -- qemu-system-ppc/4803 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: (&(&vcpu->arch.tbacct_lock)->rlock){+.+...}, at: [] .kvmppc_core_vcpu_put_hv+0x2c/0xa0 and this task is already holding: (&rq->lock){-.-.-.}, at: [] .__schedule+0x180/0xaa0 which would create a new lock dependency: (&rq->lock){-.-.-.} -> (&(&vcpu->arch.tbacct_lock)->rlock){+.+...} but this new dependency connects a HARDIRQ-irq-safe lock: (&rq->lock){-.-.-.} ... which became HARDIRQ-irq-safe at: [] .lock_acquire+0xbc/0x190 [] ._raw_spin_lock+0x34/0x60 [] .scheduler_tick+0x54/0x180 [] .update_process_times+0x70/0xa0 [] .tick_periodic+0x3c/0xe0 [] .tick_handle_periodic+0x28/0xb0 [] .timer_interrupt+0x120/0x2e0 [] decrementer_common+0x168/0x180 [] .get_page_from_freelist+0x924/0xc10 [] .__alloc_pages_nodemask+0x200/0xba0 [] .alloc_pages_exact_nid+0x68/0x110 [] .page_cgroup_init+0x1e0/0x270 [] .start_kernel+0x3e0/0x4e4 [] .start_here_common+0x20/0x70 to a HARDIRQ-irq-unsafe lock: (&(&vcpu->arch.tbacct_lock)->rlock){+.+...} ... which became HARDIRQ-irq-unsafe at: ... [] .lock_acquire+0xbc/0x190 [] ._raw_spin_lock+0x34/0x60 [] .kvmppc_core_vcpu_load_hv+0x2c/0x100 [] .kvmppc_core_vcpu_load+0x2c/0x40 [] .kvm_arch_vcpu_load+0x10/0x30 [] .vcpu_load+0x64/0xd0 [] .kvm_vcpu_ioctl+0x68/0x730 [] .do_vfs_ioctl+0x4dc/0x7a0 [] .SyS_ioctl+0xc4/0xe0 [] syscall_exit+0x0/0x98 Some users have reported this deadlock occurring in practice, though the reports have been primarily on 3.10.x-based kernels. This fixes the problem by making tbacct_lock be irq-safe. 
Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv.c | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 072287f..31d9cfb 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -131,8 +131,9 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu) static void kvmppc_core_vcpu_load_hv(struct kvm_vcpu *vcpu, int cpu) { struct kvmppc_vcore *vc = vcpu->arch.vcore; + unsigned long flags; - spin_lock(&vcpu->arch.tbacct_lock); + spin_lock_irqsave(&vcpu->arch.tbacct_lock, flags); if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE && vc->preempt_tb != TB_NIL) { vc->stolen_tb += mftb() - vc->preempt_tb; @@ -143,19 +144,20 @@ static void kvmppc_core_vcpu_load_hv(struct kvm_vcpu *vcpu, int cpu) vcpu->arch.busy_stolen += mftb() - vcpu->arch.busy_preempt; vcpu->arch.busy_preempt = TB_NIL; } - spin_unlock(&vcpu->arch.tbacct_lock); + spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags); } static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu) { struct kvmppc_vcore *vc = vcpu->arch.vcore; + unsigned long flags; - spin_lock(&vcpu->arch.tbacct_lock); + spin_lock_irqsave(&vcpu->arch.tbacct_lock, flags); if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE) vc->preempt_tb = mftb(); if (vcpu->arch.state == KVMPPC_VCPU_BUSY_IN_HOST) vcpu->arch.busy_preempt = mftb(); - spin_unlock(&vcpu->arch.tbacct_lock); + spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags); } static void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr) @@ -486,11 +488,11 @@ static u64 vcore_stolen_time(struct kvmppc_vcore *vc, u64 now) */ if (vc->vcore_state != VCORE_INACTIVE && vc->runner->arch.run_task != current) { - spin_lock(&vc->runner->arch.tbacct_lock); + spin_lock_irq(&vc->runner->arch.tbacct_lock); p = vc->stolen_tb; if (vc->preempt_tb != TB_NIL) p += now - vc->preempt_tb; - spin_unlock(&vc->runner->arch.tbacct_lock); + 
spin_unlock_irq(&vc->runner->arch.tbacct_lock); } else { p = vc->stolen_tb; } @@ -512,10 +514,10 @@ static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu, core_stolen = vcore_stolen_time(vc, now); stolen = core_stolen - vcpu->arch.stolen_logged; vcpu->arch.stolen_logged = core_stolen; - spin_lock(&vcpu->arch.tbacct_lock); + spin_lock_irq(&vcpu->arch.tbacct_lock); stolen += vcpu->arch.busy_stolen; vcpu->arch.busy_stolen = 0; - spin_unlock(&vcpu->arch.tbacct_lock); + spin_unlock_irq(&vcpu->arch.t
[PULL 04/12] KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call
From: Paul Mackerras

Running a kernel with CONFIG_PROVE_RCU=y yields the following diagnostic:

===
[ INFO: suspicious RCU usage. ]
3.12.0-rc5-kvm+ #9 Not tainted
---
include/linux/kvm_host.h:473 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-ppc/4831:

stack backtrace:
CPU: 28 PID: 4831 Comm: qemu-system-ppc Not tainted 3.12.0-rc5-kvm+ #9
Call Trace:
[c00be462b2a0] [c001644c] .show_stack+0x7c/0x1f0 (unreliable)
[c00be462b370] [c0ad57c0] .dump_stack+0x88/0xb4
[c00be462b3f0] [c01315e8] .lockdep_rcu_suspicious+0x138/0x180
[c00be462b480] [c007862c] .gfn_to_memslot+0x13c/0x170
[c00be462b510] [c007d384] .gfn_to_hva_prot+0x24/0x90
[c00be462b5a0] [c007d420] .kvm_read_guest_page+0x30/0xd0
[c00be462b630] [c007d528] .kvm_read_guest+0x68/0x110
[c00be462b6e0] [c0084594] .kvmppc_rtas_hcall+0x34/0x180
[c00be462b7d0] [c0097934] .kvmppc_pseries_do_hcall+0x74/0x830
[c00be462b880] [c00990e8] .kvmppc_vcpu_run_hv+0xff8/0x15a0
[c00be462b9e0] [c00839cc] .kvmppc_vcpu_run+0x2c/0x40
[c00be462ba50] [c00810b4] .kvm_arch_vcpu_ioctl_run+0x54/0x1b0
[c00be462bae0] [c007b508] .kvm_vcpu_ioctl+0x478/0x730
[c00be462bca0] [c025532c] .do_vfs_ioctl+0x4dc/0x7a0
[c00be462bd80] [c02556b4] .SyS_ioctl+0xc4/0xe0
[c00be462be30] [c0009ee4] syscall_exit+0x0/0x98

To fix this, we take the SRCU read lock around the kvmppc_rtas_hcall() call.
Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 31d9cfb..b51d5db 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -591,7 +591,9 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) if (list_empty(&vcpu->kvm->arch.rtas_tokens)) return RESUME_HOST; + idx = srcu_read_lock(&vcpu->kvm->srcu); rc = kvmppc_rtas_hcall(vcpu); + srcu_read_unlock(&vcpu->kvm->srcu, idx); if (rc == -ENOENT) return RESUME_HOST; -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 12/12] KVM: PPC: Book3S HV: Don't drop low-order page address bits
From: Paul Mackerras Commit caaa4c804fae ("KVM: PPC: Book3S HV: Fix physical address calculations") unfortunately resulted in some low-order address bits getting dropped in the case where the guest is creating a 4k HPTE and the host page size is 64k. By getting the low-order bits from hva rather than gpa we miss out on bits 12 - 15 in this case, since hva is at page granularity. This puts the missing bits back in. Reported-by: Alexey Kardashevskiy Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 1931aa3..8689e2e 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -240,6 +240,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags, is_io = hpte_cache_bits(pte_val(pte)); pa = pte_pfn(pte) << PAGE_SHIFT; pa |= hva & (pte_size - 1); + pa |= gpa & ~PAGE_MASK; } } -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 05/12] powerpc: kvm: fix rare but potential deadlock scene
From: pingfan liu

Since kvmppc_hv_find_lock_hpte() is called from both virtmode and realmode, it can trigger a deadlock. Suppose the following scene:

Two physical cpus cpuM, cpuN, two VM instances A, B, each VM has a group of vcpus. If on cpuM, vcpu_A_1 holds bitlock X (HPTE_V_HVLOCK), then is switched out, and on cpuN, vcpu_A_2 tries to lock X in realmode, then cpuN will be caught in realmode for a long time.

What makes things even worse is if the following happens:
On cpuM, bitlock X is held; on cpuN, Y is held.
  vcpu_B_2 tries to lock Y on cpuM in realmode
  vcpu_A_2 tries to lock X on cpuN in realmode
Oops! deadlock happens

Signed-off-by: Liu Ping Fan
Reviewed-by: Paul Mackerras
CC: sta...@vger.kernel.org
Signed-off-by: Alexander Graf
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 4
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 47bbeaf..c5d1484 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -469,11 +469,14 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 		slb_v = vcpu->kvm->arch.vrma_slb_v;
 	}

+	preempt_disable();
 	/* Find the HPTE in the hash table */
 	index = kvmppc_hv_find_lock_hpte(kvm, eaddr, slb_v,
 					 HPTE_V_VALID | HPTE_V_ABSENT);
-	if (index < 0)
+	if (index < 0) {
+		preempt_enable();
 		return -ENOENT;
+	}
 	hptep = (unsigned long *)(kvm->arch.hpt_virt + (index << 4));
 	v = hptep[0] & ~HPTE_V_HVLOCK;
 	gr = kvm->arch.revmap[index].guest_rpte;
@@ -481,6 +484,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	/* Unlock the HPTE */
 	asm volatile("lwsync" : : : "memory");
 	hptep[0] = v;
+	preempt_enable();

 	gpte->eaddr = eaddr;
 	gpte->vpage = ((v & HPTE_V_AVPN) << 4) | ((eaddr >> 12) & 0xfff);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index fddbf98..1931aa3 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -749,6 +749,10 @@ static int slb_base_page_shift[4] = {
 	20,	/* 1M, unsupported */
 };

+/* When called from virtmode, this func should be protected by
+ * preempt_disable(), otherwise, the holding of HPTE_V_HVLOCK
+ * can trigger deadlock issue.
+ */
 long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 			      unsigned long valid)
 {
--
1.8.1.4
[PULL 00/12] 3.13 patch queue 2013-12-18 for 3.13
Hi Paolo / Gleb,

This is my current patch queue for 3.13. It fixes some grave issues we've only found after 3.13-rc1:

  - Make the modularized HV/PR book3s kvm work well as modules
  - Fix some race conditions
  - Fix compilation with certain compilers (booke)
  - Fix THP for book3s_hv
  - Fix preemption for book3s_pr

Please pull.

Alex


The following changes since commit f080480488028bcc25357f85e8ae54ccc3bb7173:

  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm (2013-11-15 13:51:36 +0900)

are available in the git repository at:

  git://github.com/agraf/linux-2.6.git tags/signed-for-3.13

for you to fetch changes up to df9059bb64023da9f27e56a94a3e2b8f4b6336a9:

  KVM: PPC: Book3S HV: Don't drop low-order page address bits (2013-12-18 11:30:35 +0100)

----------------------------------------------------------------
Patch queue for 3.13 - 2013-12-18

This fixes some grave issues we've only found after 3.13-rc1:

  - Make the modularized HV/PR book3s kvm work well as modules
  - Fix some race conditions
  - Fix compilation with certain compilers (booke)
  - Fix THP for book3s_hv
  - Fix preemption for book3s_pr

----------------------------------------------------------------
Alexander Graf (4):
      KVM: PPC: Book3S: PR: Don't clobber our exit handler id
      KVM: PPC: Book3S: PR: Export kvmppc_copy_to|from_svcpu
      KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy
      KVM: PPC: Book3S: PR: Enable interrupts earlier

Aneesh Kumar K.V (1):
      powerpc: book3s: kvm: Don't abuse host r2 in exit path

Paul Mackerras (5):
      KVM: PPC: Book3S HV: Fix physical address calculations
      KVM: PPC: Book3S HV: Refine barriers in guest entry/exit
      KVM: PPC: Book3S HV: Make tbacct_lock irq-safe
      KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call
      KVM: PPC: Book3S HV: Don't drop low-order page address bits

Scott Wood (1):
      powerpc/kvm/booke: Fix build break due to stack frame size warning

pingfan liu (1):
      powerpc: kvm: fix rare but potential deadlock scene

 arch/powerpc/include/asm/kvm_book3s.h     | 4
 arch/powerpc/include/asm/kvm_book3s_asm.h | 2 ++
 arch/powerpc/include/asm/switch_to.h      | 2 +-
 arch/powerpc/kernel/asm-offsets.c         | 1 +
 arch/powerpc/kernel/process.c             | 32 +++
 arch/powerpc/kvm/book3s_64_mmu_hv.c       | 18 +
 arch/powerpc/kvm/book3s_hv.c              | 24 +--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c       | 9 +++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 23 --
 arch/powerpc/kvm/book3s_interrupts.S      | 19 ++
 arch/powerpc/kvm/book3s_pr.c              | 22 +
 arch/powerpc/kvm/book3s_rmhandlers.S      | 6 +-
 arch/powerpc/kvm/booke.c                  | 12 ++--
 13 files changed, 112 insertions(+), 62 deletions(-)
[PULL 09/12] KVM: PPC: Book3S: PR: Enable interrupts earlier
Now that the svcpu sync is interrupt aware we can enable interrupts earlier in the exit code path again, moving 32bit and 64bit closer together. While at it, document the fact that we're always executing the exit path with interrupts enabled so that the next person doesn't trap over this. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_interrupts.S | 6 +- arch/powerpc/kvm/book3s_rmhandlers.S | 6 +- 2 files changed, 2 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kvm/book3s_interrupts.S b/arch/powerpc/kvm/book3s_interrupts.S index 5e7cb32..f779450 100644 --- a/arch/powerpc/kvm/book3s_interrupts.S +++ b/arch/powerpc/kvm/book3s_interrupts.S @@ -129,6 +129,7 @@ kvm_start_lightweight: * R12 = exit handler id * R13 = PACA * SVCPU.* = guest * +* MSR.EE = 1 * */ @@ -148,11 +149,6 @@ kvm_start_lightweight: nop #ifdef CONFIG_PPC_BOOK3S_64 - /* Re-enable interrupts */ - ld r3, HSTATE_HOST_MSR(r13) - ori r3, r3, MSR_EE - MTMSR_EERI(r3) - /* * Reload kernel SPRG3 value. * No need to save guest value as usermode can't modify SPRG3. diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S b/arch/powerpc/kvm/book3s_rmhandlers.S index a38c4c9..c3c5231 100644 --- a/arch/powerpc/kvm/book3s_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_rmhandlers.S @@ -153,15 +153,11 @@ _GLOBAL(kvmppc_entry_trampoline) li r6, MSR_IR | MSR_DR andcr6, r5, r6 /* Clear DR and IR in MSR value */ -#ifdef CONFIG_PPC_BOOK3S_32 /* * Set EE in HOST_MSR so that it's enabled when we get into our -* C exit handler function. On 64-bit we delay enabling -* interrupts until we have finished transferring stuff -* to or from the PACA. +* C exit handler function. */ ori r5, r5, MSR_EE -#endif mtsrr0 r7 mtsrr1 r6 RFI -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 11/12] powerpc: book3s: kvm: Don't abuse host r2 in exit path
From: "Aneesh Kumar K.V" We don't use PACATOC for PR. Avoid updating HOST_R2 with PR KVM mode when both HV and PR are enabled in the kernel. Without this we get the below crash (qemu) Unable to handle kernel paging request for data at address 0x8310 Faulting instruction address: 0xc001d5a4 cpu 0x2: Vector: 300 (Data Access) at [c001dc53aef0] pc: c001d5a4: .vtime_delta.isra.1+0x34/0x1d0 lr: c001d760: .vtime_account_system+0x20/0x60 sp: c001dc53b170 msr: 80009032 dar: 8310 dsisr: 4000 current = 0xc001d76c62d0 paca= 0xcfef1100 softe: 0irq_happened: 0x01 pid = 4472, comm = qemu-system-ppc enter ? for help [c001dc53b200] c001d760 .vtime_account_system+0x20/0x60 [c001dc53b290] c008d050 .kvmppc_handle_exit_pr+0x60/0xa50 [c001dc53b340] c008f51c kvm_start_lightweight+0xb4/0xc4 [c001dc53b510] c008cdf0 .kvmppc_vcpu_run_pr+0x150/0x2e0 [c001dc53b9e0] c008341c .kvmppc_vcpu_run+0x2c/0x40 [c001dc53ba50] c0080af4 .kvm_arch_vcpu_ioctl_run+0x54/0x1b0 [c001dc53bae0] c007b4c8 .kvm_vcpu_ioctl+0x478/0x730 [c001dc53bca0] c02140cc .do_vfs_ioctl+0x4ac/0x770 [c001dc53bd80] c02143e8 .SyS_ioctl+0x58/0xb0 [c001dc53be30] c0009e58 syscall_exit+0x0/0x98 Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s_asm.h | 1 + arch/powerpc/kernel/asm-offsets.c | 1 + arch/powerpc/kvm/book3s_hv_rmhandlers.S | 7 +++ 3 files changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h index 412b2f3..192917d 100644 --- a/arch/powerpc/include/asm/kvm_book3s_asm.h +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h @@ -79,6 +79,7 @@ struct kvmppc_host_state { ulong vmhandler; ulong scratch0; ulong scratch1; + ulong scratch2; u8 in_guest; u8 restore_hid5; u8 napping; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 2ea5cc0..d3de010 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -576,6 +576,7 @@ int main(void) HSTATE_FIELD(HSTATE_VMHANDLER, 
vmhandler); HSTATE_FIELD(HSTATE_SCRATCH0, scratch0); HSTATE_FIELD(HSTATE_SCRATCH1, scratch1); + HSTATE_FIELD(HSTATE_SCRATCH2, scratch2); HSTATE_FIELD(HSTATE_IN_GUEST, in_guest); HSTATE_FIELD(HSTATE_RESTORE_HID5, restore_hid5); HSTATE_FIELD(HSTATE_NAPPING, napping); diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index bde28da..be4fa04a 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -754,15 +754,14 @@ kvmppc_interrupt_hv: * guest CR, R12 saved in shadow VCPU SCRATCH1/0 * guest R13 saved in SPRN_SCRATCH0 */ - /* abuse host_r2 as third scratch area; we get r2 from PACATOC(r13) */ - std r9, HSTATE_HOST_R2(r13) + std r9, HSTATE_SCRATCH2(r13) lbz r9, HSTATE_IN_GUEST(r13) cmpwi r9, KVM_GUEST_MODE_HOST_HV beq kvmppc_bad_host_intr #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE cmpwi r9, KVM_GUEST_MODE_GUEST - ld r9, HSTATE_HOST_R2(r13) + ld r9, HSTATE_SCRATCH2(r13) beq kvmppc_interrupt_pr #endif /* We're now back in the host but in guest MMU context */ @@ -782,7 +781,7 @@ kvmppc_interrupt_hv: std r6, VCPU_GPR(R6)(r9) std r7, VCPU_GPR(R7)(r9) std r8, VCPU_GPR(R8)(r9) - ld r0, HSTATE_HOST_R2(r13) + ld r0, HSTATE_SCRATCH2(r13) std r0, VCPU_GPR(R9)(r9) std r10, VCPU_GPR(R10)(r9) std r11, VCPU_GPR(R11)(r9) -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 07/12] KVM: PPC: Book3S: PR: Export kvmppc_copy_to|from_svcpu
The kvmppc_copy_{to,from}_svcpu functions are publicly visible, so we should also export them in a header for other C files to consume.

So far we didn't need this because we only called them from asm code. The next patch will introduce a C caller.

Signed-off-by: Alexander Graf
---
 arch/powerpc/include/asm/kvm_book3s.h | 4
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 4a594b7..bc23b1b 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -192,6 +192,10 @@ extern void kvmppc_load_up_vsx(void);
 extern u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst);
 extern ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst);
 extern int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd);
+extern void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu,
+				 struct kvm_vcpu *vcpu);
+extern void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu,
+				   struct kvmppc_book3s_shadow_vcpu *svcpu);

 static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
 {
--
1.8.1.4
[PULL 01/12] KVM: PPC: Book3S HV: Fix physical address calculations
From: Paul Mackerras This fixes a bug in kvmppc_do_h_enter() where the physical address for a page can be calculated incorrectly if transparent huge pages (THP) are active. Until THP came along, it was true that if we encountered a large (16M) page in kvmppc_do_h_enter(), then the associated memslot must be 16M aligned for both its guest physical address and the userspace address, and the physical address calculations in kvmppc_do_h_enter() assumed that. With THP, that is no longer true. In the case where we are using MMU notifiers and the page size that we get from the Linux page tables is larger than the page being mapped by the guest, we need to fill in some low-order bits of the physical address. Without THP, these bits would be the same in the guest physical address (gpa) and the host virtual address (hva). With THP, they can be different, and we need to use the bits from hva rather than gpa. In the case where we are not using MMU notifiers, the host physical address we get from the memslot->arch.slot_phys[] array already includes the low-order bits down to the PAGE_SIZE level, even if we are using large pages. Thus we can simplify the calculation in this case to just add in the remaining bits in the case where PAGE_SIZE is 64k and the guest is mapping a 4k page. The same bug exists in kvmppc_book3s_hv_page_fault(). The basic fix is to use psize (the page size from the HPTE) rather than pte_size (the page size from the Linux PTE) when updating the HPTE low word in r. That means that pfn needs to be computed to PAGE_SIZE granularity even if the Linux PTE is a huge page PTE. That can be arranged simply by doing the page_to_pfn() before setting page to the head of the compound page. If psize is less than PAGE_SIZE, then we need to make sure we only update the bits from PAGE_SIZE upwards, in order not to lose any sub-page offset bits in r. 
On the other hand, if psize is greater than PAGE_SIZE, we need to make sure we don't bring in non-zero low order bits in pfn, hence we mask (pfn << PAGE_SHIFT) with ~(psize - 1). Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 12 +--- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 4 ++-- 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index f3ff587..47bbeaf 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -665,6 +665,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, return -EFAULT; } else { page = pages[0]; + pfn = page_to_pfn(page); if (PageHuge(page)) { page = compound_head(page); pte_size <<= compound_order(page); @@ -689,7 +690,6 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, } rcu_read_unlock_sched(); } - pfn = page_to_pfn(page); } ret = -EFAULT; @@ -707,8 +707,14 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, r = (r & ~(HPTE_R_W|HPTE_R_I|HPTE_R_G)) | HPTE_R_M; } - /* Set the HPTE to point to pfn */ - r = (r & ~(HPTE_R_PP0 - pte_size)) | (pfn << PAGE_SHIFT); + /* +* Set the HPTE to point to pfn. +* Since the pfn is at PAGE_SIZE granularity, make sure we +* don't mask out lower-order bits if psize < PAGE_SIZE. 
+*/ + if (psize < PAGE_SIZE) + psize = PAGE_SIZE; + r = (r & ~(HPTE_R_PP0 - psize)) | ((pfn << PAGE_SHIFT) & ~(psize - 1)); if (hpte_is_writable(r) && !write_ok) r = hpte_make_readonly(r); ret = RESUME_GUEST; diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 9c51544..fddbf98 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -225,6 +225,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags, is_io = pa & (HPTE_R_I | HPTE_R_W); pte_size = PAGE_SIZE << (pa & KVMPPC_PAGE_ORDER_MASK); pa &= PAGE_MASK; + pa |= gpa & ~PAGE_MASK; } else { /* Translate to host virtual address */ hva = __gfn_to_hva_memslot(memslot, gfn); @@ -238,13 +239,12 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags, ptel = hpte_make_readonly(ptel); is_io = hpte_cache_bits(pte_val(pte)); pa = pte_pfn(pte) << PAGE_SHIFT; + pa |= hva & (pte_size - 1); } } if (pte_size < psize) return H_PARAMETER; - if (pa && pte_size > psize) - pa |= gpa & (pte_size - 1); ptel
[PULL 10/12] powerpc/kvm/booke: Fix build break due to stack frame size warning
From: Scott Wood Commit ce11e48b7fdd256ec68b932a89b397a790566031 ("KVM: PPC: E500: Add userspace debug stub support") added "struct thread_struct" to the stack of kvmppc_vcpu_run(). thread_struct is 1152 bytes on my build, compared to 48 bytes for the recently-introduced "struct debug_reg". Use the latter instead. This fixes the following error: cc1: warnings being treated as errors arch/powerpc/kvm/booke.c: In function 'kvmppc_vcpu_run': arch/powerpc/kvm/booke.c:760:1: error: the frame size of 1424 bytes is larger than 1024 bytes make[2]: *** [arch/powerpc/kvm/booke.o] Error 1 make[1]: *** [arch/powerpc/kvm] Error 2 make[1]: *** Waiting for unfinished jobs Signed-off-by: Scott Wood Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/switch_to.h | 2 +- arch/powerpc/kernel/process.c| 32 arch/powerpc/kvm/booke.c | 12 ++-- 3 files changed, 23 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h index 9ee1261..aace905 100644 --- a/arch/powerpc/include/asm/switch_to.h +++ b/arch/powerpc/include/asm/switch_to.h @@ -35,7 +35,7 @@ extern void giveup_vsx(struct task_struct *); extern void enable_kernel_spe(void); extern void giveup_spe(struct task_struct *); extern void load_up_spe(struct task_struct *); -extern void switch_booke_debug_regs(struct thread_struct *new_thread); +extern void switch_booke_debug_regs(struct debug_reg *new_debug); #ifndef CONFIG_SMP extern void discard_lazy_cpu_state(void); diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 75c2d10..83530af 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -339,7 +339,7 @@ static void set_debug_reg_defaults(struct thread_struct *thread) #endif } -static void prime_debug_regs(struct thread_struct *thread) +static void prime_debug_regs(struct debug_reg *debug) { /* * We could have inherited MSR_DE from userspace, since @@ -348,22 +348,22 @@ static void prime_debug_regs(struct 
thread_struct *thread) */ mtmsr(mfmsr() & ~MSR_DE); - mtspr(SPRN_IAC1, thread->debug.iac1); - mtspr(SPRN_IAC2, thread->debug.iac2); + mtspr(SPRN_IAC1, debug->iac1); + mtspr(SPRN_IAC2, debug->iac2); #if CONFIG_PPC_ADV_DEBUG_IACS > 2 - mtspr(SPRN_IAC3, thread->debug.iac3); - mtspr(SPRN_IAC4, thread->debug.iac4); + mtspr(SPRN_IAC3, debug->iac3); + mtspr(SPRN_IAC4, debug->iac4); #endif - mtspr(SPRN_DAC1, thread->debug.dac1); - mtspr(SPRN_DAC2, thread->debug.dac2); + mtspr(SPRN_DAC1, debug->dac1); + mtspr(SPRN_DAC2, debug->dac2); #if CONFIG_PPC_ADV_DEBUG_DVCS > 0 - mtspr(SPRN_DVC1, thread->debug.dvc1); - mtspr(SPRN_DVC2, thread->debug.dvc2); + mtspr(SPRN_DVC1, debug->dvc1); + mtspr(SPRN_DVC2, debug->dvc2); #endif - mtspr(SPRN_DBCR0, thread->debug.dbcr0); - mtspr(SPRN_DBCR1, thread->debug.dbcr1); + mtspr(SPRN_DBCR0, debug->dbcr0); + mtspr(SPRN_DBCR1, debug->dbcr1); #ifdef CONFIG_BOOKE - mtspr(SPRN_DBCR2, thread->debug.dbcr2); + mtspr(SPRN_DBCR2, debug->dbcr2); #endif } /* @@ -371,11 +371,11 @@ static void prime_debug_regs(struct thread_struct *thread) * debug registers, set the debug registers from the values * stored in the new thread. 
*/ -void switch_booke_debug_regs(struct thread_struct *new_thread) +void switch_booke_debug_regs(struct debug_reg *new_debug) { if ((current->thread.debug.dbcr0 & DBCR0_IDM) - || (new_thread->debug.dbcr0 & DBCR0_IDM)) - prime_debug_regs(new_thread); + || (new_debug->dbcr0 & DBCR0_IDM)) + prime_debug_regs(new_debug); } EXPORT_SYMBOL_GPL(switch_booke_debug_regs); #else /* !CONFIG_PPC_ADV_DEBUG_REGS */ @@ -683,7 +683,7 @@ struct task_struct *__switch_to(struct task_struct *prev, #endif /* CONFIG_SMP */ #ifdef CONFIG_PPC_ADV_DEBUG_REGS - switch_booke_debug_regs(&new->thread); + switch_booke_debug_regs(&new->thread.debug); #else /* * For PPC_BOOK3S_64, we use the hw-breakpoint interfaces that would diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 53e65a2..0591e05 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -681,7 +681,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu) int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) { int ret, s; - struct thread_struct thread; + struct debug_reg debug; #ifdef CONFIG_PPC_FPU struct thread_fp_state fp; int fpexc_mode; @@ -723,9 +723,9 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) #endif /* Switch to guest debug context */ - thread.debug = vcpu->arch.shadow_dbg_reg; - switch_booke_debug_regs(&thread); - thread.debug = current->thread.debug; + debug = vc
[PULL 06/12] KVM: PPC: Book3S: PR: Don't clobber our exit handler id
We call a C helper to save all svcpu fields into our vcpu. The C ABI states that r12 is considered volatile. However, we keep our exit handler id in r12 currently. So we need to save it away into a non-volatile register instead that definitely does get preserved across the C call. This bug usually didn't hit anyone yet since gcc is smart enough to generate code that doesn't even need r12 which means it stayed identical throughout the call by sheer luck. But we can't rely on that. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_interrupts.S | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_interrupts.S b/arch/powerpc/kvm/book3s_interrupts.S index f4dd041..5e7cb32 100644 --- a/arch/powerpc/kvm/book3s_interrupts.S +++ b/arch/powerpc/kvm/book3s_interrupts.S @@ -132,9 +132,17 @@ kvm_start_lightweight: * */ + PPC_LL r3, GPR4(r1)/* vcpu pointer */ + + /* +* kvmppc_copy_from_svcpu can clobber volatile registers, save +* the exit handler id to the vcpu and restore it from there later. +*/ + stw r12, VCPU_TRAP(r3) + /* Transfer reg values from shadow vcpu back to vcpu struct */ /* On 64-bit, interrupts are still off at this point */ - PPC_LL r3, GPR4(r1)/* vcpu pointer */ + GET_SHADOW_VCPU(r4) bl FUNC(kvmppc_copy_from_svcpu) nop @@ -151,7 +159,6 @@ kvm_start_lightweight: */ ld r3, PACA_SPRG3(r13) mtspr SPRN_SPRG3, r3 - #endif /* CONFIG_PPC_BOOK3S_64 */ /* R7 = vcpu */ @@ -177,7 +184,7 @@ kvm_start_lightweight: PPC_STL r31, VCPU_GPR(R31)(r7) /* Pass the exit number as 3rd argument to kvmppc_handle_exit */ - mr r5, r12 + lwz r5, VCPU_TRAP(r7) /* Restore r3 (kvm_run) and r4 (vcpu) */ REST_2GPRS(3, r1) -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 08/12] KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy
As soon as we get back to our "highmem" handler in virtual address space we may get preempted. Today the reason we can get preempted is that we replay interrupts and all the lazy logic thinks we have interrupts enabled. However, it's not hard to make the code interruptible and that way we can enable and handle interrupts even earlier. This fixes random guest crashes that happened with CONFIG_PREEMPT=y for me. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s_asm.h | 1 + arch/powerpc/kvm/book3s_pr.c | 22 ++ 2 files changed, 23 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h index 0bd9348..412b2f3 100644 --- a/arch/powerpc/include/asm/kvm_book3s_asm.h +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h @@ -106,6 +106,7 @@ struct kvmppc_host_state { }; struct kvmppc_book3s_shadow_vcpu { + bool in_use; ulong gpr[14]; u32 cr; u32 xer; diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index fe14ca3..5b9e906 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -66,6 +66,7 @@ static void kvmppc_core_vcpu_load_pr(struct kvm_vcpu *vcpu, int cpu) struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu); memcpy(svcpu->slb, to_book3s(vcpu)->slb_shadow, sizeof(svcpu->slb)); svcpu->slb_max = to_book3s(vcpu)->slb_shadow_max; + svcpu->in_use = 0; svcpu_put(svcpu); #endif vcpu->cpu = smp_processor_id(); @@ -78,6 +79,9 @@ static void kvmppc_core_vcpu_put_pr(struct kvm_vcpu *vcpu) { #ifdef CONFIG_PPC_BOOK3S_64 struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu); + if (svcpu->in_use) { + kvmppc_copy_from_svcpu(vcpu, svcpu); + } memcpy(to_book3s(vcpu)->slb_shadow, svcpu->slb, sizeof(svcpu->slb)); to_book3s(vcpu)->slb_shadow_max = svcpu->slb_max; svcpu_put(svcpu); @@ -110,12 +114,26 @@ void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu, svcpu->ctr = vcpu->arch.ctr; svcpu->lr = vcpu->arch.lr; svcpu->pc = vcpu->arch.pc; + svcpu->in_use = 
true; } /* Copy data touched by real-mode code from shadow vcpu back to vcpu */ void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu, struct kvmppc_book3s_shadow_vcpu *svcpu) { + /* +* vcpu_put would just call us again because in_use hasn't +* been updated yet. +*/ + preempt_disable(); + + /* +* Maybe we were already preempted and synced the svcpu from +* our preempt notifiers. Don't bother touching this svcpu then. +*/ + if (!svcpu->in_use) + goto out; + vcpu->arch.gpr[0] = svcpu->gpr[0]; vcpu->arch.gpr[1] = svcpu->gpr[1]; vcpu->arch.gpr[2] = svcpu->gpr[2]; @@ -139,6 +157,10 @@ void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu, vcpu->arch.fault_dar = svcpu->fault_dar; vcpu->arch.fault_dsisr = svcpu->fault_dsisr; vcpu->arch.last_inst = svcpu->last_inst; + svcpu->in_use = false; + +out: + preempt_enable(); } static int kvmppc_core_check_requests_pr(struct kvm_vcpu *vcpu) -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 67061] [BISECTED][Nested]L2 guest boot up fail(kvm on kvm).
https://bugzilla.kernel.org/show_bug.cgi?id=67061

Alan changed:

           What    |Removed                     |Added
----------------------------------------------------------------
                 CC|                            |a...@lxorguk.ukuu.org.uk
            Summary|[Nested]L2 guest boot up    |[BISECTED][Nested]L2 guest
                   |fail(kvm on kvm).           |boot up fail(kvm on kvm).

--- Comment #2 from Alan ---
That's a very odd bisection point, it must be said

--
You are receiving this mail because:
You are watching the assignee of the bug.
Re: [PATCH 2/2] kvmtool/arm: Add option to override Generic Timer frequency
On 18.12.2013, at 14:44, Robin Murphy wrote: > On 17/12/13 20:39, Alexander Graf wrote: >> >> On 17.12.2013, at 19:31, Robin Murphy wrote: >> >>> Some platforms have secure firmware which does not correctly set the >>> CNTFRQ register on boot, preventing the use of the Generic Timer. >>> This patch allows mirroring the necessary host workaround by specifying >>> the clock-frequency property in the guest DT. >>> >>> This should only be considered a means of KVM bring-up on such systems, >>> such that vendors may be convinced to properly implement their firmware >>> to support the virtualisation capabilities of their hardware. >>> >>> Signed-off-by: Robin Murphy >>> Acked-by: Will Deacon >> >> How does it encourage a vendor to properly implement their firmware if >> there's a workaround? >> >> >> Alex >> >> > > Hi Alex, > > In short, by enabling the users to create the demand. Yes, like any > workaround there's potential for abuse, but having *something* that makes it > work is the difference between "I want virtualisation"[1] and "Dear vendor, > I've tried virtualisation on your chip/board and it's great, but it tells me > I need new firmware, where do I get that?" > > Having the specs tell them what to do clearly isn't sufficient, so let's give > the integrators and consumers incentive to shout at them too. The sooner > proper support is commonplace and we can deprecate clock-frequency hacks > altogether, the better. Oh, I'm all for hacks. But please don't fall under the illusion that this will push vendors to fix their firmware. It will have the opposite effect - vendors will just point to the workaround and say "but it works" :). Either way, this hack is basically required because you can't program CNTFRQ because it's controlled by secure firmware, right? So the host os already needs to know about this and probably does have a "clock-frequency" value in its device tree entry already to know how fast its clock ticks. 
Couldn't we search for that host entry and automatically pass it into the guest if it's there? That way the whole thing becomes seamless and even less of an issue. Alex
Re: [PATCH 2/2] kvmtool/arm: Add option to override Generic Timer frequency
On 17/12/13 20:39, Alexander Graf wrote: On 17.12.2013, at 19:31, Robin Murphy wrote: Some platforms have secure firmware which does not correctly set the CNTFRQ register on boot, preventing the use of the Generic Timer. This patch allows mirroring the necessary host workaround by specifying the clock-frequency property in the guest DT. This should only be considered a means of KVM bring-up on such systems, such that vendors may be convinced to properly implement their firmware to support the virtualisation capabilities of their hardware. Signed-off-by: Robin Murphy Acked-by: Will Deacon How does it encourage a vendor to properly implement their firmware if there's a workaround? Alex Hi Alex, In short, by enabling the users to create the demand. Yes, like any workaround there's potential for abuse, but having *something* that makes it work is the difference between "I want virtualisation"[1] and "Dear vendor, I've tried virtualisation on your chip/board and it's great, but it tells me I need new firmware, where do I get that?" Having the specs tell them what to do clearly isn't sufficient, so let's give the integrators and consumers incentive to shout at them too. The sooner proper support is commonplace and we can deprecate clock-frequency hacks altogether, the better. Robin. [1] http://www.theregister.co.uk/2013/12/12/virtualisation_on_mobile_phones_is_coming/
Re: Updated Elvis Upstreaming Roadmap
On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote: > Hi, > > Thank you all for your comments. > I'm sorry for taking this long to reply, I was away on vacation.. > > It was a good, long discussion, many issues were raised, which we'd like > to address with the following proposed roadmap for Elvis patches. > In general, we believe it would be best to start with patches that are > as simple as possible, providing the basic Elvis functionality, > and attend to the more complicated issues in subsequent patches. > > Here's the road map for Elvis patches: Thanks for the follow up. Some suggestions below. Please note the suggestions below merely represent thoughts on merging upstream. If as the first step you are content with keeping this work as out of tree patches, in order to have the freedom to experiment with interfaces and performance, please feel free to ignore them. > 1. Shared vhost thread for multiple devices. > > The way to go here, we believe, is to start with a patch having a shared > vhost thread for multiple devices of the SAME vm. > The next step/patch may be handling vms belonging to the same cgroup. > > Finally, we need to extend the functionality so that the shared vhost > thread > serves multiple vms (not necessarily belonging to the same cgroup). > > There was a lot of discussion about the way to address the enforcement > of cgroup policies, and we will consider the various solutions with a > future > patch. With respect to the upstream kernel, I'm not sure a bunch of changes just for the sake of guests with multiple virtual NIC cards makes sense. And I wonder how this step, in isolation, will affect e.g. multiqueue workloads. But I guess if the numbers are convincing, this can be mergeable. > > 2. 
Creation of vhost threads > > We suggested two ways of controlling the creation and removal of vhost > threads: > - statically determining the maximum number of virtio devices per worker > via a kernel module parameter > - dynamically: Sysfs mechanism to add and remove vhost threads > > It seems that it would be simplest to take the static approach as > a first stage. At a second stage (next patch), we'll advance to > dynamically > changing the number of vhost threads, using the static module parameter > only as a default value. I'm not sure how independent this is from 1. With respect to the upstream kernel, introducing interfaces (which we'll have to maintain forever) just for the sake of guests with multiple virtual NIC cards does not look like a good tradeoff. So I'm unlikely to merge this upstream without making it useful cross-VM, and yes this means isolation and accounting with cgroups need to work properly. > Regarding cmwq, it is an interesting mechanism, which we need to explore > further. > At the moment we prefer not to change the vhost model to use cmwq, as some > of the issues that were discussed, such as cgroups, are not supported by > cmwq, and this is adding more complexity. > However, we'll look further into it, and consider it at a later stage. Hmm that's still assuming some smart management tool configuring this correctly. Can't this be determined automatically depending on the workload? This is what the cmwq suggestion was really about: detect that we need more threads and spawn them. It's less about sharing the implementation with workqueues - would be very nice but not a must. > 3. Adding polling mode to vhost > > It is a good idea making polling adaptive based on various factors such as > the I/O rate, the guest kick overhead (which is the tradeoff of polling), > or the amount of wasted cycles (cycles we kept polling but no new work was > added). 
> However, as a beginning polling patch, we would prefer having a naive > polling approach, which could be tuned with later patches. While any polling approach would still need a lot of testing to prove we don't, for example, steal CPU from a guest which could be doing other useful work, given that an exit costs at least 1.5K cycles it seems, at least in theory, like something that can improve performance. I'm not sure how naive we can be without introducing regressions for some workloads. For example, if we are on the same host CPU, there's no chance busy waiting will help us make progress. How about detecting that the VCPU thread that kicked us is currently running on another CPU, and only polling in this case? > 4. vhost statistics > > The issue that was raised for the vhost statistics was using ftrace > instead of the debugfs mechanism. > However, looking further into the kvm stat mechanism, we learned that > ftrace didn't replace the plain debugfs mechanism, but was used in > addition to it. > > We propose to continue using debugfs for statistics, in a manner similar > to kvm, > and at some point in the future ftrace can be added to vhost as well. IMHO while kvm stat is a useful script, the best tool for perf stats is still perf. So I would try to integrate with
Re: [PATCH] KVM: PPC: Book3S HV: Don't drop low-order page address bits
On 16.12.2013, at 03:31, Paul Mackerras wrote: > Commit caaa4c804fae ("KVM: PPC: Book3S HV: Fix physical address > calculations") unfortunately resulted in some low-order address bits > getting dropped in the case where the guest is creating a 4k HPTE > and the host page size is 64k. By getting the low-order bits from > hva rather than gpa we miss out on bits 12 - 15 in this case, since > hva is at page granularity. This puts the missing bits back in. > > Reported-by: Alexey Kardashevskiy > Signed-off-by: Paul Mackerras Thanks, applied to for-3.13. Alex
Re: [PATCH] powerpc: book3s: kvm: Don't abuse host r2 in exit path
On 18.12.2013, at 08:35, Aneesh Kumar K.V wrote: > > Hi Alex, > > Any update on this? We need this to go into 3.13. Thanks, applied to for-3.13. Alex
Re: [PATCH 00/15] kvm-unit-tests: Fix and enhance nVMX tests
Il 16/12/2013 10:57, Jan Kiszka ha scritto: > This series contains various smaller fixes and cleanups for the VMX unit > tests and also a few unrelated corners of the test suite. It ends with a > rework of the VMX capability test and the addition of a bunch of new > test cases for interrupt injection, halting and the preemption timer. > Most of them fail or lock up at the moment. More work is required on KVM > to fix this; I'm on it. > > These apply on top of the vmx branch. They are also available at > > git://git.kiszka.org/kvm-unit-tests queue I finally merged the vmx branch into master, and applied 1-13 on top. I'm in the process of installing newer hardware, and for now I'm stuck with an old Core 2. So I'm holding up patches 14-15 until after I get the new machine; I pushed them to the vmx branch. Paolo
Re: [PATCH] KVM: nVMX: Support direct APIC access from L2
Il 16/12/2013 12:55, Jan Kiszka ha scritto:
> It's a pathological case, but still a valid one: If L1 disables APIC
> virtualization and also allows L2 to directly write to the APIC page, we
> have to forcibly enable APIC virtualization while in L2 if the in-kernel
> APIC is in use.
>
> This allows to run the direct interrupt test case in the vmx unit test
> without x2APIC.
>
> Signed-off-by: Jan Kiszka
> ---
> arch/x86/kvm/vmx.c | 5 +
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index da04247..7c3ea63 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -7771,6 +7771,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> 		else
> 			vmcs_write64(APIC_ACCESS_ADDR,
> 				page_to_phys(vmx->nested.apic_access_page));
> +	} else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
> +		exec_control |=
> +			SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
> +		vmcs_write64(APIC_ACCESS_ADDR,
> +			page_to_phys(vcpu->kvm->arch.apic_access_page));
> 	}
>
> 	vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);

Applied, thanks.

Paolo
Re: [PATCH uq/master] kvm: x86: Separately write feature control MSR on reset
Il 17/12/2013 20:05, Jan Kiszka ha scritto:
> If the guest is running in nested mode on system reset, clearing the
> feature MSR signals the kernel to leave this mode. Recent kernels
> process this properly, but leave the VCPU state undefined behind. It
> is the job of userspace to bring it to a proper shape. Therefore, write
> this specific MSR first so that no state transfer gets lost.
>
> This allows to cleanly reset a guest with VMX in use.
>
> Signed-off-by: Jan Kiszka
> ---
> target-i386/kvm.c | 32
> 1 file changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 1188482..ec51447 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -1104,6 +1104,25 @@ static int kvm_put_tscdeadline_msr(X86CPU *cpu)
>     return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, &msr_data);
> }
>
> +/*
> + * Provide a separate write service for the feature control MSR in order to
> + * kick the VCPU out of VMXON or even guest mode on reset. This has to be done
> + * before writing any other state because forcibly leaving nested mode
> + * invalidates the VCPU state.
> + */
> +static int kvm_put_msr_feature_control(X86CPU *cpu)
> +{
> +    struct {
> +        struct kvm_msrs info;
> +        struct kvm_msr_entry entry;
> +    } msr_data;
> +
> +    kvm_msr_entry_set(&msr_data.entry, MSR_IA32_FEATURE_CONTROL,
> +                      cpu->env.msr_ia32_feature_control);
> +    msr_data.info.nmsrs = 1;
> +    return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, &msr_data);
> +}
> +
> static int kvm_put_msrs(X86CPU *cpu, int level)
> {
>     CPUX86State *env = &cpu->env;
> @@ -1204,10 +1223,8 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>         if (cpu->hyperv_vapic) {
>             kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_APIC_ASSIST_PAGE, 0);
>         }
> -        if (has_msr_feature_control) {
> -            kvm_msr_entry_set(&msrs[n++], MSR_IA32_FEATURE_CONTROL,
> -                              env->msr_ia32_feature_control);
> -        }
> +        /* Note: MSR_IA32_FEATURE_CONTROL is written separately, see
> +         * kvm_put_msr_feature_control. */
>     }
>     if (env->mcg_cap) {
>         int i;
> @@ -1801,6 +1818,13 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
>
>     assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
>
> +    if (level >= KVM_PUT_RESET_STATE && has_msr_feature_control) {
> +        ret = kvm_put_msr_feature_control(x86_cpu);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    }
> +
>     ret = kvm_getput_regs(x86_cpu, 1);
>     if (ret < 0) {
>         return ret;

Applied, thanks!
Re: [PATCH] KVM: nVMX: Leave VMX mode on clearing of feature control MSR
Il 17/12/2013 19:57, Jan Kiszka ha scritto:
> When userspace sets MSR_IA32_FEATURE_CONTROL to 0, make sure we leave
> root and non-root mode, fully disabling VMX. The register state of the
> VCPU is undefined after this step, so userspace has to set it to a
> proper state afterward.
>
> This enables to reboot a VM while it is running some hypervisor code.
>
> Signed-off-by: Jan Kiszka
> ---
>
> Even without a QEMU patch, this already enables system reset - the guest
> is left in such a broken state that it simply triple-faults and resets
> twice. Nevertheless, QEMU patch will follow.
>
> arch/x86/kvm/vmx.c | 14 ++
> 1 file changed, 14 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index f90320b..6a0c2fa 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2455,6 +2455,8 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> 	return 1;
> }
>
> +static void vmx_leave_nested(struct kvm_vcpu *vcpu);
> +
> static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> {
> 	u32 msr_index = msr_info->index;
> @@ -2470,6 +2472,8 @@ static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 			& FEATURE_CONTROL_LOCKED)
> 			return 0;
> 		to_vmx(vcpu)->nested.msr_ia32_feature_control = data;
> +		if (host_initialized && data == 0)
> +			vmx_leave_nested(vcpu);
> 		return 1;
> 	}
>
> @@ -8488,6 +8492,16 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu)
> }
>
> /*
> + * Forcibly leave nested mode in order to be able to reset the VCPU later on.
> + */
> +static void vmx_leave_nested(struct kvm_vcpu *vcpu)
> +{
> +	if (is_guest_mode(vcpu))
> +		nested_vmx_vmexit(vcpu);
> +	free_nested(to_vmx(vcpu));
> +}
> +
> +/*
>  * L1's failure to enter L2 is a subset of a normal exit, as explained in
>  * 23.7 "VM-entry failures during or after loading guest state" (this also
>  * lists the acceptable exit-reason and exit-qualification parameters).

Applied, thanks! 
Paolo
Re: [PATCH 0/4 v3] kvm: powerpc: use cache attributes from linux pte
On 18.11.2013, at 19:36, Scott Wood wrote:
> On Fri, 2013-11-15 at 11:01 +0530, Bharat Bhushan wrote:
>> From: Bharat Bhushan
>>
>> v2->v3
>> - Returning pte pointer rather than pte as suggested by Scott
>> - Also removed PAGE_SPLITTING as this need to be handled by caller
>>
>> v1->v2
>> - Removed _PAGE_BUSY loop as suggested by PaulS.
>> - Added check for PAGE_SPLITTING
>>
>> kvm: powerpc: use cache attributes from linux pte
>> - 1st Patch fixes a bug in booke (detail in patch)
>> - 2nd patch is renaming the linux_pte_lookup_function() just for clarity. There is no functional change.
>> - 3rd Patch adds a Linux pte lookup function.
>> - 4th Patch uses the above defined function and setup TLB.wimg accordingly
>>
>> Bharat Bhushan (4):
>>   kvm: booke: clear host tlb reference flag on guest tlb invalidation
>>   kvm: book3s: rename lookup_linux_pte() to lookup_linux_pte_and_update()
>>   kvm: powerpc: define a linux pte lookup function
>>   kvm: powerpc: use caching attributes as per linux pte
>>
>>  arch/powerpc/include/asm/kvm_host.h |  2 +-
>>  arch/powerpc/include/asm/pgtable.h  | 21 +
>>  arch/powerpc/kvm/book3s_hv_rm_mmu.c |  8 +++--
>>  arch/powerpc/kvm/booke.c            |  1 +
>>  arch/powerpc/kvm/e500.h             |  8 +++--
>>  arch/powerpc/kvm/e500_mmu_host.c    | 55 +++---
>>  6 files changed, 64 insertions(+), 31 deletions(-)
>
> Reviewed-by: Scott Wood with patch 4/4 replaced by v4.

Thanks, applied to kvm-ppc-queue.

Alex