Re: [PATCH] powerpc/e500: move qemu machine spec together with the rest
> On 14.09.2015 at 15:17, Laurentiu Tudor wrote:
>
>> On 09/10/2015 02:01 AM, Scott Wood wrote:
>>> On Fri, 2015-09-04 at 15:46 +0300, Laurentiu Tudor wrote:
>>> This way we get rid of an entire file with mostly
>>> duplicated code plus a Kconfig option that you always
>>> had to take care to check in order for kvm to work.
>>>
>>> Signed-off-by: Laurentiu Tudor
>>> ---
>>>  arch/powerpc/platforms/85xx/Kconfig           | 15 -
>>>  arch/powerpc/platforms/85xx/Makefile          |  1 -
>>>  arch/powerpc/platforms/85xx/corenet_generic.c |  1 +
>>>  arch/powerpc/platforms/85xx/qemu_e500.c       | 85
>>
>>
>> qemu_e500 is not only for corenet chips.
>
> That's too bad. :-(
> I remember discussions on dropping the e500v2 support at some point in time?
>
>> We can add it to the defconfig (in fact I've been meaning to do so).
>
> Or maybe just drop the Kconfig option and
> wrap the file in an #ifdef CONFIG_KVM or something along these lines?

CONFIG_KVM is for host support though. This is for the guest kernel.

Alex
Re: [PATCH 3/3] KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD
> On 12.09.2015 at 18:47, Nathan Whitehorn wrote:
>
>> On 09/06/15 16:52, Paul Mackerras wrote:
>>> On Sun, Sep 06, 2015 at 12:47:12PM -0700, Nathan Whitehorn wrote:
>>> Anything I can do to help move these along? It's a big performance
>>> improvement for FreeBSD guests.
>> These patches are in Paolo's kvm-ppc-next branch and should go into
>> Linus' tree in the next couple of days.
>>
>> Paul.
>
> One additional question. What is your preferred way to enable these? Since
> these are part of the mandatory part of the PAPR spec, I think there's an
> argument to add them to the default_hcall_list? Otherwise, they should be
> enabled by default in QEMU (I can take care of sending that patch if you
> prefer this route).

The default hcall list just describes which hcalls were implicitly enabled
at the point in time we made them enableable by user space. IMHO no new
hcalls should get added there.

So yes, please send a patch to qemu :).

Alex
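For reference, a rough sketch of what that QEMU-side enablement could look like (an assumption for illustration, not the actual patch: the function name is made up, and it presumes QEMU's existing kvmppc_enable_hcall() helper around KVM_CAP_PPC_ENABLE_HCALL plus the H_CLEAR_REF/H_CLEAR_MOD constants from the sPAPR headers):

	/* Hypothetical sketch for QEMU's target-ppc/kvm.c: switch the two
	 * PAPR hcalls on for this VM, the same way the H_LOGICAL_CI_*
	 * hcalls are enabled, so guests such as FreeBSD can use them by
	 * default.
	 */
	void kvmppc_enable_clear_ref_mod_hcalls(void)
	{
		kvmppc_enable_hcall(kvm_state, H_CLEAR_REF);
		kvmppc_enable_hcall(kvm_state, H_CLEAR_MOD);
	}

The spapr machine init code would then call this once when KVM is in use, analogous to the existing kvmppc_enable_logical_ci_hcalls() call.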
Re: [Qemu-ppc] KVM memory slots limit on powerpc
On 04.09.15 11:59, Christian Borntraeger wrote:
> On 04.09.2015 at 11:35, Thomas Huth wrote:
>>
>> Hi all,
>>
>> now that we get memory hotplugging for the spapr machine on qemu-ppc,
>> too, it seems like we easily can hit the amount of KVM-internal memory
>> slots now ("#define KVM_USER_MEM_SLOTS 32" in
>> arch/powerpc/include/asm/kvm_host.h). For example, start
>> qemu-system-ppc64 with a couple of "-device secondary-vga" and "-m
>> 4G,slots=32,maxmem=40G" and then try to hot-plug all 32 DIMMs ... and
>> you'll see that it aborts way earlier already.
>>
>> The x86 code already increased the amount of KVM_USER_MEM_SLOTS to 509
>> already (+3 internal slots = 512) ... maybe we should now increase the
>> amount of slots on powerpc, too? Since we don't use internal slots on
>> POWER, would 512 be a good value? Or would less be sufficient, too?
>
> When you are at it, the s390 value should also be increased I guess.

That constant defines the array size for the memslot array in struct kvm,
which in turn gets allocated by kzalloc, so it's pinned kernel memory that
is physically contiguous. Doing big allocations can turn into problems
during runtime.

So maybe there is another way? Can we extend the memslot array size
dynamically somehow? Allocate it separately? How much memory does the
memslot array use up with 512 entries?

Alex
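For a rough answer to that last question, a back-of-the-envelope check could look like the sketch below (an assumption, not from the thread; the exact struct size depends on architecture and kernel version):

	/* Sketch: a one-off debug helper that could sit in
	 * virt/kvm/kvm_main.c.  struct kvm_memslots embeds
	 * KVM_MEM_SLOTS_NUM (user + internal slots) struct kvm_memory_slot
	 * entries and is allocated as a single contiguous chunk, so the
	 * total printed below is what the allocation has to provide.
	 */
	#include <linux/kvm_host.h>

	static void __init kvm_report_memslot_size(void)
	{
		pr_info("kvm: %d slots, %zu bytes per slot, %zu bytes total\n",
			KVM_MEM_SLOTS_NUM,
			sizeof(struct kvm_memory_slot),
			sizeof(struct kvm_memslots));
	}

With a kvm_memory_slot on the order of 50-100 bytes, 512 entries works out to a few tens of kilobytes, i.e. a higher-order but not enormous contiguous allocation.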
Re: [PATCH] KVM: ppc: Fix size of the PSPB register
> On 02.09.2015 at 09:26, Thomas Huth wrote:
>
>> On 02/09/15 00:55, Benjamin Herrenschmidt wrote:
>>> On Wed, 2015-09-02 at 08:45 +1000, Paul Mackerras wrote:
>>> On Wed, Sep 02, 2015 at 08:25:05AM +1000, Benjamin Herrenschmidt
>>> wrote:
>>>> On Tue, 2015-09-01 at 23:41 +0200, Thomas Huth wrote:
>>>>> The size of the Problem State Priority Boost Register is only
>>>>> 32 bits, so let's change the type of the corresponding variable
>>>>> accordingly to avoid future trouble.
>>>>
>>>> It's not future trouble, it's broken today for LE and this should
>>>> fix it BUT
>>>
>>> No, it's broken today for BE hosts, which will always see 0 for the
>>> PSPB register value. LE hosts are fine.
>
> Right ... I just meant that nobody really experienced trouble with this
> today yet, but the bug is already present now, of course.

Sounds like a great candidate for kvm-unit-tests then, no? ;)

Alex
Re: [PATCH] vfio: Enable VFIO device for powerpc
On 13.08.15 03:15, David Gibson wrote:
> ec53500f "kvm: Add VFIO device" added a special KVM pseudo-device which is
> used to handle any necessary interactions between KVM and VFIO.
>
> Currently that device is built on x86 and ARM, but not powerpc, although
> powerpc does support both KVM and VFIO. This makes things awkward in
> userspace
>
> Currently qemu prints an alarming error message if you attempt to use VFIO
> and it can't initialize the KVM VFIO device. We don't want to remove the
> warning, because lack of the KVM VFIO device could mean coherency problems
> on x86. On powerpc, however, the error is harmless but looks disturbing,
> and a test based on host architecture in qemu would be ugly, and break if
> we do need the KVM VFIO device for something important in future.
>
> There's nothing preventing the KVM VFIO device from being built for
> powerpc, so this patch turns it on. It won't actually do anything, since
> we don't define any of the arch_*() hooks, but it will make qemu happy and
> we can extend it in future if we need to.
>
> Signed-off-by: David Gibson
> Reviewed-by: Eric Auger

Paul is going to take care of the kvm-ppc tree for 4.3. Also, ppc kvm
patches should get CC'd to the kvm-ppc@vger mailing list ;).

Paul, could you please pick this one up?

Thanks!

Alex
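For readers unfamiliar with the pseudo-device in question, the userspace side boils down to roughly the following sketch of the generic KVM VFIO device API (not code from this patch; vm_fd and group_fd are assumed to exist already):

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Create the KVM VFIO pseudo-device on the VM fd and register a
	 * VFIO group fd with it -- roughly what QEMU attempts when VFIO is
	 * in use.  If KVM_CREATE_DEVICE fails because the device type is
	 * not built into the kernel, QEMU prints the warning discussed
	 * above.
	 */
	static int kvm_vfio_attach_group(int vm_fd, int group_fd)
	{
		struct kvm_create_device cd = { .type = KVM_DEV_TYPE_VFIO };
		struct kvm_device_attr attr = {
			.group = KVM_DEV_VFIO_GROUP,
			.attr  = KVM_DEV_VFIO_GROUP_ADD,
			.addr  = (unsigned long)&group_fd, /* fd of an open VFIO group */
		};

		if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0)
			return -1;	/* no KVM VFIO device available */

		return ioctl(cd.fd, KVM_SET_DEVICE_ATTR, &attr);
	}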
Re: Build regressions/improvements in v4.2-rc8
On 24.08.15 10:36, Geert Uytterhoeven wrote:
> On Mon, Aug 24, 2015 at 10:34 AM, Geert Uytterhoeven wrote:
>> JFYI, when comparing v4.2-rc8[1] to v4.2-rc7[3], the summaries are:
>>  - build errors: +4/-7
>
> 4 regressions:
>  + /home/kisskb/slave/src/include/linux/kvm_host.h: error: array
>    subscript is above array bounds [-Werror=array-bounds]: => 430:19
>    (arch/powerpc/kvm/book3s_64_mmu.c: In function 'kvmppc_mmu_book3s_64_tlbie':)
>
> powerpc-randconfig (seen before in a v3.15-rc1 build?)

I'm not quite sure what's going wrong here. The code in question does

    kvm_for_each_vcpu(i, v, vcpu->kvm)
        kvmppc_mmu_pte_vflush(v, va >> 12, mask);

and IIUC the thing we're potentially running over would be kvm->vcpus[i].
But that one is bound by the kvm_for_each_vcpu loop, no?

Alex
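For context, the loop macro that bounds the index looked roughly like this in the 4.2-era include/linux/kvm_host.h (quoted from memory, so treat it as a sketch):

	/* idx stops at online_vcpus, which can never exceed KVM_MAX_VCPUS,
	 * so the kvm->vcpus[idx] access inside kvm_get_vcpu() stays in
	 * bounds.  The -Warray-bounds report is therefore likely a false
	 * positive from the compiler reasoning about a constant-propagated
	 * index range in that randconfig build.
	 */
	#define kvm_for_each_vcpu(idx, vcpup, kvm) \
		for (idx = 0; \
		     idx < atomic_read(&kvm->online_vcpus) && \
		     (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
		     idx++)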
Re: [PULL 00/12] ppc patch queue 2015-08-22
On 22.08.15 15:32, Paolo Bonzini wrote:
>
>
> On 22/08/2015 02:21, Alexander Graf wrote:
>> Hi Paolo,
>>
>> This is my current patch queue for ppc. Please pull.
>
> Done, but this queue has not been in linux-next. Please push to
> kvm-ppc-next on your github Linux tree as well; please keep an eye on

Ah, sorry. I pushed to kvm-ppc-next in parallel to sending the request.

> Stephen Rothwell's messages in the next few days, and I'll send the pull
> request sometime next week via webmail if everything goes fine.

Nothing exciting came in so far, so I hope we're good :).

Alex
[PULL 02/12] KVM: PPC: Remove PPC970 from KVM_BOOK3S_64_HV text in Kconfig
From: Thomas Huth

Since the PPC970 support has been removed from the kvm-hv kernel module
recently, we should also reflect this change in the help text of the
corresponding Kconfig option.

Signed-off-by: Thomas Huth
Signed-off-by: Alexander Graf
---
 arch/powerpc/kvm/Kconfig | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 3caec2c..c2024ac 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -74,14 +74,14 @@ config KVM_BOOK3S_64
 	  If unsure, say N.
 
 config KVM_BOOK3S_64_HV
-	tristate "KVM support for POWER7 and PPC970 using hypervisor mode in host"
+	tristate "KVM for POWER7 and later using hypervisor mode in host"
 	depends on KVM_BOOK3S_64 && PPC_POWERNV
 	select KVM_BOOK3S_HV_POSSIBLE
 	select MMU_NOTIFIER
 	select CMA
 	---help---
 	  Support running unmodified book3s_64 guest kernels in
-	  virtual machines on POWER7 and PPC970 processors that have
+	  virtual machines on POWER7 and newer processors that have
 	  hypervisor mode available to the host.
 
 	  If you say Y here, KVM will use the hardware virtualization
@@ -89,8 +89,8 @@ config KVM_BOOK3S_64_HV
 	  guest operating systems will run at full hardware speed
 	  using supervisor and user modes. However, this also means
 	  that KVM is not usable under PowerVM (pHyp), is only usable
-	  on POWER7 (or later) processors and PPC970-family processors,
-	  and cannot emulate a different processor from the host processor.
+	  on POWER7 or later processors, and cannot emulate a
+	  different processor from the host processor.
 
 	  If unsure, say N.
--
1.8.1.4
[PULL 04/12] KVM: PPC: add missing pt_regs initialization
From: Tudor Laurentiu

On this switch branch the regs initialization doesn't happen so add it.
This was found with the help of a static code analysis tool.

Signed-off-by: Laurentiu Tudor
Signed-off-by: Alexander Graf
---
 arch/powerpc/kvm/booke.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index cc58426..ae458f0 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -933,6 +933,7 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu,
 #endif
 		break;
 	case BOOKE_INTERRUPT_CRITICAL:
+		kvmppc_fill_pt_regs(&regs);
 		unknown_exception(&regs);
 		break;
 	case BOOKE_INTERRUPT_DEBUG:
--
1.8.1.4
[PULL 00/12] ppc patch queue 2015-08-22
Hi Paolo,

This is my current patch queue for ppc. Please pull.

Alex


The following changes since commit 4d283ec908e617fa28bcb06bce310206f0655d67:

  x86/kvm: Rename VMX's segment access rights defines (2015-08-15 00:47:13 +0200)

are available in the git repository at:

  git://github.com/agraf/linux-2.6.git tags/signed-kvm-ppc-next

for you to fetch changes up to c63517c2e3810071359af926f621c1f784388c3f:

  KVM: PPC: Book3S: correct width in XER handling (2015-08-22 11:16:19 +0200)

Patch queue for ppc - 2015-08-22

Highlights for KVM PPC this time around:

  - Book3S: A few bug fixes
  - Book3S: Allow micro-threading on POWER8

Paul Mackerras (7):
      KVM: PPC: Book3S HV: Make use of unused threads when running guests
      KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
      KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE
      KVM: PPC: Book3S HV: Fix bug in dirty page tracking
      KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD
      KVM: PPC: Book3S HV: Fix preempted vcore list locking
      KVM: PPC: Book3S HV: Fix preempted vcore stolen time calculation

Sam bobroff (1):
      KVM: PPC: Book3S: correct width in XER handling

Thomas Huth (2):
      KVM: PPC: Remove PPC970 from KVM_BOOK3S_64_HV text in Kconfig
      KVM: PPC: Fix warnings from sparse

Tudor Laurentiu (2):
      KVM: PPC: fix suspicious use of conditional operator
      KVM: PPC: add missing pt_regs initialization

 arch/powerpc/include/asm/kvm_book3s.h     |   5 +-
 arch/powerpc/include/asm/kvm_book3s_asm.h |  22 +-
 arch/powerpc/include/asm/kvm_booke.h      |   4 +-
 arch/powerpc/include/asm/kvm_host.h       |  24 +-
 arch/powerpc/include/asm/ppc-opcode.h     |   2 +-
 arch/powerpc/kernel/asm-offsets.c         |   9 +
 arch/powerpc/kvm/Kconfig                  |   8 +-
 arch/powerpc/kvm/book3s.c                 |   3 +-
 arch/powerpc/kvm/book3s_32_mmu_host.c     |   1 +
 arch/powerpc/kvm/book3s_64_mmu_host.c     |   1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c       |   8 +-
 arch/powerpc/kvm/book3s_emulate.c         |   1 +
 arch/powerpc/kvm/book3s_hv.c              | 660 ++
 arch/powerpc/kvm/book3s_hv_builtin.c      |  32 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c       | 161 +++-
 arch/powerpc/kvm/book3s_hv_rm_xics.c      |   4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 128 +-
 arch/powerpc/kvm/book3s_paired_singles.c  |   2 +-
 arch/powerpc/kvm/book3s_segment.S         |   4 +-
 arch/powerpc/kvm/booke.c                  |   1 +
 arch/powerpc/kvm/e500_mmu.c               |   2 +-
 arch/powerpc/kvm/powerpc.c                |   2 +-
 22 files changed, 938 insertions(+), 146 deletions(-)
[PULL 01/12] KVM: PPC: fix suspicious use of conditional operator
From: Tudor Laurentiu

This was signaled by a static code analysis tool.

Signed-off-by: Laurentiu Tudor
Reviewed-by: Scott Wood
Signed-off-by: Alexander Graf
---
 arch/powerpc/kvm/e500_mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
index 50860e9..29911a0 100644
--- a/arch/powerpc/kvm/e500_mmu.c
+++ b/arch/powerpc/kvm/e500_mmu.c
@@ -377,7 +377,7 @@ int kvmppc_e500_emul_tlbsx(struct kvm_vcpu *vcpu, gva_t ea)
 		| MAS0_NV(vcpu_e500->gtlb_nv[tlbsel]);
 	vcpu->arch.shared->mas1 =
 		  (vcpu->arch.shared->mas6 & MAS6_SPID0)
-		| (vcpu->arch.shared->mas6 & (MAS6_SAS ? MAS1_TS : 0))
+		| ((vcpu->arch.shared->mas6 & MAS6_SAS) ? MAS1_TS : 0)
 		| (vcpu->arch.shared->mas4 & MAS4_TSIZED(~0));
 	vcpu->arch.shared->mas2 &= MAS2_EPN;
 	vcpu->arch.shared->mas2 |= vcpu->arch.shared->mas4 &
--
1.8.1.4
[PULL 05/12] KVM: PPC: Book3S HV: Make use of unused threads when running guests
From: Paul Mackerras When running a virtual core of a guest that is configured with fewer threads per core than the physical cores have, the extra physical threads are currently unused. This makes it possible to use them to run one or more other virtual cores from the same guest when certain conditions are met. This applies on POWER7, and on POWER8 to guests with one thread per virtual core. (It doesn't apply to POWER8 guests with multiple threads per vcore because they require a 1-1 virtual to physical thread mapping in order to be able to use msgsndp and the TIR.) The idea is that we maintain a list of preempted vcores for each physical cpu (i.e. each core, since the host runs single-threaded). Then, when a vcore is about to run, it checks to see if there are any vcores on the list for its physical cpu that could be piggybacked onto this vcore's execution. If so, those additional vcores are put into state VCORE_PIGGYBACK and their runnable VCPU threads are started as well as the original vcore, which is called the master vcore. After the vcores have exited the guest, the extra ones are put back onto the preempted list if any of their VCPUs are still runnable and not idle. This means that vcpu->arch.ptid is no longer necessarily the same as the physical thread that the vcpu runs on. In order to make it easier for code that wants to send an IPI to know which CPU to target, we now store that in a new field in struct vcpu_arch, called thread_cpu. Reviewed-by: David Gibson Tested-by: Laurent Vivier Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 19 +- arch/powerpc/kernel/asm-offsets.c | 2 + arch/powerpc/kvm/book3s_hv.c| 333 ++-- arch/powerpc/kvm/book3s_hv_builtin.c| 7 +- arch/powerpc/kvm/book3s_hv_rm_xics.c| 4 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 5 + 6 files changed, 298 insertions(+), 72 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index d91f65b..2b74490 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -278,7 +278,9 @@ struct kvmppc_vcore { u16 last_cpu; u8 vcore_state; u8 in_guest; + struct kvmppc_vcore *master_vcore; struct list_head runnable_threads; + struct list_head preempt_list; spinlock_t lock; wait_queue_head_t wq; spinlock_t stoltb_lock; /* protects stolen_tb and preempt_tb */ @@ -300,12 +302,18 @@ struct kvmppc_vcore { #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8) #define VCORE_IS_EXITING(vc) (VCORE_EXIT_MAP(vc) != 0) -/* Values for vcore_state */ +/* + * Values for vcore_state. + * Note that these are arranged such that lower values + * (< VCORE_SLEEPING) don't require stolen time accounting + * on load/unload, and higher values do. 
+ */ #define VCORE_INACTIVE 0 -#define VCORE_SLEEPING 1 -#define VCORE_PREEMPT 2 -#define VCORE_RUNNING 3 -#define VCORE_EXITING 4 +#define VCORE_PREEMPT 1 +#define VCORE_PIGGYBACK2 +#define VCORE_SLEEPING 3 +#define VCORE_RUNNING 4 +#define VCORE_EXITING 5 /* * Struct used to manage memory for a virtual processor area @@ -619,6 +627,7 @@ struct kvm_vcpu_arch { int trap; int state; int ptid; + int thread_cpu; bool timer_running; wait_queue_head_t cpu_run; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 9823057..a78cdbf 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -512,6 +512,8 @@ int main(void) DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr)); DEFINE(VCPU_VPA_DIRTY, offsetof(struct kvm_vcpu, arch.vpa.dirty)); DEFINE(VCPU_HEIR, offsetof(struct kvm_vcpu, arch.emul_inst)); + DEFINE(VCPU_CPU, offsetof(struct kvm_vcpu, cpu)); + DEFINE(VCPU_THREAD_CPU, offsetof(struct kvm_vcpu, arch.thread_cpu)); #endif #ifdef CONFIG_PPC_BOOK3S DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id)); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 6e588ac..0173ce2 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -81,6 +81,9 @@ static DECLARE_BITMAP(default_enabled_hcalls, MAX_HCALL_OPCODE/4 + 1); #define MPP_BUFFER_ORDER 3 #endif +static int target_smt_mode; +module_param(target_smt_mode, int, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(target_smt_mode, "Target threads per core (0 = max)"); static void kvmppc_end_cede(struct kvm_vcpu *vcpu); static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu); @@ -114,7 +117,7 @@ static bool kvmppc_ipi_thread(int cpu) static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu) { - int cpu = vcpu->cpu; + int cpu; wait_queue_head_t *wqp; wqp = kvm_arch_vcpu_
[PULL 08/12] KVM: PPC: Book3S HV: Fix bug in dirty page tracking
From: Paul Mackerras This fixes a bug in the tracking of pages that get modified by the guest. If the guest creates a large-page HPTE, writes to memory somewhere within the large page, and then removes the HPTE, we only record the modified state for the first normal page within the large page, when in fact the guest might have modified some other normal page within the large page. To fix this we use some unused bits in the rmap entry to record the order (log base 2) of the size of the page that was modified, when removing an HPTE. Then in kvm_test_clear_dirty_npages() we use that order to return the correct number of modified pages. The same thing could in principle happen when removing a HPTE at the host's request, i.e. when paging out a page, except that we never page out large pages, and the guest can only create large-page HPTEs if the guest RAM is backed by large pages. However, we also fix this case for the sake of future-proofing. The reference bit is also subject to the same loss of information. We don't make the same fix here for the reference bit because there isn't an interface for userspace to find out which pages the guest has referenced, whereas there is one for userspace to find out which pages the guest has modified. Because of this loss of information, the kvm_age_hva_hv() and kvm_test_age_hva_hv() functions might incorrectly say that a page has not been referenced when it has, but that doesn't matter greatly because we never page or swap out large pages. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s.h | 1 + arch/powerpc/include/asm/kvm_host.h | 2 ++ arch/powerpc/kvm/book3s_64_mmu_hv.c | 8 +++- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 17 + 4 files changed, 27 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index b91e74a..e6b2534 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -158,6 +158,7 @@ extern pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing, bool *writable); extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev, unsigned long *rmap, long pte_index, int realmode); +extern void kvmppc_update_rmap_change(unsigned long *rmap, unsigned long psize); extern void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep, unsigned long pte_index); void kvmppc_clear_ref_hpte(struct kvm *kvm, __be64 *hptep, diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 80eb29a..e187b6a 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -205,8 +205,10 @@ struct revmap_entry { */ #define KVMPPC_RMAP_LOCK_BIT 63 #define KVMPPC_RMAP_RC_SHIFT 32 +#define KVMPPC_RMAP_CHG_SHIFT 48 #define KVMPPC_RMAP_REFERENCED (HPTE_R_R << KVMPPC_RMAP_RC_SHIFT) #define KVMPPC_RMAP_CHANGED(HPTE_R_C << KVMPPC_RMAP_RC_SHIFT) +#define KVMPPC_RMAP_CHG_ORDER (0x3ful << KVMPPC_RMAP_CHG_SHIFT) #define KVMPPC_RMAP_PRESENT0x1ul #define KVMPPC_RMAP_INDEX 0xul diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index dab68b7..1f9c0a1 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -761,6 +761,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp, /* Harvest R and C */ rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C); *rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT; + if (rcbits & HPTE_R_C) + kvmppc_update_rmap_change(rmapp, psize); if 
(rcbits & ~rev[i].guest_rpte) { rev[i].guest_rpte = ptel | rcbits; note_hpte_modification(kvm, &rev[i]); @@ -927,8 +929,12 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp) retry: lock_rmap(rmapp); if (*rmapp & KVMPPC_RMAP_CHANGED) { - *rmapp &= ~KVMPPC_RMAP_CHANGED; + long change_order = (*rmapp & KVMPPC_RMAP_CHG_ORDER) + >> KVMPPC_RMAP_CHG_SHIFT; + *rmapp &= ~(KVMPPC_RMAP_CHANGED | KVMPPC_RMAP_CHG_ORDER); npages_dirty = 1; + if (change_order > PAGE_SHIFT) + npages_dirty = 1ul << (change_order - PAGE_SHIFT); } if (!(*rmapp & KVMPPC_RMAP_PRESENT)) { unlock_rmap(rmapp); diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index c6d601c..c7a3ab2 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/k
[PULL 06/12] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
From: Paul Mackerras This builds on the ability to run more than one vcore on a physical core by using the micro-threading (split-core) modes of the POWER8 chip. Previously, only vcores from the same VM could be run together, and (on POWER8) only if they had just one thread per core. With the ability to split the core on guest entry and unsplit it on guest exit, we can run up to 8 vcpu threads from up to 4 different VMs, and we can run multiple vcores with 2 or 4 vcpus per vcore. Dynamic micro-threading is only available if the static configuration of the cores is whole-core mode (unsplit), and only on POWER8. To manage this, we introduce a new kvm_split_mode struct which is shared across all of the subcores in the core, with a pointer in the paca on each thread. In addition we extend the core_info struct to have information on each subcore. When deciding whether to add a vcore to the set already on the core, we now have two possibilities: (a) piggyback the vcore onto an existing subcore, or (b) start a new subcore. Currently, when any vcpu needs to exit the guest and switch to host virtual mode, we interrupt all the threads in all subcores and switch the core back to whole-core mode. It may be possible in future to allow some of the subcores to keep executing in the guest while subcore 0 switches to the host, but that is not implemented in this patch. This adds a module parameter called dynamic_mt_modes which controls which micro-threading (split-core) modes the code will consider, as a bitmap. In other words, if it is 0, no micro-threading mode is considered; if it is 2, only 2-way micro-threading is considered; if it is 4, only 4-way, and if it is 6, both 2-way and 4-way micro-threading mode will be considered. The default is 6. With this, we now have secondary threads which are the primary thread for their subcore and therefore need to do the MMU switch. These threads will need to be started even if they have no vcpu to run, so we use the vcore pointer in the PACA rather than the vcpu pointer to trigger them. It is now possible for thread 0 to find that an exit has been requested before it gets to switch the subcore state to the guest. In that case we haven't added the guest's timebase offset to the timebase, so we need to be careful not to subtract the offset in the guest exit path. In fact we just skip the whole path that switches back to host context, since we haven't switched to the guest context. 
Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s_asm.h | 20 ++ arch/powerpc/include/asm/kvm_host.h | 3 + arch/powerpc/kernel/asm-offsets.c | 7 + arch/powerpc/kvm/book3s_hv.c | 367 ++ arch/powerpc/kvm/book3s_hv_builtin.c | 25 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 113 +++-- 6 files changed, 473 insertions(+), 62 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h index 5bdfb5d..57d5dfe 100644 --- a/arch/powerpc/include/asm/kvm_book3s_asm.h +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h @@ -25,6 +25,12 @@ #define XICS_MFRR 0xc #define XICS_IPI 2 /* interrupt source # for IPIs */ +/* Maximum number of threads per physical core */ +#define MAX_SMT_THREADS8 + +/* Maximum number of subcores per physical core */ +#define MAX_SUBCORES 4 + #ifdef __ASSEMBLY__ #ifdef CONFIG_KVM_BOOK3S_HANDLER @@ -65,6 +71,19 @@ kvmppc_resume_\intno: #else /*__ASSEMBLY__ */ +struct kvmppc_vcore; + +/* Struct used for coordinating micro-threading (split-core) mode changes */ +struct kvm_split_mode { + unsigned long rpr; + unsigned long pmmar; + unsigned long ldbar; + u8 subcore_size; + u8 do_nap; + u8 napped[MAX_SMT_THREADS]; + struct kvmppc_vcore *master_vcs[MAX_SUBCORES]; +}; + /* * This struct goes in the PACA on 64-bit processors. It is used * to store host state that needs to be saved when we enter a guest @@ -100,6 +119,7 @@ struct kvmppc_host_state { u64 host_spurr; u64 host_dscr; u64 dec_expires; + struct kvm_split_mode *kvm_split_mode; #endif #ifdef CONFIG_PPC_BOOK3S_64 u64 cfar; diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 2b74490..80eb29a 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -302,6 +302,9 @@ struct kvmppc_vcore { #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8) #define VCORE_IS_EXITING(vc) (VCORE_EXIT_MAP(vc) != 0) +/* This bit is used when a vcore exit is triggered from outside the vcore */ +#define VCORE_EXIT_REQ 0x1 + /* * Values for vcore_state. * Note that these are arranged such that lower values diff --git a/arch/powerpc/kernel/asm-o
[PULL 03/12] KVM: PPC: Fix warnings from sparse
From: Thomas Huth When compiling the KVM code for POWER with "make C=1", sparse complains about functions missing proper prototypes and a 64-bit constant missing the ULL prefix. Let's fix this by making the functions static or by including the proper header with the prototypes, and by appending a ULL prefix to the constant PPC_MPPE_ADDRESS_MASK. Signed-off-by: Thomas Huth Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/ppc-opcode.h| 2 +- arch/powerpc/kvm/book3s.c| 3 ++- arch/powerpc/kvm/book3s_32_mmu_host.c| 1 + arch/powerpc/kvm/book3s_64_mmu_host.c| 1 + arch/powerpc/kvm/book3s_emulate.c| 1 + arch/powerpc/kvm/book3s_hv.c | 8 arch/powerpc/kvm/book3s_paired_singles.c | 2 +- arch/powerpc/kvm/powerpc.c | 2 +- 8 files changed, 12 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 8452335..790f5d1 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -287,7 +287,7 @@ /* POWER8 Micro Partition Prefetch (MPP) parameters */ /* Address mask is common for LOGMPP instruction and MPPR SPR */ -#define PPC_MPPE_ADDRESS_MASK 0xc000 +#define PPC_MPPE_ADDRESS_MASK 0xc000ULL /* Bits 60 and 61 of MPP SPR should be set to one of the following */ /* Aborting the fetch is indeed setting 00 in the table size bits */ diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 05ea8fc..53285d5 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -240,7 +240,8 @@ void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, ulong flags) kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_INST_STORAGE); } -int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority) +static int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, +unsigned int priority) { int deliver = 1; int vec = 0; diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c index 2035d16..d5c9bfe 100644 --- a/arch/powerpc/kvm/book3s_32_mmu_host.c +++ b/arch/powerpc/kvm/book3s_32_mmu_host.c @@ -26,6 +26,7 @@ #include #include #include +#include "book3s.h" /* #define DEBUG_MMU */ /* #define DEBUG_SR */ diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c index b982d92..79ad35a 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_host.c +++ b/arch/powerpc/kvm/book3s_64_mmu_host.c @@ -28,6 +28,7 @@ #include #include #include "trace_pr.h" +#include "book3s.h" #define PTE_SIZE 12 diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 5a2bc4b..2afdb9c 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -23,6 +23,7 @@ #include #include #include +#include "book3s.h" #define OP_19_XOP_RFID 18 #define OP_19_XOP_RFI 50 diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 68d067a..6e588ac 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -214,12 +214,12 @@ static void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr) kvmppc_end_cede(vcpu); } -void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr) +static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr) { vcpu->arch.pvr = pvr; } -int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat) +static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat) { unsigned long pcr = 0; struct kvmppc_vcore *vc = vcpu->arch.vcore; @@ -259,7 +259,7 @@ int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat) return 0; } -void 
kvmppc_dump_regs(struct kvm_vcpu *vcpu) +static void kvmppc_dump_regs(struct kvm_vcpu *vcpu) { int r; @@ -292,7 +292,7 @@ void kvmppc_dump_regs(struct kvm_vcpu *vcpu) vcpu->arch.last_inst); } -struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id) +static struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id) { int r; struct kvm_vcpu *v, *ret = NULL; diff --git a/arch/powerpc/kvm/book3s_paired_singles.c b/arch/powerpc/kvm/book3s_paired_singles.c index bd6ab16..a759d9a 100644 --- a/arch/powerpc/kvm/book3s_paired_singles.c +++ b/arch/powerpc/kvm/book3s_paired_singles.c @@ -352,7 +352,7 @@ static inline u32 inst_get_field(u32 inst, int msb, int lsb) return kvmppc_get_field(inst, msb + 32, lsb + 32); } -bool kvmppc_inst_is_paired_single(struct kvm_vcpu *vcpu, u32 inst) +static bool kvmppc_inst_is_paired_single(struct kvm_vcpu *vcpu, u32 inst) { if (!(vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE)) return false; diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index e5dd
[PULL 11/12] KVM: PPC: Book3S HV: Fix preempted vcore stolen time calculation
From: Paul Mackerras

Whenever a vcore state is VCORE_PREEMPT we need to be counting stolen
time for it. This currently isn't the case when we have a vcore that no
longer has any runnable threads in it but still has a runner task, so
we do an explicit call to kvmppc_core_start_stolen() in that case.

Signed-off-by: Paul Mackerras
Signed-off-by: Alexander Graf
---
 arch/powerpc/kvm/book3s_hv.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3d02276..fad52f2 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2283,9 +2283,14 @@ static void post_guest_process(struct kvmppc_vcore *vc, bool is_master)
 	}
 	list_del_init(&vc->preempt_list);
 	if (!is_master) {
-		vc->vcore_state = vc->runner ? VCORE_PREEMPT : VCORE_INACTIVE;
-		if (still_running > 0)
+		if (still_running > 0) {
 			kvmppc_vcore_preempt(vc);
+		} else if (vc->runner) {
+			vc->vcore_state = VCORE_PREEMPT;
+			kvmppc_core_start_stolen(vc);
+		} else {
+			vc->vcore_state = VCORE_INACTIVE;
+		}
 		if (vc->n_runnable > 0 && vc->runner == NULL) {
 			/* make sure there's a candidate runner awake */
 			vcpu = list_first_entry(&vc->runnable_threads,
--
1.8.1.4
[PULL 10/12] KVM: PPC: Book3S HV: Fix preempted vcore list locking
From: Paul Mackerras

When a vcore gets preempted, we put it on the preempted vcore list for
the current CPU. The runner task then calls schedule() and comes back
some time later and takes itself off the list. We need to be careful
to lock the list that it was put onto, which may not be the list for the
current CPU since the runner task may have moved to another CPU.

Signed-off-by: Paul Mackerras
Signed-off-by: Alexander Graf
---
 arch/powerpc/kvm/book3s_hv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6e3ef30..3d02276 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1962,10 +1962,11 @@ static void kvmppc_vcore_preempt(struct kvmppc_vcore *vc)
 
 static void kvmppc_vcore_end_preempt(struct kvmppc_vcore *vc)
 {
-	struct preempted_vcore_list *lp = this_cpu_ptr(&preempted_vcores);
+	struct preempted_vcore_list *lp;
 
 	kvmppc_core_end_stolen(vc);
 	if (!list_empty(&vc->preempt_list)) {
+		lp = &per_cpu(preempted_vcores, vc->pcpu);
 		spin_lock(&lp->lock);
 		list_del_init(&vc->preempt_list);
 		spin_unlock(&lp->lock);
--
1.8.1.4
[PULL 12/12] KVM: PPC: Book3S: correct width in XER handling
From: Sam bobroff In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64 bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is accessed as such. This patch corrects places where it is accessed as a 32 bit field by a 64 bit kernel. In some cases this is via a 32 bit load or store instruction which, depending on endianness, will cause either the lower or upper 32 bits to be missed. In another case it is cast as a u32, causing the upper 32 bits to be cleared. This patch corrects those places by extending the access methods to 64 bits. Signed-off-by: Sam Bobroff Reviewed-by: Laurent Vivier Reviewed-by: Thomas Huth Tested-by: Thomas Huth Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s.h | 4 ++-- arch/powerpc/include/asm/kvm_book3s_asm.h | 2 +- arch/powerpc/include/asm/kvm_booke.h | 4 ++-- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 6 +++--- arch/powerpc/kvm/book3s_segment.S | 4 ++-- 5 files changed, 10 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index e6b2534..9fac01c 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -226,12 +226,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) return vcpu->arch.cr; } -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) { vcpu->arch.xer = val; } -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) { return vcpu->arch.xer; } diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h index 57d5dfe..72b6225 100644 --- a/arch/powerpc/include/asm/kvm_book3s_asm.h +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h @@ -132,7 +132,7 @@ struct kvmppc_book3s_shadow_vcpu { bool in_use; ulong gpr[14]; u32 cr; - u32 xer; + ulong xer; ulong ctr; ulong lr; ulong pc; diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h index 3286f0d..bc6e29e 100644 --- a/arch/powerpc/include/asm/kvm_booke.h +++ b/arch/powerpc/include/asm/kvm_booke.h @@ -54,12 +54,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) return vcpu->arch.cr; } -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) { vcpu->arch.xer = val; } -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) { return vcpu->arch.xer; } diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index e347766..472680f 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -944,7 +944,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) blt hdec_soon ld r6, VCPU_CTR(r4) - lwz r7, VCPU_XER(r4) + ld r7, VCPU_XER(r4) mtctr r6 mtxer r7 @@ -1181,7 +1181,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) mfctr r3 mfxer r4 std r3, VCPU_CTR(r9) - stw r4, VCPU_XER(r9) + std r4, VCPU_XER(r9) /* If this is a page table miss then see if it's theirs or ours */ cmpwi r12, BOOK3S_INTERRUPT_H_DATA_STORAGE @@ -1763,7 +1763,7 @@ kvmppc_hdsi: bl kvmppc_msr_interrupt fast_interrupt_c_return: 6: ld r7, VCPU_CTR(r9) - lwz r8, VCPU_XER(r9) + ld r8, VCPU_XER(r9) mtctr r7 mtxer r8 mr r4, r9 diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S index acee37c..ca8f174 100644 --- a/arch/powerpc/kvm/book3s_segment.S 
+++ b/arch/powerpc/kvm/book3s_segment.S @@ -123,7 +123,7 @@ no_dcbz32_on: PPC_LL r8, SVCPU_CTR(r3) PPC_LL r9, SVCPU_LR(r3) lwz r10, SVCPU_CR(r3) - lwz r11, SVCPU_XER(r3) + PPC_LL r11, SVCPU_XER(r3) mtctr r8 mtlrr9 @@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) mfctr r8 mflrr9 - stw r5, SVCPU_XER(r13) + PPC_STL r5, SVCPU_XER(r13) PPC_STL r6, SVCPU_FAULT_DAR(r13) stw r7, SVCPU_FAULT_DSISR(r13) PPC_STL r8, SVCPU_CTR(r13) -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 09/12] KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD
From: Paul Mackerras This adds implementations for the H_CLEAR_REF (test and clear reference bit) and H_CLEAR_MOD (test and clear changed bit) hypercalls. When clearing the reference or change bit in the guest view of the HPTE, we also have to clear it in the real HPTE so that we can detect future references or changes. When we do so, we transfer the R or C bit value to the rmap entry for the underlying host page so that kvm_age_hva_hv(), kvm_test_age_hva_hv() and kvmppc_hv_get_dirty_log() know that the page has been referenced and/or changed. These hypercalls are not used by Linux guests. These implementations have been tested using a FreeBSD guest. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 126 ++-- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 4 +- 2 files changed, 121 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index c7a3ab2..c1df9bb 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -112,25 +112,38 @@ void kvmppc_update_rmap_change(unsigned long *rmap, unsigned long psize) } EXPORT_SYMBOL_GPL(kvmppc_update_rmap_change); +/* Returns a pointer to the revmap entry for the page mapped by a HPTE */ +static unsigned long *revmap_for_hpte(struct kvm *kvm, unsigned long hpte_v, + unsigned long hpte_gr) +{ + struct kvm_memory_slot *memslot; + unsigned long *rmap; + unsigned long gfn; + + gfn = hpte_rpn(hpte_gr, hpte_page_size(hpte_v, hpte_gr)); + memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn); + if (!memslot) + return NULL; + + rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]); + return rmap; +} + /* Remove this HPTE from the chain for a real page */ static void remove_revmap_chain(struct kvm *kvm, long pte_index, struct revmap_entry *rev, unsigned long hpte_v, unsigned long hpte_r) { struct revmap_entry *next, *prev; - unsigned long gfn, ptel, head; - struct kvm_memory_slot *memslot; + unsigned long ptel, head; unsigned long *rmap; unsigned long rcbits; rcbits = hpte_r & (HPTE_R_R | HPTE_R_C); ptel = rev->guest_rpte |= rcbits; - gfn = hpte_rpn(ptel, hpte_page_size(hpte_v, ptel)); - memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn); - if (!memslot) + rmap = revmap_for_hpte(kvm, hpte_v, ptel); + if (!rmap) return; - - rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]); lock_rmap(rmap); head = *rmap & KVMPPC_RMAP_INDEX; @@ -678,6 +691,105 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags, return H_SUCCESS; } +long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags, + unsigned long pte_index) +{ + struct kvm *kvm = vcpu->kvm; + __be64 *hpte; + unsigned long v, r, gr; + struct revmap_entry *rev; + unsigned long *rmap; + long ret = H_NOT_FOUND; + + if (pte_index >= kvm->arch.hpt_npte) + return H_PARAMETER; + + rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]); + hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4)); + while (!try_lock_hpte(hpte, HPTE_V_HVLOCK)) + cpu_relax(); + v = be64_to_cpu(hpte[0]); + r = be64_to_cpu(hpte[1]); + if (!(v & (HPTE_V_VALID | HPTE_V_ABSENT))) + goto out; + + gr = rev->guest_rpte; + if (rev->guest_rpte & HPTE_R_R) { + rev->guest_rpte &= ~HPTE_R_R; + note_hpte_modification(kvm, rev); + } + if (v & HPTE_V_VALID) { + gr |= r & (HPTE_R_R | HPTE_R_C); + if (r & HPTE_R_R) { + kvmppc_clear_ref_hpte(kvm, hpte, pte_index); + rmap = revmap_for_hpte(kvm, v, gr); + if (rmap) { + lock_rmap(rmap); + *rmap |= 
KVMPPC_RMAP_REFERENCED; + unlock_rmap(rmap); + } + } + } + vcpu->arch.gpr[4] = gr; + ret = H_SUCCESS; + out: + unlock_hpte(hpte, v & ~HPTE_V_HVLOCK); + return ret; +} + +long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags, + unsigned long pte_index) +{ + struct kvm *kvm = vcpu->kvm; + __be64 *hpte; + unsigned long v, r, gr; + struct revmap_entry *rev; + unsigned long *rmap; + long ret = H_NOT_FOUND; + + if (pte_index >= kvm->arch.hpt_npte) + return H_
[PULL 07/12] KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE
From: Paul Mackerras The reference (R) and change (C) bits in a HPT entry can be set by hardware at any time up until the HPTE is invalidated and the TLB invalidation sequence has completed. This means that when removing a HPTE, we need to read the HPTE after the invalidation sequence has completed in order to obtain reliable values of R and C. The code in kvmppc_do_h_remove() used to do this. However, commit 6f22bd3265fb ("KVM: PPC: Book3S HV: Make HTAB code LE host aware") removed the read after invalidation as a side effect of other changes. This restores the read of the HPTE after invalidation. The user-visible effect of this bug would be that when migrating a guest, there is a small probability that a page modified by the guest and then unmapped by the guest might not get re-transmitted and thus the destination might end up with a stale copy of the page. Fixes: 6f22bd3265fb Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 18 -- 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index b027a89..c6d601c 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -421,14 +421,20 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags, rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]); v = pte & ~HPTE_V_HVLOCK; if (v & HPTE_V_VALID) { - u64 pte1; - - pte1 = be64_to_cpu(hpte[1]); hpte[0] &= ~cpu_to_be64(HPTE_V_VALID); - rb = compute_tlbie_rb(v, pte1, pte_index); + rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index); do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true); - /* Read PTE low word after tlbie to get final R/C values */ - remove_revmap_chain(kvm, pte_index, rev, v, pte1); + /* +* The reference (R) and change (C) bits in a HPT +* entry can be set by hardware at any time up until +* the HPTE is invalidated and the TLB invalidation +* sequence has completed. This means that when +* removing a HPTE, we need to re-read the HPTE after +* the invalidation sequence has completed in order to +* obtain reliable values of R and C. +*/ + remove_revmap_chain(kvm, pte_index, rev, v, + be64_to_cpu(hpte[1])); } r = rev->guest_rpte & ~HPTE_GR_RESERVED; note_hpte_modification(kvm, rev); -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm:powerpc:Fix incorrect return statement in the function mpic_set_default_irq_routing
On 12.08.15 21:06, nick wrote:
>
>
> On 2015-08-12 03:05 PM, Alexander Graf wrote:
>>
>>
>> On 07.08.15 17:54, Nicholas Krause wrote:
>>> This fixes the incorrect return statement in the function
>>> mpic_set_default_irq_routing from always returning zero
>>> to signal success to this function's caller to instead
>>> return the return value of kvm_set_irq_routing as this
>>> function can fail and we need to correctly signal the
>>> caller of mpic_set_default_irq_routing when the call
>>> to this particular function has failed.
>>>
>>> Signed-off-by: Nicholas Krause
>>
>> I like the patch, but I don't see it on the kvm-ppc mailing list. It
>> doesn't show up on patchwork or spinics. Did something go wrong while
>> sending it out?
>>
>>
>> Alex
>>
> Alex,
> Ask Paolo about it as he would be able to explain it better than I.

Well, whatever the reason, I can only apply patches that actually appeared
on the public mailing list. Otherwise people may not get the chance to
review them ;).

Alex
Re: [PATCH] kvm:powerpc:Fix return statements for wrapper functions in the file book3s_64_mmu_hv.c
On 10.08.15 17:27, Nicholas Krause wrote:
> This fixes the wrapper functions kvm_unmap_hva_hv and the function
> kvm_unmap_hva_range_hv to return the return value of the function
> kvm_handle_hva or kvm_handle_hva_range that they are wrapped to
> call internally rather than always making the caller of these
> wrapper functions think they always run successfully by returning
> the value of zero directly.
>
> Signed-off-by: Nicholas Krause

Paul, could you please take on this one?

Thanks,

Alex

> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index dab68b7..0905c8f 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -774,14 +774,12 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned
> long *rmapp,
>
>  int kvm_unmap_hva_hv(struct kvm *kvm, unsigned long hva)
>  {
> -	kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
> -	return 0;
> +	return kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
>  }
>
>  int kvm_unmap_hva_range_hv(struct kvm *kvm, unsigned long start, unsigned
> long end)
>  {
> -	kvm_handle_hva_range(kvm, start, end, kvm_unmap_rmapp);
> -	return 0;
> +	return kvm_handle_hva_range(kvm, start, end, kvm_unmap_rmapp);
>  }
>
>  void kvmppc_core_flush_memslot_hv(struct kvm *kvm,
>
Re: [PATCH] kvm:powerpc:Fix incorrect return statement in the function mpic_set_default_irq_routing
On 07.08.15 17:54, Nicholas Krause wrote:
> This fixes the incorrect return statement in the function
> mpic_set_default_irq_routing from always returning zero
> to signal success to this function's caller to instead
> return the return value of kvm_set_irq_routing as this
> function can fail and we need to correctly signal the
> caller of mpic_set_default_irq_routing when the call
> to this particular function has failed.
>
> Signed-off-by: Nicholas Krause

I like the patch, but I don't see it on the kvm-ppc mailing list. It
doesn't show up on patchwork or spinics. Did something go wrong while
sending it out?

Alex
Re: [PATCH v3 1/1] KVM: PPC: Book3S: correct width in XER handling
On 06.08.15 12:16, Laurent Vivier wrote:
> Hi,
>
> I'd also like to see this patch in the mainstream as it fixes a bug
> appearing when we switch from vCPU context to hypervisor context (guest
> crash).

Thanks, applied to kvm-ppc-queue.

Alex
Re: [kvm-unit-tests PATCH 11/14] powerpc/ppc64: add rtas_power_off
On 03.08.15 19:02, Andrew Jones wrote:
> On Mon, Aug 03, 2015 at 07:08:17PM +0200, Paolo Bonzini wrote:
>>
>>
>> On 03/08/2015 16:41, Andrew Jones wrote:
>>> Add enough RTAS support to support power-off, and apply it to
>>> exit().
>>>
>>> Signed-off-by: Andrew Jones
>>
>> Why not use virtio-mmio + testdev on ppc as well? Similar to how we're
>> not using PSCI on ARM or ACPI on x86.
>
> I have some longer term plans to add minimal virtio-pci support to
> kvm-unit-tests, and then we could plug virtio-serial+chr-testdev into
> that. I didn't think I could use virtio-mmio directly with spapr, but
> maybe I can? Actually, I sort of like this approach more in some

You would need to add support for the dynamic sysbus device allocation in
the spapr machine, but then I don't see why it wouldn't work. PCI however
is the more natural choice on sPAPR if you want to do virtio.

That said, if all you need is a chr transport, IIRC there should be a way
to get you additional channels on the existing "serial port" - which
really is just a simple hypercall interface. But David is the best person
to guide you to the best path forward here.

Alex

> respects though, as it doesn't require a special testdev or virtio
> support, keeping the unit test extra minimal. In fact, I was even
> thinking about posting patches (which I've already written) that
> allow chr-testdev to be optional for ARM too, now that it could use
> the exitcode snooper.
>
> Thanks,
> drew
>
>>
>> Paolo
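For the curious, the "simple hypercall interface" mentioned above is the PAPR vty; a guest-side sketch of pushing characters through it could look like the following (an illustration under assumptions, not code from the series; hcall() is a hypothetical wrapper around the hypervisor-call instruction, and the byte packing follows the PAPR convention of filling registers from the most significant byte):

	/* Write one character to the default vty (termno 0) via
	 * H_PUT_TERM_CHAR.  Up to 16 bytes can be passed per call,
	 * packed into two 64-bit register arguments.
	 */
	#define H_PUT_TERM_CHAR	0x58

	static void putchar_hv(uint8_t c)
	{
		hcall(H_PUT_TERM_CHAR, 0, 1, (uint64_t)c << 56, 0);
	}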
Re: [PATCH 0/2] Two fixes for dynamic micro-threading
On 20.07.15 08:49, David Gibson wrote:
> On Thu, Jul 16, 2015 at 05:11:12PM +1000, Paul Mackerras wrote:
>> This series contains two fixes for the new dynamic micro-threading
>> code that was added recently for HV-mode KVM on Power servers.
>> The patches are against Alex Graf's kvm-ppc-queue branch. Please
>> apply.
>
> agraf,
>
> Any word on these? These appear to fix a really nasty host crash in
> current upstream. I'd really like to see them merged ASAP.

Thanks, applied to kvm-ppc-queue.

The host crash should only occur with dynamic micro-threading enabled,
which is not in Linus' tree, correct?

Alex
Re: [PATCH 0/5] PPC: Current patch queue for HV KVM
On 24.06.15 13:18, Paul Mackerras wrote:
> This is my current queue of patches for HV KVM. This series is based
> on the kvm next branch. They have all been posted 6 weeks ago or
> more, though I have just added a 3-line fix to patch 2/5 to fix a bug
> that we found in testing migration, and I expanded a comment (no code
> change) in patch 3/5 following a suggestion by Aneesh.
>
> I'd like to see these go into 4.2 if possible.

Thanks, applied all to kvm-ppc-queue.

Alex
Re: [PATCH 2/5] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
On 06/24/15 13:18, Paul Mackerras wrote: This builds on the ability to run more than one vcore on a physical core by using the micro-threading (split-core) modes of the POWER8 chip. Previously, only vcores from the same VM could be run together, and (on POWER8) only if they had just one thread per core. With the ability to split the core on guest entry and unsplit it on guest exit, we can run up to 8 vcpu threads from up to 4 different VMs, and we can run multiple vcores with 2 or 4 vcpus per vcore. Dynamic micro-threading is only available if the static configuration of the cores is whole-core mode (unsplit), and only on POWER8. To manage this, we introduce a new kvm_split_mode struct which is shared across all of the subcores in the core, with a pointer in the paca on each thread. In addition we extend the core_info struct to have information on each subcore. When deciding whether to add a vcore to the set already on the core, we now have two possibilities: (a) piggyback the vcore onto an existing subcore, or (b) start a new subcore. Currently, when any vcpu needs to exit the guest and switch to host virtual mode, we interrupt all the threads in all subcores and switch the core back to whole-core mode. It may be possible in future to allow some of the subcores to keep executing in the guest while subcore 0 switches to the host, but that is not implemented in this patch. This adds a module parameter called dynamic_mt_modes which controls which micro-threading (split-core) modes the code will consider, as a bitmap. In other words, if it is 0, no micro-threading mode is considered; if it is 2, only 2-way micro-threading is considered; if it is 4, only 4-way, and if it is 6, both 2-way and 4-way micro-threading mode will be considered. The default is 6. With this, we now have secondary threads which are the primary thread for their subcore and therefore need to do the MMU switch. These threads will need to be started even if they have no vcpu to run, so we use the vcore pointer in the PACA rather than the vcpu pointer to trigger them. It is now possible for thread 0 to find that an exit has been requested before it gets to switch the subcore state to the guest. In that case we haven't added the guest's timebase offset to the timebase, so we need to be careful not to subtract the offset in the guest exit path. In fact we just skip the whole path that switches back to host context, since we haven't switched to the guest context. 
Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s_asm.h | 20 ++ arch/powerpc/include/asm/kvm_host.h | 3 + arch/powerpc/kernel/asm-offsets.c | 7 + arch/powerpc/kvm/book3s_hv.c | 369 ++ arch/powerpc/kvm/book3s_hv_builtin.c | 25 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 113 +++-- 6 files changed, 475 insertions(+), 62 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h index 5bdfb5d..4024d24 100644 --- a/arch/powerpc/include/asm/kvm_book3s_asm.h +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h @@ -25,6 +25,12 @@ #define XICS_MFRR 0xc #define XICS_IPI 2 /* interrupt source # for IPIs */ +/* Maximum number of threads per physical core */ +#define MAX_THREADS8 + +/* Maximum number of subcores per physical core */ +#define MAX_SUBCORES 4 + #ifdef __ASSEMBLY__ #ifdef CONFIG_KVM_BOOK3S_HANDLER @@ -65,6 +71,19 @@ kvmppc_resume_\intno: #else /*__ASSEMBLY__ */ +struct kvmppc_vcore; + +/* Struct used for coordinating micro-threading (split-core) mode changes */ +struct kvm_split_mode { + unsigned long rpr; + unsigned long pmmar; + unsigned long ldbar; + u8 subcore_size; + u8 do_nap; + u8 napped[MAX_THREADS]; + struct kvmppc_vcore *master_vcs[MAX_SUBCORES]; +}; + /* * This struct goes in the PACA on 64-bit processors. It is used * to store host state that needs to be saved when we enter a guest @@ -100,6 +119,7 @@ struct kvmppc_host_state { u64 host_spurr; u64 host_dscr; u64 dec_expires; + struct kvm_split_mode *kvm_split_mode; #endif #ifdef CONFIG_PPC_BOOK3S_64 u64 cfar; diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 2b74490..80eb29a 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -302,6 +302,9 @@ struct kvmppc_vcore { #define VCORE_EXIT_MAP(vc)((vc)->entry_exit_map >> 8) #define VCORE_IS_EXITING(vc) (VCORE_EXIT_MAP(vc) != 0) +/* This bit is used when a vcore exit is triggered from outside the vcore */ +#define VCORE_EXIT_REQ 0x1 + /* * Values for vcore_state. * Note that these are arranged such that lower values diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/
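For reference, a minimal sketch (illustration only, not the in-kernel selection logic in book3s_hv.c) of how the dynamic_mt_modes bitmap described above is read: bit value 2 permits 2-way splitting, bit value 4 permits 4-way, and whole-core mode is used whenever a single subcore suffices. The helper name and the subcores_wanted parameter are invented for the example.

#include <stdio.h>

/* Pick the smallest split mode permitted by the bitmap that yields enough subcores. */
static int pick_split_mode(int dynamic_mt_modes, int subcores_wanted)
{
	if (subcores_wanted <= 1)
		return 1;				/* whole-core (unsplit) mode */
	if (subcores_wanted <= 2 && (dynamic_mt_modes & 2))
		return 2;				/* 2-way micro-threading */
	if (subcores_wanted <= 4 && (dynamic_mt_modes & 4))
		return 4;				/* 4-way micro-threading */
	return 0;					/* no permitted mode fits */
}

int main(void)
{
	printf("%d\n", pick_split_mode(6, 3));	/* 4: needs the 4-way split */
	printf("%d\n", pick_split_mode(2, 3));	/* 0: 4-way splitting not permitted */
	return 0;
}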
Re: [PATCH 1/3] powerpc: implement barrier primitives
On 17.06.15 12:15, Will Deacon wrote: > On Wed, Jun 17, 2015 at 10:43:48AM +0100, Andre Przywara wrote: >> Instead of referring to the Linux header including the barrier >> macros, copy over the rather simple implementation for the PowerPC >> barrier instructions kvmtool uses. This fixes build for powerpc. >> >> Signed-off-by: Andre Przywara >> --- >> Hi, >> >> I just took what kvmtool seems to have used before, I actually have >> no idea if "sync" is the right instruction or "lwsync" would do. >> Would be nice if some people with PowerPC knowledge could comment. > > I *think* we can use lwsync for rmb and wmb, but would want confirmation > from a ppc guy before making that change! Also I'd prefer to play safe for now :) Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
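For completeness, a minimal sketch of the conservative barrier definitions being discussed, assuming plain GCC inline assembly; the macro names are illustrative. "sync" (hwsync) is the always-safe choice, and "lwsync" is the possible relaxation for rmb()/wmb() that Will mentions:

#define mb()	asm volatile("sync" : : : "memory")
#define rmb()	asm volatile("sync" : : : "memory")	/* lwsync would likely suffice */
#define wmb()	asm volatile("sync" : : : "memory")	/* lwsync would likely suffice */

lwsync orders everything except a store followed by a load, which is why it is a candidate for the read and write barriers but not for the full barrier.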
Re: [PATCH] treewide: Fix typo compatability -> compatibility
On 27.05.15 14:05, Laurent Pinchart wrote: > Even though 'compatability' has a dedicated entry in the Wiktionary, > it's listed as 'Misspelling of compatibility'. Fix it. > > Signed-off-by: Laurent Pinchart > --- > arch/metag/include/asm/elf.h | 2 +- > arch/powerpc/kvm/book3s.c | 2 +- Acked-by: Alexander Graf for the PPC KVM bit. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/1] KVM: PPC: Book3S: correct width in XER handling
On 26.05.15 02:27, Sam Bobroff wrote: > In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64 > bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is > accessed as such. > > This patch corrects places where it is accessed as a 32 bit field by a > 64 bit kernel. In some cases this is via a 32 bit load or store > instruction which, depending on endianness, will cause either the > lower or upper 32 bits to be missed. In another case it is cast as a > u32, causing the upper 32 bits to be cleared. > > This patch corrects those places by extending the access methods to > 64 bits. > > Signed-off-by: Sam Bobroff > --- > > v2: > > Also extend kvmppc_book3s_shadow_vcpu.xer to 64 bit. > > arch/powerpc/include/asm/kvm_book3s.h |4 ++-- > arch/powerpc/include/asm/kvm_book3s_asm.h |2 +- > arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +++--- > arch/powerpc/kvm/book3s_segment.S |4 ++-- > 4 files changed, 8 insertions(+), 8 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_book3s.h > b/arch/powerpc/include/asm/kvm_book3s.h > index b91e74a..05a875a 100644 > --- a/arch/powerpc/include/asm/kvm_book3s.h > +++ b/arch/powerpc/include/asm/kvm_book3s.h > @@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) > return vcpu->arch.cr; > } > > -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) > +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) Now we have book3s and booke files with different prototypes on the same inline function names. That's really ugly. Please keep them in sync ;). Alex > { > vcpu->arch.xer = val; > } > > -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) > +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) > { > return vcpu->arch.xer; > } > diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h > b/arch/powerpc/include/asm/kvm_book3s_asm.h > index 5bdfb5d..c4ccd2d 100644 > --- a/arch/powerpc/include/asm/kvm_book3s_asm.h > +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h > @@ -112,7 +112,7 @@ struct kvmppc_book3s_shadow_vcpu { > bool in_use; > ulong gpr[14]; > u32 cr; > - u32 xer; > + ulong xer; > ulong ctr; > ulong lr; > ulong pc; > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > index 4d70df2..d75be59 100644 > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > @@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) > blt hdec_soon > > ld r6, VCPU_CTR(r4) > - lwz r7, VCPU_XER(r4) > + ld r7, VCPU_XER(r4) > > mtctr r6 > mtxer r7 > @@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) > mfctr r3 > mfxer r4 > std r3, VCPU_CTR(r9) > - stw r4, VCPU_XER(r9) > + std r4, VCPU_XER(r9) > > /* If this is a page table miss then see if it's theirs or ours */ > cmpwi r12, BOOK3S_INTERRUPT_H_DATA_STORAGE > @@ -1675,7 +1675,7 @@ kvmppc_hdsi: > bl kvmppc_msr_interrupt > fast_interrupt_c_return: > 6: ld r7, VCPU_CTR(r9) > - lwz r8, VCPU_XER(r9) > + ld r8, VCPU_XER(r9) > mtctr r7 > mtxer r8 > mr r4, r9 > diff --git a/arch/powerpc/kvm/book3s_segment.S > b/arch/powerpc/kvm/book3s_segment.S > index acee37c..ca8f174 100644 > --- a/arch/powerpc/kvm/book3s_segment.S > +++ b/arch/powerpc/kvm/book3s_segment.S > @@ -123,7 +123,7 @@ no_dcbz32_on: > PPC_LL r8, SVCPU_CTR(r3) > PPC_LL r9, SVCPU_LR(r3) > lwz r10, SVCPU_CR(r3) > - lwz r11, SVCPU_XER(r3) > + PPC_LL r11, SVCPU_XER(r3) > > mtctr r8 > mtlrr9 > @@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) > mfctr r8 > mflrr9 > > - stw r5, SVCPU_XER(r13) > + 
PPC_STL r5, SVCPU_XER(r13) > PPC_STL r6, SVCPU_FAULT_DAR(r13) > stw r7, SVCPU_FAULT_DSISR(r13) > PPC_STL r8, SVCPU_CTR(r13) > -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] KVM: PPC: Book3S: correct width in XER handling
On 26.05.15 02:14, Sam Bobroff wrote: > On Mon, May 25, 2015 at 11:08:08PM +0200, Alexander Graf wrote: >> >> >> On 20.05.15 07:26, Sam Bobroff wrote: >>> In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64 >>> bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is >>> accessed as such. >>> >>> This patch corrects places where it is accessed as a 32 bit field by a >>> 64 bit kernel. In some cases this is via a 32 bit load or store >>> instruction which, depending on endianness, will cause either the >>> lower or upper 32 bits to be missed. In another case it is cast as a >>> u32, causing the upper 32 bits to be cleared. >>> >>> This patch corrects those places by extending the access methods to >>> 64 bits. >>> >>> Signed-off-by: Sam Bobroff >>> --- >>> >>> arch/powerpc/include/asm/kvm_book3s.h |4 ++-- >>> arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +++--- >>> arch/powerpc/kvm/book3s_segment.S |4 ++-- >>> 3 files changed, 7 insertions(+), 7 deletions(-) >>> >>> diff --git a/arch/powerpc/include/asm/kvm_book3s.h >>> b/arch/powerpc/include/asm/kvm_book3s.h >>> index b91e74a..05a875a 100644 >>> --- a/arch/powerpc/include/asm/kvm_book3s.h >>> +++ b/arch/powerpc/include/asm/kvm_book3s.h >>> @@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) >>> return vcpu->arch.cr; >>> } >>> >>> -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) >>> +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) >>> { >>> vcpu->arch.xer = val; >>> } >>> >>> -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) >>> +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) >>> { >>> return vcpu->arch.xer; >>> } >>> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S >>> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S >>> index 4d70df2..d75be59 100644 >>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S >>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S >>> @@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) >>> blt hdec_soon >>> >>> ld r6, VCPU_CTR(r4) >>> - lwz r7, VCPU_XER(r4) >>> + ld r7, VCPU_XER(r4) >>> >>> mtctr r6 >>> mtxer r7 >>> @@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) >>> mfctr r3 >>> mfxer r4 >>> std r3, VCPU_CTR(r9) >>> - stw r4, VCPU_XER(r9) >>> + std r4, VCPU_XER(r9) >>> >>> /* If this is a page table miss then see if it's theirs or ours */ >>> cmpwi r12, BOOK3S_INTERRUPT_H_DATA_STORAGE >>> @@ -1675,7 +1675,7 @@ kvmppc_hdsi: >>> bl kvmppc_msr_interrupt >>> fast_interrupt_c_return: >>> 6: ld r7, VCPU_CTR(r9) >>> - lwz r8, VCPU_XER(r9) >>> + ld r8, VCPU_XER(r9) >>> mtctr r7 >>> mtxer r8 >>> mr r4, r9 >>> diff --git a/arch/powerpc/kvm/book3s_segment.S >>> b/arch/powerpc/kvm/book3s_segment.S >>> index acee37c..ca8f174 100644 >>> --- a/arch/powerpc/kvm/book3s_segment.S >>> +++ b/arch/powerpc/kvm/book3s_segment.S >>> @@ -123,7 +123,7 @@ no_dcbz32_on: >>> PPC_LL r8, SVCPU_CTR(r3) >>> PPC_LL r9, SVCPU_LR(r3) >>> lwz r10, SVCPU_CR(r3) >>> - lwz r11, SVCPU_XER(r3) >>> + PPC_LL r11, SVCPU_XER(r3) >> >> struct kvmppc_book3s_shadow_vcpu { >> bool in_use; >> ulong gpr[14]; >> u32 cr; >> u32 xer; >> [...] >> >> so at least this change looks wrong. Please double-check all fields in >> your patch again. >> >> >> Alex > > Thanks for the review and the catch! > > The xer field in kvm_vcpu_arch is already ulong, so it looks like the one in > kvmppc_book3s_shadow_vcpu is the only other case. I'll fix that and repost. 
I guess, given that the one in pt_regs is also ulong, going with ulong rather than u32 is the better choice, yes. While at it, could you please do a grep -i xer across all kvm (.c and .h) files and sanity check that we're staying in sync? Thanks! Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: add missing pt_regs initialization
On 18.05.15 14:44, Laurentiu Tudor wrote: > On this switch branch the regs initialization > doesn't happen so add it. > This was found with the help of a static > code analysis tool. > > Signed-off-by: Laurentiu Tudor > Cc: Scott Wood > Cc: Mihai Caraman Thanks, applied to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] KVM: PPC: Book3S: correct width in XER handling
On 20.05.15 07:26, Sam Bobroff wrote: > In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64 > bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is > accessed as such. > > This patch corrects places where it is accessed as a 32 bit field by a > 64 bit kernel. In some cases this is via a 32 bit load or store > instruction which, depending on endianness, will cause either the > lower or upper 32 bits to be missed. In another case it is cast as a > u32, causing the upper 32 bits to be cleared. > > This patch corrects those places by extending the access methods to > 64 bits. > > Signed-off-by: Sam Bobroff > --- > > arch/powerpc/include/asm/kvm_book3s.h |4 ++-- > arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +++--- > arch/powerpc/kvm/book3s_segment.S |4 ++-- > 3 files changed, 7 insertions(+), 7 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_book3s.h > b/arch/powerpc/include/asm/kvm_book3s.h > index b91e74a..05a875a 100644 > --- a/arch/powerpc/include/asm/kvm_book3s.h > +++ b/arch/powerpc/include/asm/kvm_book3s.h > @@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) > return vcpu->arch.cr; > } > > -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) > +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) > { > vcpu->arch.xer = val; > } > > -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) > +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) > { > return vcpu->arch.xer; > } > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > index 4d70df2..d75be59 100644 > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > @@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) > blt hdec_soon > > ld r6, VCPU_CTR(r4) > - lwz r7, VCPU_XER(r4) > + ld r7, VCPU_XER(r4) > > mtctr r6 > mtxer r7 > @@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) > mfctr r3 > mfxer r4 > std r3, VCPU_CTR(r9) > - stw r4, VCPU_XER(r9) > + std r4, VCPU_XER(r9) > > /* If this is a page table miss then see if it's theirs or ours */ > cmpwi r12, BOOK3S_INTERRUPT_H_DATA_STORAGE > @@ -1675,7 +1675,7 @@ kvmppc_hdsi: > bl kvmppc_msr_interrupt > fast_interrupt_c_return: > 6: ld r7, VCPU_CTR(r9) > - lwz r8, VCPU_XER(r9) > + ld r8, VCPU_XER(r9) > mtctr r7 > mtxer r8 > mr r4, r9 > diff --git a/arch/powerpc/kvm/book3s_segment.S > b/arch/powerpc/kvm/book3s_segment.S > index acee37c..ca8f174 100644 > --- a/arch/powerpc/kvm/book3s_segment.S > +++ b/arch/powerpc/kvm/book3s_segment.S > @@ -123,7 +123,7 @@ no_dcbz32_on: > PPC_LL r8, SVCPU_CTR(r3) > PPC_LL r9, SVCPU_LR(r3) > lwz r10, SVCPU_CR(r3) > - lwz r11, SVCPU_XER(r3) > + PPC_LL r11, SVCPU_XER(r3) struct kvmppc_book3s_shadow_vcpu { bool in_use; ulong gpr[14]; u32 cr; u32 xer; [...] so at least this change looks wrong. Please double-check all fields in your patch again. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: check for lookup_linux_ptep() returning NULL
On 21.05.15 21:37, Scott Wood wrote: > On Thu, 2015-05-21 at 16:26 +0300, Laurentiu Tudor wrote: >> If passed a larger page size lookup_linux_ptep() >> may fail, so add a check for that and bail out >> if that's the case. >> This was found with the help of a static >> code analysis tool. >> >> Signed-off-by: Mihai Caraman >> Signed-off-by: Laurentiu Tudor >> Cc: Scott Wood >> --- >> based on https://github.com/agraf/linux-2.6.git kvm-ppc-next >> >> arch/powerpc/kvm/e500_mmu_host.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) > > Reviewed-by: Scott Wood Thanks, applied to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Fix warnings from sparse
On 22.05.15 09:25, Thomas Huth wrote: > When compiling the KVM code for POWER with "make C=1", sparse > complains about functions missing proper prototypes and a 64-bit > constant missing the ULL prefix. Let's fix this by making the > functions static or by including the proper header with the > prototypes, and by appending a ULL prefix to the constant > PPC_MPPE_ADDRESS_MASK. > > Signed-off-by: Thomas Huth Thanks, applied to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Remove PPC970 from KVM_BOOK3S_64_HV text in Kconfig
On 22.05.15 11:41, Thomas Huth wrote: > Since the PPC970 support has been removed from the kvm-hv kernel > module recently, we should also reflect this change in the help > text of the corresponding Kconfig option. > > Signed-off-by: Thomas Huth Thanks, applied to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: fix suspicious use of conditional operator
On 25.05.15 10:48, Laurentiu Tudor wrote: > This was signaled by a static code analysis tool. > > Signed-off-by: Laurentiu Tudor > Reviewed-by: Scott Wood Thanks, applied to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 15/21] KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken
From: Paul Mackerras We can tell when a secondary thread has finished running a guest by the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there is no real need for the nap_count field in the kvmppc_vcore struct. This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu pointers of the secondary threads rather than polling vc->nap_count. Besides reducing the size of the kvmppc_vcore struct by 8 bytes, this also means that we can tell which secondary threads have got stuck and thus print a more informative error message. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 2 -- arch/powerpc/kernel/asm-offsets.c | 1 - arch/powerpc/kvm/book3s_hv.c| 47 +++-- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 19 + 4 files changed, 34 insertions(+), 35 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 83c4425..1517faa 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -272,8 +272,6 @@ struct kvmppc_vcore { int n_runnable; int num_threads; int entry_exit_count; - int n_woken; - int nap_count; int napping_threads; int first_vcpuid; u16 pcpu; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 92ec3fc..8aa8246 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -563,7 +563,6 @@ int main(void) DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort)); DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1)); DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_count)); - DEFINE(VCORE_NAP_COUNT, offsetof(struct kvmppc_vcore, nap_count)); DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest)); DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, napping_threads)); DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm)); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index fb4f166..7c1335d 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1729,8 +1729,10 @@ static int kvmppc_grab_hwthread(int cpu) tpaca = &paca[cpu]; /* Ensure the thread won't go into the kernel if it wakes */ - tpaca->kvm_hstate.hwthread_req = 1; tpaca->kvm_hstate.kvm_vcpu = NULL; + tpaca->kvm_hstate.napping = 0; + smp_wmb(); + tpaca->kvm_hstate.hwthread_req = 1; /* * If the thread is already executing in the kernel (e.g. handling @@ -1773,35 +1775,43 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu) } cpu = vc->pcpu + vcpu->arch.ptid; tpaca = &paca[cpu]; - tpaca->kvm_hstate.kvm_vcpu = vcpu; tpaca->kvm_hstate.kvm_vcore = vc; tpaca->kvm_hstate.ptid = vcpu->arch.ptid; vcpu->cpu = vc->pcpu; + /* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */ smp_wmb(); + tpaca->kvm_hstate.kvm_vcpu = vcpu; #if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP) - if (cpu != smp_processor_id()) { + if (cpu != smp_processor_id()) xics_wake_cpu(cpu); - if (vcpu->arch.ptid) - ++vc->n_woken; - } #endif } -static void kvmppc_wait_for_nap(struct kvmppc_vcore *vc) +static void kvmppc_wait_for_nap(void) { - int i; + int cpu = smp_processor_id(); + int i, loops; - HMT_low(); - i = 0; - while (vc->nap_count < vc->n_woken) { - if (++i >= 100) { - pr_err("kvmppc_wait_for_nap timeout %d %d\n", - vc->nap_count, vc->n_woken); - break; + for (loops = 0; loops < 100; ++loops) { + /* +* Check if all threads are finished. 
+* We set the vcpu pointer when starting a thread +* and the thread clears it when finished, so we look +* for any threads that still have a non-NULL vcpu ptr. +*/ + for (i = 1; i < threads_per_subcore; ++i) + if (paca[cpu + i].kvm_hstate.kvm_vcpu) + break; + if (i == threads_per_subcore) { + HMT_medium(); + return; } - cpu_relax(); + HMT_low(); } HMT_medium(); + for (i = 1; i < threads_per_subcore; ++i) + if (paca[cpu + i].kvm_hstate.kvm_vcpu) + pr_err("KVM: CPU %d seems to be stuck\n", cpu + i); } /* @@ -1942,8 +1952,6 @@ static void kvmppc_run
[PULL 11/21] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
From: Paul Mackerras This reads the timebase at various points in the real-mode guest entry/exit code and uses that to accumulate total, minimum and maximum time spent in those parts of the code. Currently these times are accumulated per vcpu in 5 parts of the code: * rm_entry - time taken from the start of kvmppc_hv_entry() until just before entering the guest. * rm_intr - time from when we take a hypervisor interrupt in the guest until we either re-enter the guest or decide to exit to the host. This includes time spent handling hcalls in real mode. * rm_exit - time from when we decide to exit the guest until the return from kvmppc_hv_entry(). * guest - time spend in the guest * cede - time spent napping in real mode due to an H_CEDE hcall while other threads in the same vcore are active. These times are exposed in debugfs in a directory per vcpu that contains a file called "timings". This file contains one line for each of the 5 timings above, with the name followed by a colon and 4 numbers, which are the count (number of times the code has been executed), the total time, the minimum time, and the maximum time, all in nanoseconds. The overhead of the extra code amounts to about 30ns for an hcall that is handled in real mode (e.g. H_SET_DABR), which is about 25%. Since production environments may not wish to incur this overhead, the new code is conditional on a new config symbol, CONFIG_KVM_BOOK3S_HV_EXIT_TIMING. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 21 + arch/powerpc/include/asm/time.h | 3 + arch/powerpc/kernel/asm-offsets.c | 13 +++ arch/powerpc/kernel/time.c | 6 ++ arch/powerpc/kvm/Kconfig| 14 +++ arch/powerpc/kvm/book3s_hv.c| 150 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 141 +- 7 files changed, 346 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index f1d0bbc..d2068bb 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -369,6 +369,14 @@ struct kvmppc_slb { u8 base_page_size; /* MMU_PAGE_xxx */ }; +/* Struct used to accumulate timing information in HV real mode code */ +struct kvmhv_tb_accumulator { + u64 seqcount; /* used to synchronize access, also count * 2 */ + u64 tb_total; /* total time in timebase ticks */ + u64 tb_min; /* min time */ + u64 tb_max; /* max time */ +}; + # ifdef CONFIG_PPC_FSL_BOOK3E #define KVMPPC_BOOKE_IAC_NUM 2 #define KVMPPC_BOOKE_DAC_NUM 2 @@ -657,6 +665,19 @@ struct kvm_vcpu_arch { u32 emul_inst; #endif + +#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING + struct kvmhv_tb_accumulator *cur_activity; /* What we're timing */ + u64 cur_tb_start; /* when it started */ + struct kvmhv_tb_accumulator rm_entry; /* real-mode entry code */ + struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */ + struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */ + struct kvmhv_tb_accumulator guest_time; /* guest execution */ + struct kvmhv_tb_accumulator cede_time; /* time napping inside guest */ + + struct dentry *debugfs_dir; + struct dentry *debugfs_timings; +#endif /* CONFIG_KVM_BOOK3S_HV_EXIT_TIMING */ }; #define VCPU_FPR(vcpu, i) (vcpu)->arch.fp.fpr[i][TS_FPROFFSET] diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h index 03cbada..10fc784 100644 --- a/arch/powerpc/include/asm/time.h +++ b/arch/powerpc/include/asm/time.h @@ -211,5 +211,8 @@ extern void secondary_cpu_time_init(void); DECLARE_PER_CPU(u64, decrementers_next_tb); +/* Convert timebase ticks to nanoseconds 
*/ +unsigned long long tb_to_ns(unsigned long long tb_ticks); + #endif /* __KERNEL__ */ #endif /* __POWERPC_TIME_H */ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 4717859..3fea721 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -459,6 +459,19 @@ int main(void) DEFINE(VCPU_SPRG2, offsetof(struct kvm_vcpu, arch.shregs.sprg2)); DEFINE(VCPU_SPRG3, offsetof(struct kvm_vcpu, arch.shregs.sprg3)); #endif +#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING + DEFINE(VCPU_TB_RMENTRY, offsetof(struct kvm_vcpu, arch.rm_entry)); + DEFINE(VCPU_TB_RMINTR, offsetof(struct kvm_vcpu, arch.rm_intr)); + DEFINE(VCPU_TB_RMEXIT, offsetof(struct kvm_vcpu, arch.rm_exit)); + DEFINE(VCPU_TB_GUEST, offsetof(struct kvm_vcpu, arch.guest_time)); + DEFINE(VCPU_TB_CEDE, offsetof(struct kvm_vcpu, arch.cede_time)); + DEFINE(VCPU_CUR_ACTIVITY, offsetof(struct kvm_vcpu, arch.cur_activity)); + DEFINE(VCPU_ACTIVITY_START, offset
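As a side note, a rough sketch (not kernel code; the reader helper and the barrier() macro here are illustrative) of how a consistent snapshot of one kvmhv_tb_accumulator can be taken while a vcpu may be updating it concurrently. Per the seqcount comment above, the counter is bumped before and after each update, so an odd value means an update is in flight and half the value is the number of samples:

typedef unsigned long long u64;
#define barrier()	asm volatile("" : : : "memory")	/* compiler barrier only */

struct kvmhv_tb_accumulator {
	u64 seqcount;	/* used to synchronize access, also count * 2 */
	u64 tb_total;
	u64 tb_min;
	u64 tb_max;
};

static void read_timing(volatile struct kvmhv_tb_accumulator *acc,
			u64 *count, u64 *total, u64 *min, u64 *max)
{
	u64 seq;

	do {
		seq = acc->seqcount;
		barrier();
		*total = acc->tb_total;
		*min = acc->tb_min;
		*max = acc->tb_max;
		barrier();
	} while ((seq & 1) || seq != acc->seqcount);

	*count = seq / 2;	/* two increments per completed sample */
}

The per-vcpu debugfs "timings" file then reports these as "name: count total min max", with the timebase ticks presumably converted to nanoseconds via the tb_to_ns() helper the patch adds.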
[PULL 14/21] KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu
From: Paul Mackerras Rather than calling cond_resched() in kvmppc_run_core() before doing the post-processing for the vcpus that we have just run (that is, calling kvmppc_handle_exit_hv(), kvmppc_set_timer(), etc.), we now do that post-processing before calling cond_resched(), and that post- processing is moved out into its own function, post_guest_process(). The reschedule point is now in kvmppc_run_vcpu() and we define a new vcore state, VCORE_PREEMPT, to indicate that that the vcore's runner task is runnable but not running. (Doing the reschedule with the vcore in VCORE_INACTIVE state would be bad because there are potentially other vcpus waiting for the runner in kvmppc_wait_for_exec() which then wouldn't get woken up.) Also, we make use of the handy cond_resched_lock() function, which unlocks and relocks vc->lock for us around the reschedule. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 5 +- arch/powerpc/kvm/book3s_hv.c| 92 + 2 files changed, 55 insertions(+), 42 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 3eecd88..83c4425 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -304,8 +304,9 @@ struct kvmppc_vcore { /* Values for vcore_state */ #define VCORE_INACTIVE 0 #define VCORE_SLEEPING 1 -#define VCORE_RUNNING 2 -#define VCORE_EXITING 3 +#define VCORE_PREEMPT 2 +#define VCORE_RUNNING 3 +#define VCORE_EXITING 4 /* * Struct used to manage memory for a virtual processor area diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index b38c10e..fb4f166 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1882,15 +1882,50 @@ static void prepare_threads(struct kvmppc_vcore *vc) } } +static void post_guest_process(struct kvmppc_vcore *vc) +{ + u64 now; + long ret; + struct kvm_vcpu *vcpu, *vnext; + + now = get_tb(); + list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads, +arch.run_list) { + /* cancel pending dec exception if dec is positive */ + if (now < vcpu->arch.dec_expires && + kvmppc_core_pending_dec(vcpu)) + kvmppc_core_dequeue_dec(vcpu); + + trace_kvm_guest_exit(vcpu); + + ret = RESUME_GUEST; + if (vcpu->arch.trap) + ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu, + vcpu->arch.run_task); + + vcpu->arch.ret = ret; + vcpu->arch.trap = 0; + + if (vcpu->arch.ceded) { + if (!is_kvmppc_resume_guest(ret)) + kvmppc_end_cede(vcpu); + else + kvmppc_set_timer(vcpu); + } + if (!is_kvmppc_resume_guest(vcpu->arch.ret)) { + kvmppc_remove_runnable(vc, vcpu); + wake_up(&vcpu->arch.cpu_run); + } + } +} + /* * Run a set of guest threads on a physical core. * Called with vc->lock held. 
*/ static void kvmppc_run_core(struct kvmppc_vcore *vc) { - struct kvm_vcpu *vcpu, *vnext; - long ret; - u64 now; + struct kvm_vcpu *vcpu; int i; int srcu_idx; @@ -1922,8 +1957,11 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) */ if ((threads_per_core > 1) && ((vc->num_threads > threads_per_subcore) || !on_primary_thread())) { - list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) + list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) { vcpu->arch.ret = -EBUSY; + kvmppc_remove_runnable(vc, vcpu); + wake_up(&vcpu->arch.cpu_run); + } goto out; } @@ -1979,44 +2017,12 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) kvm_guest_exit(); preempt_enable(); - cond_resched(); spin_lock(&vc->lock); - now = get_tb(); - list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) { - /* cancel pending dec exception if dec is positive */ - if (now < vcpu->arch.dec_expires && - kvmppc_core_pending_dec(vcpu)) - kvmppc_core_dequeue_dec(vcpu); - - trace_kvm_guest_exit(vcpu); - - ret = RESUME_GUEST; - if (vcpu->arch.trap) - ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu, -
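To make the new reschedule point concrete, a hypothetical sketch of the pattern the description refers to (the real hunk lives in kvmppc_run_vcpu() and is not part of the quoted diff; vcore_maybe_preempt() is an invented name):

/* Caller holds vc->lock; other vcpu tasks see VCORE_PREEMPT while the runner may be off the CPU. */
static void vcore_maybe_preempt(struct kvmppc_vcore *vc)
{
	if (need_resched()) {
		vc->vcore_state = VCORE_PREEMPT;
		cond_resched_lock(&vc->lock);	/* drops and retakes vc->lock around the resched */
		vc->vcore_state = VCORE_INACTIVE;
	}
}

cond_resched_lock() releases and reacquires the lock only when a reschedule is actually pending, so the common case stays cheap.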
[PULL 07/21] KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock
From: Suresh Warrier Replaces the ICS mutex lock with a spin lock since we will be porting these routines to real mode. Note that we need to disable interrupts before we take the lock in anticipation of the fact that on the guest side, we are running in the context of a hard irq and interrupts are disabled (EE bit off) when the lock is acquired. Again, because we will be acquiring the lock in hypervisor real mode, we need to use an arch_spinlock_t instead of a normal spinlock here as we want to avoid running any lockdep code (which may not be safe to execute in real mode). Signed-off-by: Suresh Warrier Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_xics.c | 68 +- arch/powerpc/kvm/book3s_xics.h | 2 +- 2 files changed, 48 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c index 60bdbac..5f7beebd 100644 --- a/arch/powerpc/kvm/book3s_xics.c +++ b/arch/powerpc/kvm/book3s_xics.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include @@ -39,7 +40,7 @@ * LOCKING * === * - * Each ICS has a mutex protecting the information about the IRQ + * Each ICS has a spin lock protecting the information about the IRQ * sources and avoiding simultaneous deliveries if the same interrupt. * * ICP operations are done via a single compare & swap transaction @@ -109,7 +110,10 @@ static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics, { int i; - mutex_lock(&ics->lock); + unsigned long flags; + + local_irq_save(flags); + arch_spin_lock(&ics->lock); for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) { struct ics_irq_state *state = &ics->irq_state[i]; @@ -120,12 +124,15 @@ static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics, XICS_DBG("resend %#x prio %#x\n", state->number, state->priority); - mutex_unlock(&ics->lock); + arch_spin_unlock(&ics->lock); + local_irq_restore(flags); icp_deliver_irq(xics, icp, state->number); - mutex_lock(&ics->lock); + local_irq_save(flags); + arch_spin_lock(&ics->lock); } - mutex_unlock(&ics->lock); + arch_spin_unlock(&ics->lock); + local_irq_restore(flags); } static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics, @@ -133,8 +140,10 @@ static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics, u32 server, u32 priority, u32 saved_priority) { bool deliver; + unsigned long flags; - mutex_lock(&ics->lock); + local_irq_save(flags); + arch_spin_lock(&ics->lock); state->server = server; state->priority = priority; @@ -145,7 +154,8 @@ static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics, deliver = true; } - mutex_unlock(&ics->lock); + arch_spin_unlock(&ics->lock); + local_irq_restore(flags); return deliver; } @@ -186,6 +196,7 @@ int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority) struct kvmppc_ics *ics; struct ics_irq_state *state; u16 src; + unsigned long flags; if (!xics) return -ENODEV; @@ -195,10 +206,12 @@ int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority) return -EINVAL; state = &ics->irq_state[src]; - mutex_lock(&ics->lock); + local_irq_save(flags); + arch_spin_lock(&ics->lock); *server = state->server; *priority = state->priority; - mutex_unlock(&ics->lock); + arch_spin_unlock(&ics->lock); + local_irq_restore(flags); return 0; } @@ -365,6 +378,7 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, struct kvmppc_ics *ics; u32 reject; u16 src; + unsigned long flags; /* * This is used both for initial 
delivery of an interrupt and @@ -391,7 +405,8 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, state = &ics->irq_state[src]; /* Get a lock on the ICS */ - mutex_lock(&ics->lock); + local_irq_save(flags); + arch_spin_lock(&ics->lock); /* Get our server */ if (!icp || state->server != icp->server_num) { @@ -434,7 +449,7 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, * * Note that if successful, the new delivery might have itself * rejected an interrupt th
[PULL 09/21] KVM: PPC: Book3S HV: Add ICP real mode counters
From: Suresh Warrier Add two counters to count how often we generate real-mode ICS resend and reject events. The counters provide some performance statistics that could be used in the future to consider if the real mode functions need further optimizing. The counters are displayed as part of IPC and ICP state provided by /sys/debug/kernel/powerpc/kvm* for each VM. Also added two counters that count (approximately) how many times we don't find an ICP or ICS we're looking for. These are not currently exposed through sysfs, but can be useful when debugging crashes. Signed-off-by: Suresh Warrier Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv_rm_xics.c | 7 +++ arch/powerpc/kvm/book3s_xics.c | 10 -- arch/powerpc/kvm/book3s_xics.h | 5 + 3 files changed, 20 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c index 73bbe92..6dded8c 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c @@ -227,6 +227,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, ics = kvmppc_xics_find_ics(xics, new_irq, &src); if (!ics) { /* Unsafe increment, but this does not need to be accurate */ + xics->err_noics++; return; } state = &ics->irq_state[src]; @@ -239,6 +240,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, icp = kvmppc_xics_find_server(xics->kvm, state->server); if (!icp) { /* Unsafe increment again*/ + xics->err_noicp++; goto out; } } @@ -383,6 +385,7 @@ static void icp_rm_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp, * separately here as well. */ if (resend) { + icp->n_check_resend++; icp_rm_check_resend(xics, icp); } } @@ -500,11 +503,13 @@ int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server, /* Handle reject in real mode */ if (reject && reject != XICS_IPI) { + this_icp->n_reject++; icp_rm_deliver_irq(xics, icp, reject); } /* Handle resends in real mode */ if (resend) { + this_icp->n_check_resend++; icp_rm_check_resend(xics, icp); } @@ -566,6 +571,7 @@ int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr) * attempt (see comments in icp_rm_deliver_irq). 
*/ if (reject && reject != XICS_IPI) { + icp->n_reject++; icp_rm_deliver_irq(xics, icp, reject); } bail: @@ -616,6 +622,7 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr) /* Still asserted, resend it */ if (state->asserted) { + icp->n_reject++; icp_rm_deliver_irq(xics, icp, irq); } diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c index 5f7beebd..8f3e6cc 100644 --- a/arch/powerpc/kvm/book3s_xics.c +++ b/arch/powerpc/kvm/book3s_xics.c @@ -901,6 +901,7 @@ static int xics_debug_show(struct seq_file *m, void *private) unsigned long flags; unsigned long t_rm_kick_vcpu, t_rm_check_resend; unsigned long t_rm_reject, t_rm_notify_eoi; + unsigned long t_reject, t_check_resend; if (!kvm) return 0; @@ -909,6 +910,8 @@ static int xics_debug_show(struct seq_file *m, void *private) t_rm_notify_eoi = 0; t_rm_check_resend = 0; t_rm_reject = 0; + t_check_resend = 0; + t_reject = 0; seq_printf(m, "=\nICP state\n=\n"); @@ -928,12 +931,15 @@ static int xics_debug_show(struct seq_file *m, void *private) t_rm_notify_eoi += icp->n_rm_notify_eoi; t_rm_check_resend += icp->n_rm_check_resend; t_rm_reject += icp->n_rm_reject; + t_check_resend += icp->n_check_resend; + t_reject += icp->n_reject; } - seq_puts(m, "ICP Guest Real Mode exit totals: "); - seq_printf(m, "\tkick_vcpu=%lu check_resend=%lu reject=%lu notify_eoi=%lu\n", + seq_printf(m, "ICP Guest->Host totals: kick_vcpu=%lu check_resend=%lu reject=%lu notify_eoi=%lu\n", t_rm_kick_vcpu, t_rm_check_resend, t_rm_reject, t_rm_notify_eoi); + seq_printf(m, "ICP Real Mode totals: check_resend=%lu resend=%lu\n", + t_check_resend, t_reject); for (icsid = 0; icsid <= KVMPPC_XICS_MAX_ICS_ID; icsid++) { struct kvmppc_ics *ics = xics->ics[icsid]; diff --g
[PULL 06/21] KVM: PPC: Book3S HV: Add guest->host real mode completion counters
From: "Suresh E. Warrier" Add counters to track number of times we switch from guest real mode to host virtual mode during an interrupt-related hyper call because the hypercall requires actions that cannot be completed in real mode. This will help when making optimizations that reduce guest-host transitions. It is safe to use an ordinary increment rather than an atomic operation because there is one ICP per virtual CPU and kvmppc_xics_rm_complete() only works on the ICP for the current VCPU. The counters are displayed as part of IPC and ICP state provided by /sys/debug/kernel/powerpc/kvm* for each VM. Signed-off-by: Suresh Warrier Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_xics.c | 31 +++ arch/powerpc/kvm/book3s_xics.h | 6 ++ 2 files changed, 33 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c index a4a8d9f..60bdbac 100644 --- a/arch/powerpc/kvm/book3s_xics.c +++ b/arch/powerpc/kvm/book3s_xics.c @@ -802,14 +802,22 @@ static noinline int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall) XICS_DBG("XICS_RM: H_%x completing, act: %x state: %lx tgt: %p\n", hcall, icp->rm_action, icp->rm_dbgstate.raw, icp->rm_dbgtgt); - if (icp->rm_action & XICS_RM_KICK_VCPU) + if (icp->rm_action & XICS_RM_KICK_VCPU) { + icp->n_rm_kick_vcpu++; kvmppc_fast_vcpu_kick(icp->rm_kick_target); - if (icp->rm_action & XICS_RM_CHECK_RESEND) + } + if (icp->rm_action & XICS_RM_CHECK_RESEND) { + icp->n_rm_check_resend++; icp_check_resend(xics, icp->rm_resend_icp); - if (icp->rm_action & XICS_RM_REJECT) + } + if (icp->rm_action & XICS_RM_REJECT) { + icp->n_rm_reject++; icp_deliver_irq(xics, icp, icp->rm_reject); - if (icp->rm_action & XICS_RM_NOTIFY_EOI) + } + if (icp->rm_action & XICS_RM_NOTIFY_EOI) { + icp->n_rm_notify_eoi++; kvm_notify_acked_irq(vcpu->kvm, 0, icp->rm_eoied_irq); + } icp->rm_action = 0; @@ -872,10 +880,17 @@ static int xics_debug_show(struct seq_file *m, void *private) struct kvm *kvm = xics->kvm; struct kvm_vcpu *vcpu; int icsid, i; + unsigned long t_rm_kick_vcpu, t_rm_check_resend; + unsigned long t_rm_reject, t_rm_notify_eoi; if (!kvm) return 0; + t_rm_kick_vcpu = 0; + t_rm_notify_eoi = 0; + t_rm_check_resend = 0; + t_rm_reject = 0; + seq_printf(m, "=\nICP state\n=\n"); kvm_for_each_vcpu(i, vcpu, kvm) { @@ -890,8 +905,16 @@ static int xics_debug_show(struct seq_file *m, void *private) icp->server_num, state.xisr, state.pending_pri, state.cppr, state.mfrr, state.out_ee, state.need_resend); + t_rm_kick_vcpu += icp->n_rm_kick_vcpu; + t_rm_notify_eoi += icp->n_rm_notify_eoi; + t_rm_check_resend += icp->n_rm_check_resend; + t_rm_reject += icp->n_rm_reject; } + seq_puts(m, "ICP Guest Real Mode exit totals: "); + seq_printf(m, "\tkick_vcpu=%lu check_resend=%lu reject=%lu notify_eoi=%lu\n", + t_rm_kick_vcpu, t_rm_check_resend, + t_rm_reject, t_rm_notify_eoi); for (icsid = 0; icsid <= KVMPPC_XICS_MAX_ICS_ID; icsid++) { struct kvmppc_ics *ics = xics->ics[icsid]; diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h index 73f0f27..de970ec 100644 --- a/arch/powerpc/kvm/book3s_xics.h +++ b/arch/powerpc/kvm/book3s_xics.h @@ -78,6 +78,12 @@ struct kvmppc_icp { u32 rm_reject; u32 rm_eoied_irq; + /* Counters for each reason we exited real mode */ + unsigned long n_rm_kick_vcpu; + unsigned long n_rm_check_resend; + unsigned long n_rm_reject; + unsigned long n_rm_notify_eoi; + /* Debug stuff for real mode */ union kvmppc_icp_state rm_dbgstate; struct kvm_vcpu *rm_dbgtgt; -- 1.8.1.4 -- To 
unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 12/21] KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update
From: Paul Mackerras Previously, if kvmppc_run_core() was running a VCPU that needed a VPA update (i.e. one of its 3 virtual processor areas needed to be pinned in memory so the host real mode code can update it on guest entry and exit), we would drop the vcore lock and do the update there and then. Future changes will make it inconvenient to drop the lock, so instead we now remove it from the list of runnable VCPUs and wake up its VCPU task. This will have the effect that the VCPU task will exit kvmppc_run_vcpu(), go around the do loop in kvmppc_vcpu_run_hv(), and re-enter kvmppc_run_vcpu(), whereupon it will do the necessary call to kvmppc_update_vpas() and then rejoin the vcore. The one complication is that the runner VCPU (whose VCPU task is the current task) might be one of the ones that gets removed from the runnable list. In that case we just return from kvmppc_run_core() and let the code in kvmppc_run_vcpu() wake up another VCPU task to be the runner if necessary. This all means that the VCORE_STARTING state is no longer used, so we remove it. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 5 ++-- arch/powerpc/kvm/book3s_hv.c| 56 - 2 files changed, 32 insertions(+), 29 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index d2068bb..2f339ff 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -306,9 +306,8 @@ struct kvmppc_vcore { /* Values for vcore_state */ #define VCORE_INACTIVE 0 #define VCORE_SLEEPING 1 -#define VCORE_STARTING 2 -#define VCORE_RUNNING 3 -#define VCORE_EXITING 4 +#define VCORE_RUNNING 2 +#define VCORE_EXITING 3 /* * Struct used to manage memory for a virtual processor area diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 64a02d4..b38c10e 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1863,6 +1863,25 @@ static void kvmppc_start_restoring_l2_cache(const struct kvmppc_vcore *vc) mtspr(SPRN_MPPR, mpp_addr | PPC_MPPR_FETCH_WHOLE_TABLE); } +static void prepare_threads(struct kvmppc_vcore *vc) +{ + struct kvm_vcpu *vcpu, *vnext; + + list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads, +arch.run_list) { + if (signal_pending(vcpu->arch.run_task)) + vcpu->arch.ret = -EINTR; + else if (vcpu->arch.vpa.update_pending || +vcpu->arch.slb_shadow.update_pending || +vcpu->arch.dtl.update_pending) + vcpu->arch.ret = RESUME_GUEST; + else + continue; + kvmppc_remove_runnable(vc, vcpu); + wake_up(&vcpu->arch.cpu_run); + } +} + /* * Run a set of guest threads on a physical core. * Called with vc->lock held. 
@@ -1872,46 +1891,31 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) struct kvm_vcpu *vcpu, *vnext; long ret; u64 now; - int i, need_vpa_update; + int i; int srcu_idx; - struct kvm_vcpu *vcpus_to_update[threads_per_core]; - /* don't start if any threads have a signal pending */ - need_vpa_update = 0; - list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) { - if (signal_pending(vcpu->arch.run_task)) - return; - if (vcpu->arch.vpa.update_pending || - vcpu->arch.slb_shadow.update_pending || - vcpu->arch.dtl.update_pending) - vcpus_to_update[need_vpa_update++] = vcpu; - } + /* +* Remove from the list any threads that have a signal pending +* or need a VPA update done +*/ + prepare_threads(vc); + + /* if the runner is no longer runnable, let the caller pick a new one */ + if (vc->runner->arch.state != KVMPPC_VCPU_RUNNABLE) + return; /* -* Initialize *vc, in particular vc->vcore_state, so we can -* drop the vcore lock if necessary. +* Initialize *vc. */ vc->n_woken = 0; vc->nap_count = 0; vc->entry_exit_count = 0; vc->preempt_tb = TB_NIL; - vc->vcore_state = VCORE_STARTING; vc->in_guest = 0; vc->napping_threads = 0; vc->conferring_threads = 0; /* -* Updating any of the vpas requires calling kvmppc_pin_guest_page, -* which can't be called with any spinlocks held. -*/ - if (need_vpa_update) { - spin_unlock(&vc->lock); - for (i = 0; i < need_vpa_update; ++i) - kvmppc_update_vpas(vcpus_to_update[i]); - sp
[PULL 13/21] KVM: PPC: Book3S HV: Minor cleanups
From: Paul Mackerras * Remove unused kvmppc_vcore::n_busy field. * Remove setting of RMOR, since it was only used on PPC970 and the PPC970 KVM support has been removed. * Don't use r1 or r2 in setting the runlatch since they are conventionally reserved for other things; use r0 instead. * Streamline the code a little and remove the ext_interrupt_to_host label. * Add some comments about register usage. * hcall_try_real_mode doesn't need to be global, and can't be called from C code anyway. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 2 -- arch/powerpc/kernel/asm-offsets.c | 1 - arch/powerpc/kvm/book3s_hv_rmhandlers.S | 44 ++--- 3 files changed, 19 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 2f339ff..3eecd88 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -227,7 +227,6 @@ struct kvm_arch { unsigned long host_sdr1; int tlbie_lock; unsigned long lpcr; - unsigned long rmor; unsigned long vrma_slb_v; int hpte_setup_done; u32 hpt_order; @@ -271,7 +270,6 @@ struct kvm_arch { */ struct kvmppc_vcore { int n_runnable; - int n_busy; int num_threads; int entry_exit_count; int n_woken; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 3fea721..92ec3fc 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -505,7 +505,6 @@ int main(void) DEFINE(KVM_NEED_FLUSH, offsetof(struct kvm, arch.need_tlb_flush.bits)); DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls)); DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr)); - DEFINE(KVM_RMOR, offsetof(struct kvm, arch.rmor)); DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v)); DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr)); DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar)); diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index b06fe53..f8267e5 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -245,9 +245,9 @@ kvm_novcpu_exit: kvm_start_guest: /* Set runlatch bit the minute you wake up from nap */ - mfspr r1, SPRN_CTRLF - ori r1, r1, 1 - mtspr SPRN_CTRLT, r1 + mfspr r0, SPRN_CTRLF + ori r0, r0, 1 + mtspr SPRN_CTRLT, r0 ld r2,PACATOC(r13) @@ -493,11 +493,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) cmpwi r0,0 beq 20b - /* Set LPCR and RMOR. */ + /* Set LPCR. */ 10:ld r8,VCORE_LPCR(r5) mtspr SPRN_LPCR,r8 - ld r8,KVM_RMOR(r9) - mtspr SPRN_RMOR,r8 isync /* Check if HDEC expires soon */ @@ -1075,7 +1073,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) bne 2f mfspr r3,SPRN_HDEC cmpwi r3,0 - bge ignore_hdec + mr r4,r9 + bge fast_guest_return 2: /* See if this is an hcall we can handle in real mode */ cmpwi r12,BOOK3S_INTERRUPT_SYSCALL @@ -1083,26 +1082,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) /* External interrupt ? */ cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL - bne+ext_interrupt_to_host + bne+guest_exit_cont /* External interrupt, first check for host_ipi. 
If this is * set, we know the host wants us out so let's do it now */ bl kvmppc_read_intr cmpdi r3, 0 - bgt ext_interrupt_to_host + bgt guest_exit_cont /* Check if any CPU is heading out to the host, if so head out too */ ld r5, HSTATE_KVM_VCORE(r13) lwz r0, VCORE_ENTRY_EXIT(r5) cmpwi r0, 0x100 - bge ext_interrupt_to_host - - /* Return to guest after delivering any pending interrupt */ mr r4, r9 - b deliver_guest_interrupt - -ext_interrupt_to_host: + blt deliver_guest_interrupt guest_exit_cont: /* r9 = vcpu, r12 = trap, r13 = paca */ /* Save more register state */ @@ -1763,8 +1757,10 @@ kvmppc_hisi: * Returns to the guest if we handle it, or continues on up to * the kernel if we can't (i.e. if we don't have a handler for * it, or if the handler returns H_TOO_HARD). + * + * r5 - r8 contain hcall args, + * r9 = vcpu, r10 = pc, r11 = msr, r12 = trap, r13 = paca */ - .globl hcall_try_real_mode hcall_try_real_mode: ld r3,VCPU_GPR(R3)(r9) andi. r0,r11,MSR_PR @@ -2024,10 +2020,6 @@ hcall_real_table: .globl hcall_real_table_end hcall_real_table_end: -ignore_hdec: -
[PULL 20/21] KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C
From: Paul Mackerras This replaces the assembler code for kvmhv_commence_exit() with C code in book3s_hv_builtin.c. It also moves the IPI sending code that was in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function so it can be used by kvmhv_commence_exit() as well as icp_rm_set_vcpu_irq(). Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s_64.h | 2 + arch/powerpc/kvm/book3s_hv_builtin.c | 63 ++ arch/powerpc/kvm/book3s_hv_rm_xics.c | 12 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 66 4 files changed, 75 insertions(+), 68 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 869c53f..2b84e48 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -438,6 +438,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm) extern void kvmppc_mmu_debugfs_init(struct kvm *kvm); +extern void kvmhv_rm_send_ipi(int cpu); + #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ #endif /* __ASM_KVM_BOOK3S_64_H__ */ diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index 2754251..c42aa55 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -22,6 +22,7 @@ #include #include #include +#include #define KVM_CMA_CHUNK_ORDER18 @@ -184,3 +185,65 @@ long kvmppc_h_random(struct kvm_vcpu *vcpu) return H_HARDWARE; } + +static inline void rm_writeb(unsigned long paddr, u8 val) +{ + __asm__ __volatile__("stbcix %0,0,%1" + : : "r" (val), "r" (paddr) : "memory"); +} + +/* + * Send an interrupt to another CPU. + * This can only be called in real mode. + * The caller needs to include any barrier needed to order writes + * to memory vs. the IPI/message. + */ +void kvmhv_rm_send_ipi(int cpu) +{ + unsigned long xics_phys; + + /* Poke the target */ + xics_phys = paca[cpu].kvm_hstate.xics_phys; + rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY); +} + +/* + * The following functions are called from the assembly code + * in book3s_hv_rmhandlers.S. + */ +static void kvmhv_interrupt_vcore(struct kvmppc_vcore *vc, int active) +{ + int cpu = vc->pcpu; + + /* Order setting of exit map vs. msgsnd/IPI */ + smp_mb(); + for (; active; active >>= 1, ++cpu) + if (active & 1) + kvmhv_rm_send_ipi(cpu); +} + +void kvmhv_commence_exit(int trap) +{ + struct kvmppc_vcore *vc = local_paca->kvm_hstate.kvm_vcore; + int ptid = local_paca->kvm_hstate.ptid; + int me, ee; + + /* Set our bit in the threads-exiting-guest map in the 0xff00 + bits of vcore->entry_exit_map */ + me = 0x100 << ptid; + do { + ee = vc->entry_exit_map; + } while (cmpxchg(&vc->entry_exit_map, ee, ee | me) != ee); + + /* Are we the first here? */ + if ((ee >> 8) != 0) + return; + + /* +* Trigger the other threads in this vcore to exit the guest. +* If this is a hypervisor decrementer interrupt then they +* will be already on their way out of the guest. 
+*/ + if (trap != BOOK3S_INTERRUPT_HV_DECREMENTER) + kvmhv_interrupt_vcore(vc, ee & ~(1 << ptid)); +} diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c index 6dded8c..00e45b6 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c @@ -26,12 +26,6 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, u32 new_irq); -static inline void rm_writeb(unsigned long paddr, u8 val) -{ - __asm__ __volatile__("sync; stbcix %0,0,%1" - : : "r" (val), "r" (paddr) : "memory"); -} - /* -- ICS routines -- */ static void ics_rm_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics, struct kvmppc_icp *icp) @@ -60,7 +54,6 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu, struct kvm_vcpu *this_vcpu) { struct kvmppc_icp *this_icp = this_vcpu->arch.icp; - unsigned long xics_phys; int cpu; /* Mark the target VCPU as having an interrupt pending */ @@ -83,9 +76,8 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu, /* In SMT cpu will always point to thread 0, we adjust it */ cpu += vcpu->arch.ptid; - /* Not too hard, then poke the target */ - xics_phys = paca[cpu].kvm_hstate.xics_phys; - rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY); + smp_mb(); + kvmhv_rm_send_ipi(cpu); } static void icp_rm_clr_v
[PULL 16/21] KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI
From: Paul Mackerras When running a multi-threaded guest and vcpu 0 in a virtual core is not running in the guest (i.e. it is busy elsewhere in the host), thread 0 of the physical core will switch the MMU to the guest and then go to nap mode in the code at kvm_do_nap. If the guest sends an IPI to thread 0 using the msgsndp instruction, that will wake up thread 0 and cause all the threads in the guest to exit to the host unnecessarily. To avoid the unnecessary exit, this arranges for the PECEDP bit to be cleared in this situation. When napping due to a H_CEDE from the guest, we still set PECEDP so that the thread will wake up on an IPI sent using msgsndp. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 6716db3..12d7e4c 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -191,6 +191,7 @@ kvmppc_primary_no_guest: li r3, NAPPING_NOVCPU stb r3, HSTATE_NAPPING(r13) + li r3, 0 /* Don't wake on privileged (OS) doorbell */ b kvm_do_nap kvm_novcpu_wakeup: @@ -2129,10 +2130,13 @@ _GLOBAL(kvmppc_h_cede) /* r3 = vcpu pointer, r11 = msr, r13 = paca */ bl kvmhv_accumulate_time #endif + lis r3, LPCR_PECEDP@h /* Do wake on privileged doorbell */ + /* * Take a nap until a decrementer or external or doobell interrupt -* occurs, with PECE1, PECE0 and PECEDP set in LPCR. Also clear the -* runlatch bit before napping. +* occurs, with PECE1 and PECE0 set in LPCR. +* On POWER8, if we are ceding, also set PECEDP. +* Also clear the runlatch bit before napping. */ kvm_do_nap: mfspr r0, SPRN_CTRLF @@ -2144,7 +2148,7 @@ kvm_do_nap: mfspr r5,SPRN_LPCR ori r5,r5,LPCR_PECE0 | LPCR_PECE1 BEGIN_FTR_SECTION - orisr5,r5,LPCR_PECEDP@h + rlwimi r5, r3, 0, LPCR_PECEDP END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) mtspr SPRN_LPCR,r5 isync -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
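In C terms (illustration only; set_pecedp() is an invented helper, LPCR_PECEDP is the existing privileged-doorbell wakeup-enable bit), the new "rlwimi r5, r3, 0, LPCR_PECEDP" is an insert-under-mask: the PECEDP bit of the LPCR image is taken from r3, which the callers set to either 0 (napping with no vcpu) or LPCR_PECEDP (H_CEDE), while every other bit of r5 is left untouched:

static inline unsigned long set_pecedp(unsigned long lpcr, unsigned long pece)
{
	/* rlwimi rA,rS,0,mask is roughly: rA = (rS & mask) | (rA & ~mask) */
	return (pece & LPCR_PECEDP) | (lpcr & ~LPCR_PECEDP);
}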
[PULL 03/21] KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation.
From: Michael Ellerman Some PowerNV systems include a hardware random-number generator. This HWRNG is present on POWER7+ and POWER8 chips and is capable of generating one 64-bit random number every microsecond. The random numbers are produced by sampling a set of 64 unstable high-frequency oscillators and are almost completely entropic. PAPR defines an H_RANDOM hypercall which guests can use to obtain one 64-bit random sample from the HWRNG. This adds a real-mode implementation of the H_RANDOM hypercall. This hypercall was implemented in real mode because the latency of reading the HWRNG is generally small compared to the latency of a guest exit and entry for all the threads in the same virtual core. Userspace can detect the presence of the HWRNG and the H_RANDOM implementation by querying the KVM_CAP_PPC_HWRNG capability. The H_RANDOM hypercall implementation will only be invoked when the guest does an H_RANDOM hypercall if userspace first enables the in-kernel H_RANDOM implementation using the KVM_CAP_PPC_ENABLE_HCALL capability. Signed-off-by: Michael Ellerman Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- Documentation/virtual/kvm/api.txt | 17 + arch/powerpc/include/asm/archrandom.h | 11 ++- arch/powerpc/include/asm/kvm_ppc.h | 2 + arch/powerpc/kvm/book3s_hv_builtin.c| 15 + arch/powerpc/kvm/book3s_hv_rmhandlers.S | 115 arch/powerpc/kvm/powerpc.c | 3 + arch/powerpc/platforms/powernv/rng.c| 29 include/uapi/linux/kvm.h| 1 + 8 files changed, 191 insertions(+), 2 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index bc9f6fe..9fa2bf8 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -3573,3 +3573,20 @@ struct { @ar - access register number KVM handlers should exit to userspace with rc = -EREMOTE. + + +8. Other capabilities. +-- + +This section lists capabilities that give information about other +features of the KVM implementation. + +8.1 KVM_CAP_PPC_HWRNG + +Architectures: ppc + +This capability, if KVM_CHECK_EXTENSION indicates that it is +available, means that that the kernel has an implementation of the +H_RANDOM hypercall backed by a hardware random-number generator. +If present, the kernel H_RANDOM handler can be enabled for guest use +with the KVM_CAP_PPC_ENABLE_HCALL capability. 
diff --git a/arch/powerpc/include/asm/archrandom.h b/arch/powerpc/include/asm/archrandom.h index bde5311..0cc6eed 100644 --- a/arch/powerpc/include/asm/archrandom.h +++ b/arch/powerpc/include/asm/archrandom.h @@ -30,8 +30,6 @@ static inline int arch_has_random(void) return !!ppc_md.get_random_long; } -int powernv_get_random_long(unsigned long *v); - static inline int arch_get_random_seed_long(unsigned long *v) { return 0; @@ -47,4 +45,13 @@ static inline int arch_has_random_seed(void) #endif /* CONFIG_ARCH_RANDOM */ +#ifdef CONFIG_PPC_POWERNV +int powernv_hwrng_present(void); +int powernv_get_random_long(unsigned long *v); +int powernv_get_random_real_mode(unsigned long *v); +#else +static inline int powernv_hwrng_present(void) { return 0; } +static inline int powernv_get_random_real_mode(unsigned long *v) { return 0; } +#endif + #endif /* _ASM_POWERPC_ARCHRANDOM_H */ diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 46bf652..b8475da 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -302,6 +302,8 @@ static inline bool is_kvmppc_hv_enabled(struct kvm *kvm) return kvm->arch.kvm_ops == kvmppc_hv_ops; } +extern int kvmppc_hwrng_present(void); + /* * Cuts out inst bits with ordering according to spec. * That means the leftmost bit is zero. All given bits are included. diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index 1f083ff..1954a1c 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -21,6 +21,7 @@ #include #include #include +#include #define KVM_CMA_CHUNK_ORDER18 @@ -169,3 +170,17 @@ int kvmppc_hcall_impl_hv_realmode(unsigned long cmd) return 0; } EXPORT_SYMBOL_GPL(kvmppc_hcall_impl_hv_realmode); + +int kvmppc_hwrng_present(void) +{ + return powernv_hwrng_present(); +} +EXPORT_SYMBOL_GPL(kvmppc_hwrng_present); + +long kvmppc_h_random(struct kvm_vcpu *vcpu) +{ + if (powernv_get_random_real_mode(&vcpu->arch.gpr[4])) + return H_SUCCESS; + + return H_HARDWARE; +} diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 6cbf163..0814ca1 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -1839,6 +1839,121 @@ hcall_real_table: .long 0 /* 0x12c */ .long 0
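For completeness, a minimal userspace sketch (not part of this patch) of how a VMM could probe for the capability and then turn the in-kernel handler on. It assumes a linux/kvm.h new enough to define KVM_CAP_PPC_HWRNG and KVM_CAP_PPC_ENABLE_HCALL; the H_RANDOM opcode is taken from arch/powerpc/include/asm/hvcall.h and is only assumed here, since it is not exported to userspace:

#include <sys/ioctl.h>
#include <linux/kvm.h>

#define H_RANDOM	0x300	/* assumed value, see asm/hvcall.h */

/* kvm_fd is the open /dev/kvm fd, vm_fd the VM fd created from it. */
static int enable_in_kernel_h_random(int kvm_fd, int vm_fd)
{
	struct kvm_enable_cap cap = { .cap = KVM_CAP_PPC_ENABLE_HCALL };

	/* Only offer the hcall if the host has a usable HWRNG backend. */
	if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_HWRNG) <= 0)
		return -1;

	cap.args[0] = H_RANDOM;		/* which hcall to toggle */
	cap.args[1] = 1;		/* 1 = handle it in the kernel */
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}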
[PULL 21/21] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8
From: Paul Mackerras This uses msgsnd where possible for signalling other threads within the same core on POWER8 systems, rather than IPIs through the XICS interrupt controller. This includes waking secondary threads to run the guest, the interrupts generated by the virtual XICS, and the interrupts to bring the other threads out of the guest when exiting. Aggregated statistics from debugfs across vcpus for a guest with 32 vcpus, 8 threads/vcore, running on a POWER8, show this before the change: rm_entry: 3387.6ns (228 - 86600, 1008969 samples) rm_exit: 4561.5ns (12 - 3477452, 1009402 samples) rm_intr: 1660.0ns (12 - 553050, 3600051 samples) and this after the change: rm_entry: 3060.1ns (212 - 65138, 953873 samples) rm_exit: 4244.1ns (12 - 9693408, 954331 samples) rm_intr: 1342.3ns (12 - 1104718, 3405326 samples) for a test of booting Fedora 20 big-endian to the login prompt. The time taken for a H_PROD hcall (which is handled in the host kernel) went down from about 35 microseconds to about 16 microseconds with this change. The noinline added to kvmppc_run_core turned out to be necessary for good performance, at least with gcc 4.9.2 as packaged with Fedora 21 and a little-endian POWER8 host. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kernel/asm-offsets.c | 3 ++ arch/powerpc/kvm/book3s_hv.c| 51 ++--- arch/powerpc/kvm/book3s_hv_builtin.c| 16 +-- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 22 -- 4 files changed, 70 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 0d07efb..0034b6b 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -37,6 +37,7 @@ #include #include #include +#include #ifdef CONFIG_PPC64 #include #include @@ -759,5 +760,7 @@ int main(void) offsetof(struct paca_struct, subcore_sibling_mask)); #endif + DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER); + return 0; } diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index ea1600f..48d3c5d 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -51,6 +51,7 @@ #include #include #include +#include #include #include #include @@ -84,9 +85,35 @@ static DECLARE_BITMAP(default_enabled_hcalls, MAX_HCALL_OPCODE/4 + 1); static void kvmppc_end_cede(struct kvm_vcpu *vcpu); static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu); +static bool kvmppc_ipi_thread(int cpu) +{ + /* On POWER8 for IPIs to threads in the same core, use msgsnd */ + if (cpu_has_feature(CPU_FTR_ARCH_207S)) { + preempt_disable(); + if (cpu_first_thread_sibling(cpu) == + cpu_first_thread_sibling(smp_processor_id())) { + unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER); + msg |= cpu_thread_in_core(cpu); + smp_mb(); + __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg)); + preempt_enable(); + return true; + } + preempt_enable(); + } + +#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP) + if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) { + xics_wake_cpu(cpu); + return true; + } +#endif + + return false; +} + static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu) { - int me; int cpu = vcpu->cpu; wait_queue_head_t *wqp; @@ -96,20 +123,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu) ++vcpu->stat.halt_wakeup; } - me = get_cpu(); + if (kvmppc_ipi_thread(cpu + vcpu->arch.ptid)) + return; /* CPU points to the first thread of the core */ - if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) { -#ifdef CONFIG_PPC_ICP_NATIVE - int real_cpu = cpu + 
vcpu->arch.ptid; - if (paca[real_cpu].kvm_hstate.xics_phys) - xics_wake_cpu(real_cpu); - else -#endif - if (cpu_online(cpu)) - smp_send_reschedule(cpu); - } - put_cpu(); + if (cpu >= 0 && cpu < nr_cpu_ids && cpu_online(cpu)) + smp_send_reschedule(cpu); } /* @@ -1781,10 +1800,8 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu) /* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */ smp_wmb(); tpaca->kvm_hstate.kvm_vcpu = vcpu; -#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP) if (cpu != smp_processor_id()) - xics_wake_cpu(cpu); -#endif + kvmppc_ipi_thread(cp
[PULL 19/21] KVM: PPC: Book3S HV: Streamline guest entry and exit
From: Paul Mackerras On entry to the guest, secondary threads now wait for the primary to switch the MMU after loading up most of their state, rather than before. This means that the secondary threads get into the guest sooner, in the common case where the secondary threads get to kvmppc_hv_entry before the primary thread. On exit, the first thread out increments the exit count and interrupts the other threads (to get them out of the guest) before saving most of its state, rather than after. That means that the other threads exit sooner and means that the first thread doesn't spend so much time waiting for the other threads at the point where the MMU gets switched back to the host. This pulls out the code that increments the exit count and interrupts other threads into a separate function, kvmhv_commence_exit(). This also makes sure that r12 and vcpu->arch.trap are set correctly in some corner cases. Statistics from /sys/kernel/debug/kvm/vm*/vcpu*/timings show the improvement. Aggregating across vcpus for a guest with 32 vcpus, 8 threads/vcore, running on a POWER8, gives this before the change: rm_entry: avg 4537.3ns (222 - 48444, 1068878 samples) rm_exit: avg 4787.6ns (152 - 165490, 1010717 samples) rm_intr: avg 1673.6ns (12 - 341304, 3818691 samples) and this after the change: rm_entry: avg 3427.7ns (232 - 68150, 1118921 samples) rm_exit: avg 4716.0ns (12 - 150720, 1119477 samples) rm_intr: avg 1614.8ns (12 - 522436, 3850432 samples) showing a substantial reduction in the time spent per guest entry in the real-mode guest entry code, and smaller reductions in the real mode guest exit and interrupt handling times. (The test was to start the guest and boot Fedora 20 big-endian to the login prompt.) Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 212 +++- 1 file changed, 126 insertions(+), 86 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 245f5c9..3f6fd78 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -175,6 +175,19 @@ kvmppc_primary_no_guest: /* put the HDEC into the DEC, since HDEC interrupts don't wake us */ mfspr r3, SPRN_HDEC mtspr SPRN_DEC, r3 + /* +* Make sure the primary has finished the MMU switch. +* We should never get here on a secondary thread, but +* check it for robustness' sake. +*/ + ld r5, HSTATE_KVM_VCORE(r13) +65:lbz r0, VCORE_IN_GUEST(r5) + cmpwi r0, 0 + beq 65b + /* Set LPCR. */ + ld r8,VCORE_LPCR(r5) + mtspr SPRN_LPCR,r8 + isync /* set our bit in napping_threads */ ld r5, HSTATE_KVM_VCORE(r13) lbz r7, HSTATE_PTID(r13) @@ -206,7 +219,7 @@ kvm_novcpu_wakeup: /* check the wake reason */ bl kvmppc_check_wake_reason - + /* see if any other thread is already exiting */ lwz r0, VCORE_ENTRY_EXIT(r5) cmpwi r0, 0x100 @@ -244,7 +257,15 @@ kvm_novcpu_wakeup: b kvmppc_got_guest kvm_novcpu_exit: - b hdec_soon +#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING + ld r4, HSTATE_KVM_VCPU(r13) + cmpdi r4, 0 + beq 13f + addir3, r4, VCPU_TB_RMEXIT + bl kvmhv_accumulate_time +#endif +13:bl kvmhv_commence_exit + b kvmhv_switch_to_host /* * We come in here when wakened from nap mode. @@ -422,7 +443,7 @@ kvmppc_hv_entry: /* Primary thread switches to guest partition. 
*/ ld r9,VCORE_KVM(r5)/* pointer to struct kvm */ cmpwi r6,0 - bne 20f + bne 10f ld r6,KVM_SDR1(r9) lwz r7,KVM_LPID(r9) li r0,LPID_RSVD/* switch to reserved LPID */ @@ -493,26 +514,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) li r0,1 stb r0,VCORE_IN_GUEST(r5) /* signal secondaries to continue */ - b 10f - - /* Secondary threads wait for primary to have done partition switch */ -20:lbz r0,VCORE_IN_GUEST(r5) - cmpwi r0,0 - beq 20b - - /* Set LPCR. */ -10:ld r8,VCORE_LPCR(r5) - mtspr SPRN_LPCR,r8 - isync - - /* Check if HDEC expires soon */ - mfspr r3,SPRN_HDEC - cmpwi r3,512 /* 1 microsecond */ - li r12,BOOK3S_INTERRUPT_HV_DECREMENTER - blt hdec_soon /* Do we have a guest vcpu to run? */ - cmpdi r4, 0 +10:cmpdi r4, 0 beq kvmppc_primary_no_guest kvmppc_got_guest: @@ -837,6 +841,30 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) clrrdi r6,r6,1 mtspr SPRN_CTRLT,r6 4: + /* Secondary threads wait for primary to have done partition switch */
[PULL 04/21] KVM: PPC: Book3S HV: Remove RMA-related variables from code
From: "Aneesh Kumar K.V" We don't support real-mode areas now that 970 support is removed. Remove the remaining details of rma from the code. Also rename rma_setup_done to hpte_setup_done to better reflect the changes. Signed-off-by: Aneesh Kumar K.V Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 3 +-- arch/powerpc/kvm/book3s_64_mmu_hv.c | 28 ++-- arch/powerpc/kvm/book3s_hv.c| 10 +- 3 files changed, 20 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 8ef0512..015773f 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -228,9 +228,8 @@ struct kvm_arch { int tlbie_lock; unsigned long lpcr; unsigned long rmor; - struct kvm_rma_info *rma; unsigned long vrma_slb_v; - int rma_setup_done; + int hpte_setup_done; u32 hpt_order; atomic_t vcpus_running; u32 online_vcores; diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 534acb3..dbf1271 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -116,12 +116,12 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp) long order; mutex_lock(&kvm->lock); - if (kvm->arch.rma_setup_done) { - kvm->arch.rma_setup_done = 0; - /* order rma_setup_done vs. vcpus_running */ + if (kvm->arch.hpte_setup_done) { + kvm->arch.hpte_setup_done = 0; + /* order hpte_setup_done vs. vcpus_running */ smp_mb(); if (atomic_read(&kvm->arch.vcpus_running)) { - kvm->arch.rma_setup_done = 1; + kvm->arch.hpte_setup_done = 1; goto out; } } @@ -1339,20 +1339,20 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf, unsigned long tmp[2]; ssize_t nb; long int err, ret; - int rma_setup; + int hpte_setup; if (!access_ok(VERIFY_READ, buf, count)) return -EFAULT; /* lock out vcpus from running while we're doing this */ mutex_lock(&kvm->lock); - rma_setup = kvm->arch.rma_setup_done; - if (rma_setup) { - kvm->arch.rma_setup_done = 0; /* temporarily */ - /* order rma_setup_done vs. vcpus_running */ + hpte_setup = kvm->arch.hpte_setup_done; + if (hpte_setup) { + kvm->arch.hpte_setup_done = 0; /* temporarily */ + /* order hpte_setup_done vs. vcpus_running */ smp_mb(); if (atomic_read(&kvm->arch.vcpus_running)) { - kvm->arch.rma_setup_done = 1; + kvm->arch.hpte_setup_done = 1; mutex_unlock(&kvm->lock); return -EBUSY; } @@ -1405,7 +1405,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf, "r=%lx\n", ret, i, v, r); goto out; } - if (!rma_setup && is_vrma_hpte(v)) { + if (!hpte_setup && is_vrma_hpte(v)) { unsigned long psize = hpte_base_page_size(v, r); unsigned long senc = slb_pgsize_encoding(psize); unsigned long lpcr; @@ -1414,7 +1414,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf, (VRMA_VSID << SLB_VSID_SHIFT_1T); lpcr = senc << (LPCR_VRMASD_SH - 4); kvmppc_update_lpcr(kvm, lpcr, LPCR_VRMASD); - rma_setup = 1; + hpte_setup = 1; } ++i; hptp += 2; @@ -1430,9 +1430,9 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf, } out: - /* Order HPTE updates vs. rma_setup_done */ + /* Order HPTE updates vs. 
hpte_setup_done */ smp_wmb(); - kvm->arch.rma_setup_done = rma_setup; + kvm->arch.hpte_setup_done = hpte_setup; mutex_unlock(&kvm->lock); if (err) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index b9c11a3..dde14fd 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -2044,11 +2044,11 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu) } atomic_inc(&vcpu->kvm->
[PULL 18/21] KVM: PPC: Book3S HV: Use bitmap of active threads rather than count
From: Paul Mackerras Currently, the entry_exit_count field in the kvmppc_vcore struct contains two 8-bit counts, one of the threads that have started entering the guest, and one of the threads that have started exiting the guest. This changes it to an entry_exit_map field which contains two bitmaps of 8 bits each. The advantage of doing this is that it gives us a bitmap of which threads need to be signalled when exiting the guest. That means that we no longer need to use the trick of setting the HDEC to 0 to pull the other threads out of the guest, which led in some cases to a spurious HDEC interrupt on the next guest entry. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 15 arch/powerpc/kernel/asm-offsets.c | 2 +- arch/powerpc/kvm/book3s_hv.c| 5 ++- arch/powerpc/kvm/book3s_hv_builtin.c| 10 +++--- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 61 +++-- 5 files changed, 44 insertions(+), 49 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 1517faa..d67a838 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -263,15 +263,15 @@ struct kvm_arch { /* * Struct for a virtual core. - * Note: entry_exit_count combines an entry count in the bottom 8 bits - * and an exit count in the next 8 bits. This is so that we can - * atomically increment the entry count iff the exit count is 0 - * without taking the lock. + * Note: entry_exit_map combines a bitmap of threads that have entered + * in the bottom 8 bits and a bitmap of threads that have exited in the + * next 8 bits. This is so that we can atomically set the entry bit + * iff the exit map is 0 without taking a lock. */ struct kvmppc_vcore { int n_runnable; int num_threads; - int entry_exit_count; + int entry_exit_map; int napping_threads; int first_vcpuid; u16 pcpu; @@ -296,8 +296,9 @@ struct kvmppc_vcore { ulong conferring_threads; }; -#define VCORE_ENTRY_COUNT(vc) ((vc)->entry_exit_count & 0xff) -#define VCORE_EXIT_COUNT(vc) ((vc)->entry_exit_count >> 8) +#define VCORE_ENTRY_MAP(vc)((vc)->entry_exit_map & 0xff) +#define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8) +#define VCORE_IS_EXITING(vc) (VCORE_EXIT_MAP(vc) != 0) /* Values for vcore_state */ #define VCORE_INACTIVE 0 diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 8aa8246..0d07efb 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -562,7 +562,7 @@ int main(void) DEFINE(VCPU_ACOP, offsetof(struct kvm_vcpu, arch.acop)); DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort)); DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1)); - DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_count)); + DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_map)); DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest)); DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, napping_threads)); DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm)); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 7c1335d..ea1600f 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1952,7 +1952,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) /* * Initialize *vc. 
*/ - vc->entry_exit_count = 0; + vc->entry_exit_map = 0; vc->preempt_tb = TB_NIL; vc->in_guest = 0; vc->napping_threads = 0; @@ -2119,8 +2119,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) * this thread straight away and have it join in. */ if (!signal_pending(current)) { - if (vc->vcore_state == VCORE_RUNNING && - VCORE_EXIT_COUNT(vc) == 0) { + if (vc->vcore_state == VCORE_RUNNING && !VCORE_IS_EXITING(vc)) { kvmppc_create_dtl_entry(vcpu, vc); kvmppc_start_thread(vcpu); trace_kvm_guest_enter(vcpu); diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index 1954a1c..2754251 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -115,11 +115,11 @@ long int kvmppc_rm_h_confer(struct kvm_vcpu *vcpu, int target, int rv = H_SUCCESS; /* => don't yield */ set_bit(vcpu->arch.ptid, &vc->conferring_threads); - while ((get_tb() < stop) && (VCORE_EXIT_COUNT(vc) == 0)) { - threads_running = VCORE_ENTRY_COUNT(vc); - threads_ceded = hweight32(vc->napping
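The point of the two bitmaps is that "enter only if nobody has started exiting" becomes a single atomic update. A kernel-side C sketch of that rule (the real code does the equivalent with lwarx/stwcx. in book3s_hv_rmhandlers.S; the helper below is made up for illustration):

#include <linux/kvm_host.h>

/* Set this thread's entry bit, but only while the exit bitmap is empty. */
static bool try_mark_thread_entered(struct kvmppc_vcore *vc, int ptid)
{
	int old, new;

	do {
		old = vc->entry_exit_map;
		if (old >> 8)			/* a thread is already exiting */
			return false;
		new = old | (1 << ptid);	/* entry bits live in the low byte */
	} while (cmpxchg(&vc->entry_exit_map, old, new) != old);

	return true;
}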
[PULL 17/21] KVM: PPC: Book3S HV: Use decrementer to wake napping threads
From: Paul Mackerras This arranges for threads that are napping due to their vcpu having ceded or due to not having a vcpu to wake up at the end of the guest's timeslice without having to be poked with an IPI. We do that by arranging for the decrementer to contain a value no greater than the number of timebase ticks remaining until the end of the timeslice. In the case of a thread with no vcpu, this number is in the hypervisor decrementer already. In the case of a ceded vcpu, we use the smaller of the HDEC value and the DEC value. Using the DEC like this when ceded means we need to save and restore the guest decrementer value around the nap. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 43 +++-- 1 file changed, 41 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 12d7e4c..16719af 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -172,6 +172,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) kvmppc_primary_no_guest: /* We handle this much like a ceded vcpu */ + /* put the HDEC into the DEC, since HDEC interrupts don't wake us */ + mfspr r3, SPRN_HDEC + mtspr SPRN_DEC, r3 /* set our bit in napping_threads */ ld r5, HSTATE_KVM_VCORE(r13) lbz r7, HSTATE_PTID(r13) @@ -223,6 +226,12 @@ kvm_novcpu_wakeup: cmpdi r3, 0 bge kvm_novcpu_exit + /* See if our timeslice has expired (HDEC is negative) */ + mfspr r0, SPRN_HDEC + li r12, BOOK3S_INTERRUPT_HV_DECREMENTER + cmpwi r0, 0 + blt kvm_novcpu_exit + /* Got an IPI but other vcpus aren't yet exiting, must be a latecomer */ ld r4, HSTATE_KVM_VCPU(r13) cmpdi r4, 0 @@ -1493,10 +1502,10 @@ kvmhv_do_exit: /* r12 = trap, r13 = paca */ cmpwi r3,0x100/* Are we the first here? */ bge 43f cmpwi r12,BOOK3S_INTERRUPT_HV_DECREMENTER - beq 40f + beq 43f li r0,0 mtspr SPRN_HDEC,r0 -40: + /* * Send an IPI to any napping threads, since an HDEC interrupt * doesn't wake CPUs up from nap. @@ -2124,6 +2133,27 @@ _GLOBAL(kvmppc_h_cede) /* r3 = vcpu pointer, r11 = msr, r13 = paca */ /* save FP state */ bl kvmppc_save_fp + /* +* Set DEC to the smaller of DEC and HDEC, so that we wake +* no later than the end of our timeslice (HDEC interrupts +* don't wake us from nap). +*/ + mfspr r3, SPRN_DEC + mfspr r4, SPRN_HDEC + mftbr5 + cmpwr3, r4 + ble 67f + mtspr SPRN_DEC, r4 +67: + /* save expiry time of guest decrementer */ + extsw r3, r3 + add r3, r3, r5 + ld r4, HSTATE_KVM_VCPU(r13) + ld r5, HSTATE_KVM_VCORE(r13) + ld r6, VCORE_TB_OFFSET(r5) + subfr3, r6, r3 /* convert to host TB value */ + std r3, VCPU_DEC_EXPIRES(r4) + #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING ld r4, HSTATE_KVM_VCPU(r13) addir3, r4, VCPU_TB_CEDE @@ -2181,6 +2211,15 @@ kvm_end_cede: /* load up FP state */ bl kvmppc_load_fp + /* Restore guest decrementer */ + ld r3, VCPU_DEC_EXPIRES(r4) + ld r5, HSTATE_KVM_VCORE(r13) + ld r6, VCORE_TB_OFFSET(r5) + add r3, r3, r6 /* convert host TB to guest TB value */ + mftbr7 + subfr3, r7, r3 + mtspr SPRN_DEC, r3 + /* Load NV GPRS */ ld r14, VCPU_GPR(R14)(r4) ld r15, VCPU_GPR(R15)(r4) -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
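The timebase arithmetic added around the nap, written out in C for clarity (a sketch only; the SPR accessors and the dec_expires field follow the kernel, the helper functions themselves are made up):

#include <linux/kvm_host.h>
#include <asm/reg.h>

/*
 * Before napping on a cede: never sleep past the end of the timeslice
 * (an HDEC interrupt will not wake a napping thread), and remember when
 * the guest decrementer would have expired, converted to host timebase
 * (guest TB = host TB + vcore->tb_offset).
 */
static void cede_save_dec(struct kvm_vcpu *vcpu, u64 tb_offset)
{
	s64 dec  = (s32)mfspr(SPRN_DEC);	/* sign-extend the 32-bit DEC */
	s64 hdec = (s32)mfspr(SPRN_HDEC);

	if (dec > hdec)
		mtspr(SPRN_DEC, hdec);

	vcpu->arch.dec_expires = dec + mftb() - tb_offset;
}

/* After waking: convert the saved expiry back into a guest DEC value. */
static void cede_restore_dec(struct kvm_vcpu *vcpu, u64 tb_offset)
{
	mtspr(SPRN_DEC, vcpu->arch.dec_expires + tb_offset - mftb());
}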
[PULL 02/21] kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM
From: David Gibson On POWER, storage caching is usually configured via the MMU - attributes such as cache-inhibited are stored in the TLB and the hashed page table. This makes correctly performing cache inhibited IO accesses awkward when the MMU is turned off (real mode). Some CPU models provide special registers to control the cache attributes of real mode load and stores but this is not at all consistent. This is a problem in particular for SLOF, the firmware used on KVM guests, which runs entirely in real mode, but which needs to do IO to load the kernel. To simplify this qemu implements two special hypercalls, H_LOGICAL_CI_LOAD and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to a logical address (aka guest physical address). SLOF uses these for IO. However, because these are implemented within qemu, not the host kernel, these bypass any IO devices emulated within KVM itself. The simplest way to see this problem is to attempt to boot a KVM guest from a virtio-blk device with iothread / dataplane enabled. The iothread code relies on an in kernel implementation of the virtio queue notification, which is not triggered by the IO hcalls, and so the guest will stall in SLOF unable to load the guest OS. This patch addresses this by providing in-kernel implementations of the 2 hypercalls, which correctly scan the KVM IO bus. Any access to an address not handled by the KVM IO bus will cause a VM exit, hitting the qemu implementation as before. Note that a userspace change is also required, in order to enable these new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL. Signed-off-by: David Gibson [agraf: fix compilation] Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s.h | 3 ++ arch/powerpc/kvm/book3s.c | 76 +++ arch/powerpc/kvm/book3s_hv.c | 12 ++ arch/powerpc/kvm/book3s_pr_papr.c | 28 + 4 files changed, 119 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 942c7b1..578e550 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -292,6 +292,9 @@ static inline bool kvmppc_supports_magic_page(struct kvm_vcpu *vcpu) return !is_kvmppc_hv_enabled(vcpu->kvm); } +extern int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu); +extern int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu); + /* Magic register values loaded into r3 and r4 before the 'sc' assembly * instruction for the OSI hypercalls */ #define OSI_SC_MAGIC_R30x113724FA diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index cfbcdc6..453a8a4 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -821,6 +821,82 @@ void kvmppc_core_destroy_vm(struct kvm *kvm) #endif } +int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu) +{ + unsigned long size = kvmppc_get_gpr(vcpu, 4); + unsigned long addr = kvmppc_get_gpr(vcpu, 5); + u64 buf; + int ret; + + if (!is_power_of_2(size) || (size > sizeof(buf))) + return H_TOO_HARD; + + ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, size, &buf); + if (ret != 0) + return H_TOO_HARD; + + switch (size) { + case 1: + kvmppc_set_gpr(vcpu, 4, *(u8 *)&buf); + break; + + case 2: + kvmppc_set_gpr(vcpu, 4, be16_to_cpu(*(__be16 *)&buf)); + break; + + case 4: + kvmppc_set_gpr(vcpu, 4, be32_to_cpu(*(__be32 *)&buf)); + break; + + case 8: + kvmppc_set_gpr(vcpu, 4, be64_to_cpu(*(__be64 *)&buf)); + break; + + default: + BUG(); + } + + return H_SUCCESS; +} +EXPORT_SYMBOL_GPL(kvmppc_h_logical_ci_load); + +int kvmppc_h_logical_ci_store(struct kvm_vcpu 
*vcpu) +{ + unsigned long size = kvmppc_get_gpr(vcpu, 4); + unsigned long addr = kvmppc_get_gpr(vcpu, 5); + unsigned long val = kvmppc_get_gpr(vcpu, 6); + u64 buf; + int ret; + + switch (size) { + case 1: + *(u8 *)&buf = val; + break; + + case 2: + *(__be16 *)&buf = cpu_to_be16(val); + break; + + case 4: + *(__be32 *)&buf = cpu_to_be32(val); + break; + + case 8: + *(__be64 *)&buf = cpu_to_be64(val); + break; + + default: + return H_TOO_HARD; + } + + ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, size, &buf); + if (ret != 0) + return H_TOO_HARD; + + return H_SUCCESS; +} +EXPORT_SYMBOL_GPL(kvmppc_h_logical_ci_store); + int kvmppc_core_check_processor_compat(void) { /* diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index de74756.
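For orientation, the guest-visible ABI these handlers implement, sketched with the Linux pseries hcall wrappers (SLOF, the real consumer, has its own real-mode wrappers; this is illustrative only): r4 carries the access size, r5 the guest physical address, and a load returns the value in r4.

#include <linux/types.h>
#include <asm/hvcall.h>

/* Cache-inhibited 4-byte read of a guest-physical (logical) address. */
static long logical_ci_read32(unsigned long gpa, u32 *val)
{
	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
	long rc;

	rc = plpar_hcall(H_LOGICAL_CI_LOAD, retbuf, 4 /* size */, gpa);
	if (rc == H_SUCCESS)
		*val = retbuf[0];
	return rc;
}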
[PULL 08/21] KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode
From: Suresh Warrier Interrupt-based hypercalls return H_TOO_HARD to inform KVM that it needs to switch to the host to complete the rest of hypercall function in virtual mode. This patch ports the virtual mode ICS/ICP reject and resend functions to be runnable in hypervisor real mode, thus avoiding the need to switch to the host to execute these functions in virtual mode. However, the hypercalls continue to return H_TOO_HARD for vcpu_wakeup and notify events - these events cannot be done in real mode and they will still need a switch to host virtual mode. There are sufficient differences between the real mode code and the virtual mode code for the ICS/ICP resend and reject functions that for now the code has been duplicated instead of sharing common code. In the future, we can look at creating common functions. Signed-off-by: Suresh Warrier Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv_rm_xics.c | 225 --- 1 file changed, 211 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c index 7c22997..73bbe92 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c @@ -23,12 +23,39 @@ #define DEBUG_PASSUP +static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, + u32 new_irq); + static inline void rm_writeb(unsigned long paddr, u8 val) { __asm__ __volatile__("sync; stbcix %0,0,%1" : : "r" (val), "r" (paddr) : "memory"); } +/* -- ICS routines -- */ +static void ics_rm_check_resend(struct kvmppc_xics *xics, + struct kvmppc_ics *ics, struct kvmppc_icp *icp) +{ + int i; + + arch_spin_lock(&ics->lock); + + for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) { + struct ics_irq_state *state = &ics->irq_state[i]; + + if (!state->resend) + continue; + + arch_spin_unlock(&ics->lock); + icp_rm_deliver_irq(xics, icp, state->number); + arch_spin_lock(&ics->lock); + } + + arch_spin_unlock(&ics->lock); +} + +/* -- ICP routines -- */ + static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu, struct kvm_vcpu *this_vcpu) { @@ -116,6 +143,178 @@ static inline int check_too_hard(struct kvmppc_xics *xics, return (xics->real_mode_dbg || icp->rm_action) ? H_TOO_HARD : H_SUCCESS; } +static void icp_rm_check_resend(struct kvmppc_xics *xics, +struct kvmppc_icp *icp) +{ + u32 icsid; + + /* Order this load with the test for need_resend in the caller */ + smp_rmb(); + for_each_set_bit(icsid, icp->resend_map, xics->max_icsid + 1) { + struct kvmppc_ics *ics = xics->ics[icsid]; + + if (!test_and_clear_bit(icsid, icp->resend_map)) + continue; + if (!ics) + continue; + ics_rm_check_resend(xics, ics, icp); + } +} + +static bool icp_rm_try_to_deliver(struct kvmppc_icp *icp, u32 irq, u8 priority, + u32 *reject) +{ + union kvmppc_icp_state old_state, new_state; + bool success; + + do { + old_state = new_state = READ_ONCE(icp->state); + + *reject = 0; + + /* See if we can deliver */ + success = new_state.cppr > priority && + new_state.mfrr > priority && + new_state.pending_pri > priority; + + /* +* If we can, check for a rejection and perform the +* delivery +*/ + if (success) { + *reject = new_state.xisr; + new_state.xisr = irq; + new_state.pending_pri = priority; + } else { + /* +* If we failed to deliver we set need_resend +* so a subsequent CPPR state change causes us +* to try a new delivery. 
+*/ + new_state.need_resend = true; + } + + } while (!icp_rm_try_update(icp, old_state, new_state)); + + return success; +} + +static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, + u32 new_irq) +{ + struct ics_irq_state *state; + struct kvmppc_ics *ics; + u32 reject; + u16 src; + + /* +* This is used both for initial delivery of an interrupt and +* for subsequent rejection. +* +* Rejection can be racy vs. resends. We have evaluated the +
[PULL 05/21] KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte
From: "Aneesh Kumar K.V" This adds helper routines for locking and unlocking HPTEs, and uses them in the rest of the code. We don't change any locking rules in this patch. Signed-off-by: Aneesh Kumar K.V Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s_64.h | 14 ++ arch/powerpc/kvm/book3s_64_mmu_hv.c | 25 ++--- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 25 + 3 files changed, 33 insertions(+), 31 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 2d81e20..0789a0f 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -85,6 +85,20 @@ static inline long try_lock_hpte(__be64 *hpte, unsigned long bits) return old == 0; } +static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v) +{ + hpte_v &= ~HPTE_V_HVLOCK; + asm volatile(PPC_RELEASE_BARRIER "" : : : "memory"); + hpte[0] = cpu_to_be64(hpte_v); +} + +/* Without barrier */ +static inline void __unlock_hpte(__be64 *hpte, unsigned long hpte_v) +{ + hpte_v &= ~HPTE_V_HVLOCK; + hpte[0] = cpu_to_be64(hpte_v); +} + static inline int __hpte_actual_psize(unsigned int lp, int psize) { int i, shift; diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index dbf1271..6c6825a 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -338,9 +338,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK; gr = kvm->arch.revmap[index].guest_rpte; - /* Unlock the HPTE */ - asm volatile("lwsync" : : : "memory"); - hptep[0] = cpu_to_be64(v); + unlock_hpte(hptep, v); preempt_enable(); gpte->eaddr = eaddr; @@ -469,8 +467,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, hpte[0] = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK; hpte[1] = be64_to_cpu(hptep[1]); hpte[2] = r = rev->guest_rpte; - asm volatile("lwsync" : : : "memory"); - hptep[0] = cpu_to_be64(hpte[0]); + unlock_hpte(hptep, hpte[0]); preempt_enable(); if (hpte[0] != vcpu->arch.pgfault_hpte[0] || @@ -621,7 +618,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, hptep[1] = cpu_to_be64(r); eieio(); - hptep[0] = cpu_to_be64(hpte[0]); + __unlock_hpte(hptep, hpte[0]); asm volatile("ptesync" : : : "memory"); preempt_enable(); if (page && hpte_is_writable(r)) @@ -642,7 +639,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, return ret; out_unlock: - hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK); + __unlock_hpte(hptep, be64_to_cpu(hptep[0])); preempt_enable(); goto out_put; } @@ -771,7 +768,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp, } } unlock_rmap(rmapp); - hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK); + __unlock_hpte(hptep, be64_to_cpu(hptep[0])); } return 0; } @@ -857,7 +854,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp, } ret = 1; } - hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK); + __unlock_hpte(hptep, be64_to_cpu(hptep[0])); } while ((i = j) != head); unlock_rmap(rmapp); @@ -974,8 +971,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp) /* Now check and modify the HPTE */ if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) { - /* unlock and continue */ - hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK); + __unlock_hpte(hptep, be64_to_cpu(hptep[0])); continue; } @@ -996,9 +992,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp) npages_dirty = n; 
eieio(); } - v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK); + v &= ~HPTE_V_ABSENT; v |= HPTE_V_VALID; - hptep[0] = cpu_to_be64(v); + __unlock_hpte(hptep, v); } while ((i = j) != head); unlock_rmap(rmapp); @@ -1218,8 +1214,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp, r &= ~HPTE_GR_MODIFIED; revp->guest_rpte = r; } -
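The intended pairing of the new helpers, as a small kernel-side sketch (the spin-on-try_lock_hpte loop is the pattern already used throughout book3s_64_mmu_hv.c; the wrapper function itself is made up):

#include <linux/kvm_host.h>
#include <asm/kvm_book3s_64.h>

static unsigned long peek_hpte(__be64 *hptep)
{
	unsigned long v;

	while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
		cpu_relax();
	v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
	/* ... examine or update the entry while it cannot change ... */
	unlock_hpte(hptep, v);		/* release barrier, then HVLOCK cleared */
	return v;
}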
[PULL 01/21] powerpc: Export __spin_yield
From: "Suresh E. Warrier" Export __spin_yield so that the arch_spin_unlock() function can be invoked from a module. This will be required for modules where we want to take a lock that is also is acquired in hypervisor real mode. Because we want to avoid running any lockdep code (which may not be safe in real mode), this lock needs to be an arch_spinlock_t instead of a normal spinlock. Signed-off-by: Suresh Warrier Acked-by: Paul Mackerras Acked-by: Michael Ellerman Signed-off-by: Alexander Graf --- arch/powerpc/lib/locks.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c index 170a034..f7deebd 100644 --- a/arch/powerpc/lib/locks.c +++ b/arch/powerpc/lib/locks.c @@ -41,6 +41,7 @@ void __spin_yield(arch_spinlock_t *lock) plpar_hcall_norets(H_CONFER, get_hard_smp_processor_id(holder_cpu), yield_count); } +EXPORT_SYMBOL_GPL(__spin_yield); /* * Waiting for a read lock or a write lock on a rwlock... -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
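What the export enables, in miniature (an illustrative module-side sketch, not from the patch): a raw arch_spinlock_t runs no lockdep code, which is what makes it usable from hypervisor real mode, and on shared-processor LPARs the arch_spin_* slow path calls __spin_yield(), hence the export.

#include <linux/spinlock.h>

static arch_spinlock_t rm_lock = __ARCH_SPIN_LOCK_UNLOCKED;

static void real_mode_safe_section(void)
{
	arch_spin_lock(&rm_lock);	/* slow path may call __spin_yield() */
	/* ... work that must also be callable from real mode ... */
	arch_spin_unlock(&rm_lock);
}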
[PULL 00/21] ppc patch queue 2015-04-21 for 4.1
Hi Paolo / Marcelo,

This is my current patch queue for ppc. Please pull.

Alex

The following changes since commit b79013b2449c23f1f505bdf39c5a6c330338b244:

  Merge tag 'staging-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging (2015-04-13 17:37:33 -0700)

are available in the git repository at:

  git://github.com/agraf/linux-2.6.git tags/signed-kvm-ppc-queue

for you to fetch changes up to 66feed61cdf6ee65fd551d3460b1efba6bee55b8:

  KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 (2015-04-21 15:21:34 +0200)

Patch queue for ppc - 2015-04-21

This is the latest queue for KVM on PowerPC changes. Highlights this time around:

  - Book3S HV: Debugging aids
  - Book3S HV: Minor performance improvements
  - Book3S HV: Cleanups

Aneesh Kumar K.V (2):
      KVM: PPC: Book3S HV: Remove RMA-related variables from code
      KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte

David Gibson (1):
      kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM

Michael Ellerman (1):
      KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation.

Paul Mackerras (12):
      KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT
      KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
      KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update
      KVM: PPC: Book3S HV: Minor cleanups
      KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu
      KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken
      KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI
      KVM: PPC: Book3S HV: Use decrementer to wake napping threads
      KVM: PPC: Book3S HV: Use bitmap of active threads rather than count
      KVM: PPC: Book3S HV: Streamline guest entry and exit
      KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C
      KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8

Suresh E. Warrier (2):
      powerpc: Export __spin_yield
      KVM: PPC: Book3S HV: Add guest->host real mode completion counters

Suresh Warrier (3):
      KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock
      KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode
      KVM: PPC: Book3S HV: Add ICP real mode counters

 Documentation/virtual/kvm/api.txt        |  17 +
 arch/powerpc/include/asm/archrandom.h    |  11 +-
 arch/powerpc/include/asm/kvm_book3s.h    |   3 +
 arch/powerpc/include/asm/kvm_book3s_64.h |  18 +
 arch/powerpc/include/asm/kvm_host.h      |  47 ++-
 arch/powerpc/include/asm/kvm_ppc.h       |   2 +
 arch/powerpc/include/asm/time.h          |   3 +
 arch/powerpc/kernel/asm-offsets.c        |  20 +-
 arch/powerpc/kernel/time.c               |   6 +
 arch/powerpc/kvm/Kconfig                 |  14 +
 arch/powerpc/kvm/book3s.c                |  76 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 189 +--
 arch/powerpc/kvm/book3s_hv.c             | 435 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c     | 100 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |  25 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c     | 238 +++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 559 +++
 arch/powerpc/kvm/book3s_pr_papr.c        |  28 ++
 arch/powerpc/kvm/book3s_xics.c           | 105 --
 arch/powerpc/kvm/book3s_xics.h           |  13 +-
 arch/powerpc/kvm/powerpc.c               |   3 +
 arch/powerpc/lib/locks.c                 |   1 +
 arch/powerpc/platforms/powernv/rng.c     |  29 ++
 include/uapi/linux/kvm.h                 |   1 +
 virt/kvm/kvm_main.c                      |   1 +
 25 files changed, 1580 insertions(+), 364 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 10/21] KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT
From: Paul Mackerras This creates a debugfs directory for each HV guest (assuming debugfs is enabled in the kernel config), and within that directory, a file by which the contents of the guest's HPT (hashed page table) can be read. The directory is named vm, where is the PID of the process that created the guest. The file is named "htab". This is intended to help in debugging problems in the host's management of guest memory. The contents of the file consist of a series of lines like this: 3f48 4000d032bf003505 000bd7ff1196 0003b5c71196 The first field is the index of the entry in the HPT, the second and third are the HPT entry, so the third entry contains the real page number that is mapped by the entry if the entry's valid bit is set. The fourth field is the guest's view of the second doubleword of the entry, so it contains the guest physical address. (The format of the second through fourth fields are described in the Power ISA and also in arch/powerpc/include/asm/mmu-hash64.h.) Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s_64.h | 2 + arch/powerpc/include/asm/kvm_host.h | 2 + arch/powerpc/kvm/book3s_64_mmu_hv.c | 136 +++ arch/powerpc/kvm/book3s_hv.c | 12 +++ virt/kvm/kvm_main.c | 1 + 5 files changed, 153 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 0789a0f..869c53f 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -436,6 +436,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm) return rcu_dereference_raw_notrace(kvm->memslots); } +extern void kvmppc_mmu_debugfs_init(struct kvm *kvm); + #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ #endif /* __ASM_KVM_BOOK3S_64_H__ */ diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 015773f..f1d0bbc 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -238,6 +238,8 @@ struct kvm_arch { atomic_t hpte_mod_interest; cpumask_t need_tlb_flush; int hpt_cma_alloc; + struct dentry *debugfs_dir; + struct dentry *htab_dentry; #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE struct mutex hpt_mutex; diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 6c6825a..d6fe308 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include @@ -1490,6 +1491,141 @@ int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *ghf) return ret; } +struct debugfs_htab_state { + struct kvm *kvm; + struct mutexmutex; + unsigned long hpt_index; + int chars_left; + int buf_index; + charbuf[64]; +}; + +static int debugfs_htab_open(struct inode *inode, struct file *file) +{ + struct kvm *kvm = inode->i_private; + struct debugfs_htab_state *p; + + p = kzalloc(sizeof(*p), GFP_KERNEL); + if (!p) + return -ENOMEM; + + kvm_get_kvm(kvm); + p->kvm = kvm; + mutex_init(&p->mutex); + file->private_data = p; + + return nonseekable_open(inode, file); +} + +static int debugfs_htab_release(struct inode *inode, struct file *file) +{ + struct debugfs_htab_state *p = file->private_data; + + kvm_put_kvm(p->kvm); + kfree(p); + return 0; +} + +static ssize_t debugfs_htab_read(struct file *file, char __user *buf, +size_t len, loff_t *ppos) +{ + struct debugfs_htab_state *p = file->private_data; + ssize_t ret, r; + unsigned long i, n; + unsigned long v, 
hr, gr; + struct kvm *kvm; + __be64 *hptp; + + ret = mutex_lock_interruptible(&p->mutex); + if (ret) + return ret; + + if (p->chars_left) { + n = p->chars_left; + if (n > len) + n = len; + r = copy_to_user(buf, p->buf + p->buf_index, n); + n -= r; + p->chars_left -= n; + p->buf_index += n; + buf += n; + len -= n; + ret = n; + if (r) { + if (!n) + ret = -EFAULT; + goto out; + } + } + + kvm = p->kvm; + i = p->hpt_index; + hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE)); + for (; len != 0 && i < kvm->arch.hpt_npte; ++i, hptp += 2) { + if (!(be64_t
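A trivial host-side reader for the new file (illustrative; it assumes debugfs is mounted at /sys/kernel/debug and that the guest was created by the process with the given PID):

#include <stdio.h>

static int dump_guest_hpt(long pid)
{
	char path[64], line[128];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/kernel/debug/kvm/vm%ld/htab", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);	/* index, HPTE dword 0, dword 1, guest view */
	fclose(f);
	return 0;
}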
Re: [PATCHv4] kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM
On 04/21/2015 02:41 AM, David Gibson wrote: On POWER, storage caching is usually configured via the MMU - attributes such as cache-inhibited are stored in the TLB and the hashed page table. This makes correctly performing cache inhibited IO accesses awkward when the MMU is turned off (real mode). Some CPU models provide special registers to control the cache attributes of real mode load and stores but this is not at all consistent. This is a problem in particular for SLOF, the firmware used on KVM guests, which runs entirely in real mode, but which needs to do IO to load the kernel. To simplify this qemu implements two special hypercalls, H_LOGICAL_CI_LOAD and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to a logical address (aka guest physical address). SLOF uses these for IO. However, because these are implemented within qemu, not the host kernel, these bypass any IO devices emulated within KVM itself. The simplest way to see this problem is to attempt to boot a KVM guest from a virtio-blk device with iothread / dataplane enabled. The iothread code relies on an in kernel implementation of the virtio queue notification, which is not triggered by the IO hcalls, and so the guest will stall in SLOF unable to load the guest OS. This patch addresses this by providing in-kernel implementations of the 2 hypercalls, which correctly scan the KVM IO bus. Any access to an address not handled by the KVM IO bus will cause a VM exit, hitting the qemu implementation as before. Note that a userspace change is also required, in order to enable these new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL. Signed-off-by: David Gibson --- arch/powerpc/include/asm/kvm_book3s.h | 3 ++ arch/powerpc/kvm/book3s.c | 76 +++ arch/powerpc/kvm/book3s_hv.c | 12 ++ arch/powerpc/kvm/book3s_pr_papr.c | 28 + 4 files changed, 119 insertions(+) Changes in v4: * Rebase onto 4.0+, correct for changed signature of kvm_io_bus_{read,write} Alex, I saw from some build system notifications that you seemed to hit some troubles compiling the last version of this patch. This should fix it - hope it's not too late to get into 4.1. Oh, I already fixed it up in my tree, no worries. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 00/12] Remaining improvements for HV KVM
On 09.04.15 10:49, Paolo Bonzini wrote: > > > On 09/04/2015 00:57, Alexander Graf wrote: >>> >>> The last patch in this series needs a definition of PPC_MSGCLR that is >>> added by the patch "powerpc/powernv: Fixes for hypervisor doorbell >>> handling", which has now gone upstream into Linus' tree as commit >>> 755563bc79c7 via the linuxppc-dev mailing list. Alex, how do you want >>> to handle that? You could pull in the master branch of the kvm tree, >>> which includes 755563bc79c7, or you could cherry-pick 755563bc79c7 and >>> let the subsequent merge fix it up. >> >> I've just cherry-picked it for now since it still lives in my queue, so >> it will get thrown out automatically once I rebase on next if it's >> included in there. >> >> Paolo / Marcelo, could you please try to somehow get the commit above >> into the next branch somehow? I guess the easiest would be to merge >> linus/master into kvm/next. >> >> Thanks, applied all to kvm-ppc-queue. > > I plan to send the x86/MIPS/s390/ARM merge very early to Linus, maybe > even tomorrow. So you can just rebase on top of 4.0-rc6 and send your > pull request relative to Linus's tree instead of kvm/next. > > Does that work for you? Phew, that really complicates things on my side. I usually do kvm-ppc-queue -> kvm-ppc-next -> kvm/next which means that my queue already contains your next patches. I could of course to a rebase --onto and remove anything that is in the kvm tree, but then we'd end up conflicting on documentation changes. Since you already did send out the first pull request, just let me know when you pulled linus' tree back into kvm/next (or kvm/master) so that I can fast-forward merge this in my kvm-ppc-next branch and then rebase my queue on top, merge it into the next branch and send you a pull request ;) Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 00/12] Remaining improvements for HV KVM
On 14.04.15 13:56, Paul Mackerras wrote: > On Thu, Apr 09, 2015 at 12:57:58AM +0200, Alexander Graf wrote: >> On 03/28/2015 04:21 AM, Paul Mackerras wrote: >>> This is the rest of my current patch queue for HV KVM on PPC. This >>> series is based on Alex Graf's kvm-ppc-queue branch. The only change >> >from the previous version of this series is that patch 2 has been >>> updated to take account of the timebase offset. >>> >>> The last patch in this series needs a definition of PPC_MSGCLR that is >>> added by the patch "powerpc/powernv: Fixes for hypervisor doorbell >>> handling", which has now gone upstream into Linus' tree as commit >>> 755563bc79c7 via the linuxppc-dev mailing list. Alex, how do you want >>> to handle that? You could pull in the master branch of the kvm tree, >>> which includes 755563bc79c7, or you could cherry-pick 755563bc79c7 and >>> let the subsequent merge fix it up. >> >> I've just cherry-picked it for now since it still lives in my queue, so it >> will get thrown out automatically once I rebase on next if it's included in >> there. >> >> Paolo / Marcelo, could you please try to somehow get the commit above into >> the next branch somehow? I guess the easiest would be to merge linus/master >> into kvm/next. >> >> Thanks, applied all to kvm-ppc-queue. > > Did you forget to push it out or something? Your kvm-ppc-queue branch > is still at 4.0-rc1 as far as I can see. Oops, not sure how that happened. Does it show up correctly for you now? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 00/12] Remaining improvements for HV KVM
On 03/28/2015 04:21 AM, Paul Mackerras wrote: This is the rest of my current patch queue for HV KVM on PPC. This series is based on Alex Graf's kvm-ppc-queue branch. The only change from the previous version of this series is that patch 2 has been updated to take account of the timebase offset. The last patch in this series needs a definition of PPC_MSGCLR that is added by the patch "powerpc/powernv: Fixes for hypervisor doorbell handling", which has now gone upstream into Linus' tree as commit 755563bc79c7 via the linuxppc-dev mailing list. Alex, how do you want to handle that? You could pull in the master branch of the kvm tree, which includes 755563bc79c7, or you could cherry-pick 755563bc79c7 and let the subsequent merge fix it up. I've just cherry-picked it for now since it still lives in my queue, so it will get thrown out automatically once I rebase on next if it's included in there. Paolo / Marcelo, could you please try to somehow get the commit above into the next branch somehow? I guess the easiest would be to merge linus/master into kvm/next. Thanks, applied all to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 3/3] KVM: PPC: Book3S HV: Fix instruction emulation
From: Paul Mackerras Commit 4a157d61b48c ("KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register") had the side effect that we no longer reset vcpu->arch.last_inst to -1 on guest exit in the cases where the instruction is not fetched from the guest. This means that if instruction emulation turns out to be required in those cases, the host will emulate the wrong instruction, since vcpu->arch.last_inst will contain the last instruction that was emulated. This fixes it by making sure that vcpu->arch.last_inst is reset to -1 in those cases. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index bb94e6f..6cbf163 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -1005,6 +1005,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) /* Save HEIR (HV emulation assist reg) in emul_inst if this is an HEI (HV emulation interrupt, e40) */ li r3,KVM_INST_FETCH_FAILED + stw r3,VCPU_LAST_INST(r9) cmpwi r12,BOOK3S_INTERRUPT_H_EMUL_ASSIST bne 11f mfspr r3,SPRN_HEIR -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 1/3] KVM: PPC: Book3S HV: Fix spinlock/mutex ordering issue in kvmppc_set_lpcr()
From: Paul Mackerras Currently, kvmppc_set_lpcr() has a spinlock around the whole function, and inside that does mutex_lock(&kvm->lock). It is not permitted to take a mutex while holding a spinlock, because the mutex_lock might call schedule(). In addition, this causes lockdep to warn about a lock ordering issue: == [ INFO: possible circular locking dependency detected ] 3.18.0-kvm-04645-gdfea862-dirty #131 Not tainted --- qemu-system-ppc/8179 is trying to acquire lock: (&kvm->lock){+.+.+.}, at: [] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv] but task is already holding lock: (&(&vcore->lock)->rlock){+.+...}, at: [] .kvmppc_set_lpcr+0x40/0x1c0 [kvm_hv] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(&vcore->lock)->rlock){+.+...}: [] .mutex_lock_nested+0x80/0x570 [] .kvmppc_vcpu_run_hv+0xc4/0xe40 [kvm_hv] [] .kvmppc_vcpu_run+0x2c/0x40 [kvm] [] .kvm_arch_vcpu_ioctl_run+0x54/0x160 [kvm] [] .kvm_vcpu_ioctl+0x4a8/0x7b0 [kvm] [] .do_vfs_ioctl+0x444/0x770 [] .SyS_ioctl+0xc4/0xe0 [] syscall_exit+0x0/0x98 -> #0 (&kvm->lock){+.+.+.}: [] .lock_acquire+0xcc/0x1a0 [] .mutex_lock_nested+0x80/0x570 [] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv] [] .kvmppc_set_one_reg_hv+0x4dc/0x990 [kvm_hv] [] .kvmppc_set_one_reg+0x44/0x330 [kvm] [] .kvm_vcpu_ioctl_set_one_reg+0x5c/0x150 [kvm] [] .kvm_arch_vcpu_ioctl+0x214/0x2c0 [kvm] [] .kvm_vcpu_ioctl+0xe0/0x7b0 [kvm] [] .do_vfs_ioctl+0x444/0x770 [] .SyS_ioctl+0xc4/0xe0 [] syscall_exit+0x0/0x98 other info that might help us debug this: Possible unsafe locking scenario: CPU0CPU1 lock(&(&vcore->lock)->rlock); lock(&kvm->lock); lock(&(&vcore->lock)->rlock); lock(&kvm->lock); *** DEADLOCK *** 2 locks held by qemu-system-ppc/8179: #0: (&vcpu->mutex){+.+.+.}, at: [] .vcpu_load+0x28/0x90 [kvm] #1: (&(&vcore->lock)->rlock){+.+...}, at: [] .kvmppc_set_lpcr+0x40/0x1c0 [kvm_hv] stack backtrace: CPU: 4 PID: 8179 Comm: qemu-system-ppc Not tainted 3.18.0-kvm-04645-gdfea862-dirty #131 Call Trace: [c01a66c0f310] [c0b486ac] .dump_stack+0x88/0xb4 (unreliable) [c01a66c0f390] [c00f8bec] .print_circular_bug+0x27c/0x3d0 [c01a66c0f440] [c00fe9e8] .__lock_acquire+0x2028/0x2190 [c01a66c0f5d0] [c00ff28c] .lock_acquire+0xcc/0x1a0 [c01a66c0f6a0] [c0b3c120] .mutex_lock_nested+0x80/0x570 [c01a66c0f7c0] [decc1f54] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv] [c01a66c0f860] [decc510c] .kvmppc_set_one_reg_hv+0x4dc/0x990 [kvm_hv] [c01a66c0f8d0] [deb9f234] .kvmppc_set_one_reg+0x44/0x330 [kvm] [c01a66c0f960] [deb9c9dc] .kvm_vcpu_ioctl_set_one_reg+0x5c/0x150 [kvm] [c01a66c0f9f0] [deb9ced4] .kvm_arch_vcpu_ioctl+0x214/0x2c0 [kvm] [c01a66c0faf0] [deb940b0] .kvm_vcpu_ioctl+0xe0/0x7b0 [kvm] [c01a66c0fcb0] [c026cbb4] .do_vfs_ioctl+0x444/0x770 [c01a66c0fd90] [c026cfa4] .SyS_ioctl+0xc4/0xe0 [c01a66c0fe30] [c0009264] syscall_exit+0x0/0x98 This fixes it by moving the mutex_lock()/mutex_unlock() pair outside the spin-locked region. 
Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index de4018a..b273193 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -942,20 +942,20 @@ static int kvm_arch_vcpu_ioctl_set_sregs_hv(struct kvm_vcpu *vcpu, static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr, bool preserve_top32) { + struct kvm *kvm = vcpu->kvm; struct kvmppc_vcore *vc = vcpu->arch.vcore; u64 mask; + mutex_lock(&kvm->lock); spin_lock(&vc->lock); /* * If ILE (interrupt little-endian) has changed, update the * MSR_LE bit in the intr_msr for each vcpu in this vcore. */ if ((new_lpcr & LPCR_ILE) != (vc->lpcr & LPCR_ILE)) { - struct kvm *kvm = vcpu->kvm; struct kvm_vcpu *vcpu; int i; - mutex_lock(&kvm->lock); kvm_for_each_vcpu(i, vcpu, kvm) { if (vcpu->arch.vcore != vc) continue; @@ -964,7 +964,6 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr, else vcpu->arch.in
[PULL 2/3] KVM: PPC: Book3S HV: Endian fix for accessing VPA yield count
From: Paul Mackerras The VPA (virtual processor area) is defined by PAPR and is therefore big-endian, so we need a be32_to_cpu when reading it in kvmppc_get_yield_count(). Without this, H_CONFER always fails on a little-endian host, causing SMP guests to waste time spinning on spinlocks. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index b273193..de74756 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -636,7 +636,7 @@ static int kvmppc_get_yield_count(struct kvm_vcpu *vcpu) spin_lock(&vcpu->arch.vpa_update_lock); lppaca = (struct lppaca *)vcpu->arch.vpa.pinned_addr; if (lppaca) - yield_count = lppaca->yield_count; + yield_count = be32_to_cpu(lppaca->yield_count); spin_unlock(&vcpu->arch.vpa_update_lock); return yield_count; } -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 0/3] 4.0 patch queue 2015-03-25
Hi Paolo, This is my current patch queue for 4.0. Please pull. Alex The following changes since commit f710a12d73dfa1c3a5d2417f2482b970f03bb850: Merge tag 'kvm-arm-fixes-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm (2015-03-16 20:08:56 -0300) are available in the git repository at: git://github.com/agraf/linux-2.6.git tags/signed-for-4.0 for you to fetch changes up to 2bf27601c7b50b6ced72f27304109dc52eb52919: KVM: PPC: Book3S HV: Fix instruction emulation (2015-03-20 11:42:33 +0100) Patch queue for 4.0 - 2015-03-25 A few bug fixes for Book3S HV KVM: - Fix spinlock ordering - Fix idle guests on LE hosts - Fix instruction emulation Paul Mackerras (3): KVM: PPC: Book3S HV: Fix spinlock/mutex ordering issue in kvmppc_set_lpcr() KVM: PPC: Book3S HV: Endian fix for accessing VPA yield count KVM: PPC: Book3S HV: Fix instruction emulation arch/powerpc/kvm/book3s_hv.c| 8 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 1 + 2 files changed, 5 insertions(+), 4 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvm-ppc:kvm-ppc-queue 7/9] ERROR: ".__spin_yield" [arch/powerpc/kvm/kvm.ko] undefined!
On 23.03.15 04:03, Michael Ellerman wrote: > On Mon, 2015-03-23 at 14:00 +1100, Paul Mackerras wrote: >> On Fri, Mar 20, 2015 at 08:07:53PM +0800, kbuild test robot wrote: >>> tree: git://github.com/agraf/linux-2.6.git kvm-ppc-queue >>> head: 9b1daf3cfba1801768aa41b1b6ad0b653844241f >>> commit: aba777f5ce0accb4c6a277e671de0330752954e8 [7/9] KVM: PPC: Book3S HV: >>> Convert ICS mutex lock to spin lock >>> config: powerpc-defconfig (attached as .config) >>> reproduce: >>> wget >>> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross >>> -O ~/bin/make.cross >>> chmod +x ~/bin/make.cross >>> git checkout aba777f5ce0accb4c6a277e671de0330752954e8 >>> # save the attached .config to linux build tree >>> make.cross ARCH=powerpc >>> >>> All error/warnings: >>> > ERROR: ".__spin_yield" [arch/powerpc/kvm/kvm.ko] undefined! >> >> Yes, this is the patch that depends on the "powerpc: Export >> __spin_yield" patch that Suresh posted to linuxppc-...@ozlabs.org and >> I acked. >> >> I think the best thing at this stage is probably for Alex to take that >> patch through his tree, assuming Michael is OK with that. > > Fine by me. > > Acked-by: Michael Ellerman Awesome, thanks, applied to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
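For readers following the thread: the link error above comes from the ICS spin-lock conversion calling __spin_yield() from the modular kvm.ko while the symbol was not exported. The prerequisite patch referenced here is essentially a one-line export in arch/powerpc/lib/locks.c; a minimal sketch (whether EXPORT_SYMBOL or EXPORT_SYMBOL_GPL is used is up to the actual patch):

    /* arch/powerpc/lib/locks.c -- sketch of the prerequisite change */
    EXPORT_SYMBOL_GPL(__spin_yield);    /* lets the modular kvm.ko link against it */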
Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
On 23.03.15 08:50, Bharata B Rao wrote: > On Sat, Mar 21, 2015 at 8:28 PM, Alexander Graf wrote: >> >> >> On 20.03.15 16:51, Bharata B Rao wrote: >>> On Fri, Mar 20, 2015 at 12:34:18PM +0100, Alexander Graf wrote: >>>> >>>> >>>> On 20.03.15 12:26, Paul Mackerras wrote: >>>>> On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote: >>>>>> >>>>>> >>>>>> On 20.03.15 10:39, Paul Mackerras wrote: >>>>>>> From: Bharata B Rao >>>>>>> >>>>>>> Since KVM isn't equipped to handle closure of vcpu fd from >>>>>>> userspace(QEMU) >>>>>>> correctly, certain work arounds have to be employed to allow reuse of >>>>>>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such >>>>>>> proposed workaround is to park the vcpu fd in userspace during cpu >>>>>>> unplug >>>>>>> and reuse it later during next hotplug. >>>>>>> >>>>>>> More details can be found here: >>>>>>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html >>>>>>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html >>>>>>> >>>>>>> In order to support this workaround with PowerPC KVM, don't create or >>>>>>> initialize ICP if the vCPU is found to be already associated with an >>>>>>> ICP. >>>>>>> >>>>>>> Signed-off-by: Bharata B Rao >>>>>>> Signed-off-by: Paul Mackerras >>>>>> >>>>>> This probably makes some sense, but please make sure that user space has >>>>>> some way to figure out whether hotplug works at all. >>>>> >>>>> Bharata is working on the qemu side of all this, so I assume he has >>>>> that covered. >>>> >>>> Well, so far the kernel doesn't expose anything he can query, so I >>>> suppose he just blindly assumes that older host kernels will randomly >>>> break and nobody cares. I'd rather prefer to see a CAP exposed that qemu >>>> can check on. >>> >>> I see that you have already taken this into your tree. I have an updated >>> patch to expose a CAP. If the below patch looks ok, then let me know how >>> you would prefer to take this patch in. >>> >>> Regards, >>> Bharata. >>> >>> KVM: PPC: BOOK3S: Allow reuse of vCPU object >>> >>> From: Bharata B Rao >>> >>> Since KVM isn't equipped to handle closure of vcpu fd from userspace(QEMU) >>> correctly, certain work arounds have to be employed to allow reuse of >>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such >>> proposed workaround is to park the vcpu fd in userspace during cpu unplug >>> and reuse it later during next hotplug. >>> >>> More details can be found here: >>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html >>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html >>> >>> In order to support this workaround with PowerPC KVM, don't create or >>> initialize ICP if the vCPU is found to be already associated with an ICP. >>> User space (QEMU) can reuse the vCPU after checking for the availability >>> of KVM_CAP_SPAPR_REUSE_VCPU capability. 
>>> >>> Signed-off-by: Bharata B Rao >>> --- >>> arch/powerpc/kvm/book3s_xics.c |9 +++-- >>> arch/powerpc/kvm/powerpc.c | 12 >>> include/uapi/linux/kvm.h |1 + >>> 3 files changed, 20 insertions(+), 2 deletions(-) >>> >>> diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c >>> index a4a8d9f..ead3a35 100644 >>> --- a/arch/powerpc/kvm/book3s_xics.c >>> +++ b/arch/powerpc/kvm/book3s_xics.c >>> @@ -1313,8 +1313,13 @@ int kvmppc_xics_connect_vcpu(struct kvm_device *dev, >>> struct kvm_vcpu *vcpu, >>> return -EPERM; >>> if (xics->kvm != vcpu->kvm) >>> return -EPERM; >>> - if (vcpu->arch.irq_type) >>> - return -EBUSY; >>> + >>> + /* >>> + * If irq_type is already set, don't reinialize but >>> + * return success allowing this vcpu to
Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
On 20.03.15 16:51, Bharata B Rao wrote: > On Fri, Mar 20, 2015 at 12:34:18PM +0100, Alexander Graf wrote: >> >> >> On 20.03.15 12:26, Paul Mackerras wrote: >>> On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote: >>>> >>>> >>>> On 20.03.15 10:39, Paul Mackerras wrote: >>>>> From: Bharata B Rao >>>>> >>>>> Since KVM isn't equipped to handle closure of vcpu fd from userspace(QEMU) >>>>> correctly, certain work arounds have to be employed to allow reuse of >>>>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such >>>>> proposed workaround is to park the vcpu fd in userspace during cpu unplug >>>>> and reuse it later during next hotplug. >>>>> >>>>> More details can be found here: >>>>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html >>>>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html >>>>> >>>>> In order to support this workaround with PowerPC KVM, don't create or >>>>> initialize ICP if the vCPU is found to be already associated with an ICP. >>>>> >>>>> Signed-off-by: Bharata B Rao >>>>> Signed-off-by: Paul Mackerras >>>> >>>> This probably makes some sense, but please make sure that user space has >>>> some way to figure out whether hotplug works at all. >>> >>> Bharata is working on the qemu side of all this, so I assume he has >>> that covered. >> >> Well, so far the kernel doesn't expose anything he can query, so I >> suppose he just blindly assumes that older host kernels will randomly >> break and nobody cares. I'd rather prefer to see a CAP exposed that qemu >> can check on. > > I see that you have already taken this into your tree. I have an updated > patch to expose a CAP. If the below patch looks ok, then let me know how > you would prefer to take this patch in. > > Regards, > Bharata. > > KVM: PPC: BOOK3S: Allow reuse of vCPU object > > From: Bharata B Rao > > Since KVM isn't equipped to handle closure of vcpu fd from userspace(QEMU) > correctly, certain work arounds have to be employed to allow reuse of > vcpu array slot in KVM during cpu hot plug/unplug from guest. One such > proposed workaround is to park the vcpu fd in userspace during cpu unplug > and reuse it later during next hotplug. > > More details can be found here: > KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html > QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html > > In order to support this workaround with PowerPC KVM, don't create or > initialize ICP if the vCPU is found to be already associated with an ICP. > User space (QEMU) can reuse the vCPU after checking for the availability > of KVM_CAP_SPAPR_REUSE_VCPU capability. > > Signed-off-by: Bharata B Rao > --- > arch/powerpc/kvm/book3s_xics.c |9 +++-- > arch/powerpc/kvm/powerpc.c | 12 > include/uapi/linux/kvm.h |1 + > 3 files changed, 20 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c > index a4a8d9f..ead3a35 100644 > --- a/arch/powerpc/kvm/book3s_xics.c > +++ b/arch/powerpc/kvm/book3s_xics.c > @@ -1313,8 +1313,13 @@ int kvmppc_xics_connect_vcpu(struct kvm_device *dev, > struct kvm_vcpu *vcpu, > return -EPERM; > if (xics->kvm != vcpu->kvm) > return -EPERM; > - if (vcpu->arch.irq_type) > - return -EBUSY; > + > + /* > + * If irq_type is already set, don't reinialize but > + * return success allowing this vcpu to be reused. 
> + */ > + if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT) > + return 0; > > r = kvmppc_xics_create_icp(vcpu, xcpu); > if (!r) > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c > index 27c0fac..5b7007c 100644 > --- a/arch/powerpc/kvm/powerpc.c > +++ b/arch/powerpc/kvm/powerpc.c > @@ -564,6 +564,18 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long > ext) > r = 1; > break; > #endif > + case KVM_CAP_SPAPR_REUSE_VCPU: > + /* > + * Kernel currently doesn't support closing of vCPU fd from > + * user space (QEMU) correctly. Hence the option available > + * is to park the vCPU fd in user space whenever a guest > + * CPU is hot removed and reuse the
Re: [PATCH v4 2/4] kvm/ppc/mpic: drop unused IRQ_testbit
On 21.03.15 07:56, Arseny Solokha wrote: > Drop unused static procedure which doesn't have callers within its > translation unit. It had been already removed independently in QEMU[1] > from the OpenPIC implementation borrowed by the kernel. > > [1] https://lists.gnu.org/archive/html/qemu-devel/2014-06/msg01812.html > > v4: Fixed the comment regarding the origination of OpenPIC codebase > and CC'ed KVM mailing lists, as suggested by Alexander Graf. > > v3: In patch 4/4, do not remove fsl_mpic_primary_get_version() from > arch/powerpc/sysdev/mpic.c because the patch by Jia Hongtao > ("powerpc/85xx: workaround for chips with MSI hardware errata") makes > use of it. > > v2: Added a brief explanation to each patch description of why removed > functions are unused, as suggested by Michael Ellerman. > > Signed-off-by: Arseny Solokha Thanks, applied to kvm-ppc-queue (for 4.1). Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/23] Bug fixes and improvements for HV KVM
On 20.03.15 10:39, Paul Mackerras wrote: > This is my current patch queue for HV KVM on PPC. This series is > based on the "queue" branch of the KVM tree, i.e. roughly v4.0-rc3 > plus a set of recent KVM changes which don't intersect with the > changes in this series. On top of that, in my testing I have some > patches which are not KVM-related but are needed to boot and run a > recent upstream kernel successfully: > > tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop > tick/hotplug: Handover time related duties before cpu offline > powerpc/powernv: Check image loaded or not before calling flash > powerpc/powernv: Fixes for hypervisor doorbell handling > powerpc/powernv: Fix return value from power7_nap() et al. > powerpc: Export __spin_yield > > These patches have been posted by their authors and are on their way > upstream via various trees. They are not included in this series. > > The first three patches are bug fixes that should go into v4.0 if > possible. The remainder are intended for the 4.1 merge window. > > The patch "powerpc: Export __spin_yield" is a prerequisite for patch > 9/23 of this series ("KVM: PPC: Book3S HV: Convert ICS mutex lock to > spin lock"). It is on its way upstream through the linuxppc-dev > mailing list. > > The patch "powerpc/powernv: Fixes for hypervisor doorbell handling" is > needed for correct operation with patch 20/23, "KVM: PPC: Book3S HV: > Use msgsnd for signalling threads". It is also on its way upstream > through the linuxppc-dev list. I am expecting both of these > prerequisite patches to go into 4.0. > > Finally, the last patch in this series converts some of the assembly > code in book3s_hv_rmhandlers.S into C. I intend to continue this > trend. Thanks, applied patches 4-11 to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
On 20.03.15 12:25, Paul Mackerras wrote: > On Fri, Mar 20, 2015 at 12:15:15PM +0100, Alexander Graf wrote: >> >> >> On 20.03.15 10:39, Paul Mackerras wrote: >>> This reads the timebase at various points in the real-mode guest >>> entry/exit code and uses that to accumulate total, minimum and >>> maximum time spent in those parts of the code. Currently these >>> times are accumulated per vcpu in 5 parts of the code: >>> >>> * rm_entry - time taken from the start of kvmppc_hv_entry() until >>> just before entering the guest. >>> * rm_intr - time from when we take a hypervisor interrupt in the >>> guest until we either re-enter the guest or decide to exit to the >>> host. This includes time spent handling hcalls in real mode. >>> * rm_exit - time from when we decide to exit the guest until the >>> return from kvmppc_hv_entry(). >>> * guest - time spend in the guest >>> * cede - time spent napping in real mode due to an H_CEDE hcall >>> while other threads in the same vcore are active. >>> >>> These times are exposed in debugfs in a directory per vcpu that >>> contains a file called "timings". This file contains one line for >>> each of the 5 timings above, with the name followed by a colon and >>> 4 numbers, which are the count (number of times the code has been >>> executed), the total time, the minimum time, and the maximum time, >>> all in nanoseconds. >>> >>> Signed-off-by: Paul Mackerras >> >> Have you measure the additional overhead this brings? > > I haven't - in fact I did this patch so I could measure the overhead > or improvement from other changes I did, but it doesn't measure its > own overhead, of course. I guess I need a workload that does a > defined number of guest entries and exits and measure how fast it runs > with and without the patch (maybe something like H_SET_MODE in a > loop). I'll figure something out and post the results. Yeah, just measure the number of exits you can handle for a simple hcall. If there is measurable overhead, it's probably a good idea to move the statistics gathering into #ifdef paths for DEBUGFS or maybe even a separate EXIT_TIMING config option as we have it for booke. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
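The gating Alex suggests would look roughly like the booke exit-timing support: the accumulation helpers compile down to empty stubs unless a config option is set. A minimal sketch, assuming a hypothetical CONFIG_KVM_BOOK3S_HV_EXIT_TIMING symbol (the real-mode accounting itself lives in assembly, so this only illustrates the idea on the C side):

    #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
    static inline void kvmhv_start_timing(struct kvm_vcpu *vcpu,
                                          struct kvmhv_tb_accumulator *next)
    {
            vcpu->arch.cur_activity = next;
            vcpu->arch.cur_tb_start = get_tb();
    }
    #else
    /* Compiled out: no extra work on the guest entry/exit path. */
    static inline void kvmhv_start_timing(struct kvm_vcpu *vcpu,
                                          struct kvmhv_tb_accumulator *next) { }
    #endif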
Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
On 20.03.15 12:26, Paul Mackerras wrote: > On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote: >> >> >> On 20.03.15 10:39, Paul Mackerras wrote: >>> From: Bharata B Rao >>> >>> Since KVM isn't equipped to handle closure of vcpu fd from userspace(QEMU) >>> correctly, certain work arounds have to be employed to allow reuse of >>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such >>> proposed workaround is to park the vcpu fd in userspace during cpu unplug >>> and reuse it later during next hotplug. >>> >>> More details can be found here: >>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html >>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html >>> >>> In order to support this workaround with PowerPC KVM, don't create or >>> initialize ICP if the vCPU is found to be already associated with an ICP. >>> >>> Signed-off-by: Bharata B Rao >>> Signed-off-by: Paul Mackerras >> >> This probably makes some sense, but please make sure that user space has >> some way to figure out whether hotplug works at all. > > Bharata is working on the qemu side of all this, so I assume he has > that covered. Well, so far the kernel doesn't expose anything he can query, so I suppose he just blindly assumes that older host kernels will randomly break and nobody cares. I'd rather prefer to see a CAP exposed that qemu can check on. > >> Also Paul, for patches that you pick up from others, I'd prefer if they >> send the patches to the ML themselves first and you pick them up from >> there then. That way we give everyone the same treatment. > > Fair enough. In fact Bharata did post the patch but he sent it to > linuxppc-...@ozlabs.org not the KVM lists. Please make sure you only take patches into your queue that made it to at least kvm@vger, preferably kvm-ppc@vger as well. If you see related patches on other mailing lists, just ask the respective people to resend with proper ML exposure. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
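The check being asked for is cheap on the QEMU side; something along these lines, where KVM_CAP_SPAPR_REUSE_VCPU is the capability name proposed later in the thread and cap_vcpu_reuse is a hypothetical flag (sketch only, not the actual QEMU patch):

    /* QEMU side, sketch only */
    static bool cap_vcpu_reuse;

    static void spapr_probe_vcpu_reuse(KVMState *s)
    {
        cap_vcpu_reuse = kvm_check_extension(s, KVM_CAP_SPAPR_REUSE_VCPU) > 0;
    }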
Re: [PATCH 20/23] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8
On 20.03.15 10:39, Paul Mackerras wrote: > This uses msgsnd where possible for signalling other threads within > the same core on POWER8 systems, rather than IPIs through the XICS > interrupt controller. This includes waking secondary threads to run > the guest, the interrupts generated by the virtual XICS, and the > interrupts to bring the other threads out of the guest when exiting. > > Signed-off-by: Paul Mackerras > --- > arch/powerpc/kernel/asm-offsets.c | 4 +++ > arch/powerpc/kvm/book3s_hv.c| 48 > ++--- > arch/powerpc/kvm/book3s_hv_rm_xics.c| 11 > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 41 > 4 files changed, 83 insertions(+), 21 deletions(-) > > diff --git a/arch/powerpc/kernel/asm-offsets.c > b/arch/powerpc/kernel/asm-offsets.c > index fa7b57d..0ce2aa6 100644 > --- a/arch/powerpc/kernel/asm-offsets.c > +++ b/arch/powerpc/kernel/asm-offsets.c > @@ -37,6 +37,7 @@ > #include > #include > #include > +#include > #ifdef CONFIG_PPC64 > #include > #include > @@ -568,6 +569,7 @@ int main(void) > DEFINE(VCORE_LPCR, offsetof(struct kvmppc_vcore, lpcr)); > DEFINE(VCORE_PCR, offsetof(struct kvmppc_vcore, pcr)); > DEFINE(VCORE_DPDES, offsetof(struct kvmppc_vcore, dpdes)); > + DEFINE(VCORE_PCPU, offsetof(struct kvmppc_vcore, pcpu)); > DEFINE(VCPU_SLB_E, offsetof(struct kvmppc_slb, orige)); > DEFINE(VCPU_SLB_V, offsetof(struct kvmppc_slb, origv)); > DEFINE(VCPU_SLB_SIZE, sizeof(struct kvmppc_slb)); > @@ -757,5 +759,7 @@ int main(void) > offsetof(struct paca_struct, subcore_sibling_mask)); > #endif > > + DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER); > + > return 0; > } > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 03a8bb4..2c34bae 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -51,6 +51,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -84,9 +85,34 @@ static DECLARE_BITMAP(default_enabled_hcalls, > MAX_HCALL_OPCODE/4 + 1); > static void kvmppc_end_cede(struct kvm_vcpu *vcpu); > static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu); > > +static bool kvmppc_ipi_thread(int cpu) > +{ > + /* On POWER8 for IPIs to threads in the same core, use msgsnd */ > + if (cpu_has_feature(CPU_FTR_ARCH_207S)) { > + preempt_disable(); > + if ((cpu & ~7) == (smp_processor_id() & ~7)) { > + unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER); > + msg |= cpu & 7; > + smp_mb(); > + __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg)); > + preempt_enable(); > + return true; > + } > + preempt_enable(); > + } > + > +#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP) > + if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) { > + xics_wake_cpu(cpu); > + return true; > + } > +#endif > + > + return false; > +} > + > static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu) > { > - int me; > int cpu = vcpu->cpu; > wait_queue_head_t *wqp; > > @@ -96,20 +122,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu > *vcpu) > ++vcpu->stat.halt_wakeup; > } > > - me = get_cpu(); > + if (kvmppc_ipi_thread(cpu + vcpu->arch.ptid)) > + return; > > /* CPU points to the first thread of the core */ > - if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) { > -#ifdef CONFIG_PPC_ICP_NATIVE > - int real_cpu = cpu + vcpu->arch.ptid; > - if (paca[real_cpu].kvm_hstate.xics_phys) > - xics_wake_cpu(real_cpu); > - else > -#endif > - if (cpu_online(cpu)) > - smp_send_reschedule(cpu); > - } > - put_cpu(); > + if (cpu >= 0 && cpu < nr_cpu_ids && cpu_online(cpu)) > + smp_send_reschedule(cpu); > } > > /* > @@ -1754,10 
+1772,8 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu) > /* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */ > smp_wmb(); > tpaca->kvm_hstate.kvm_vcpu = vcpu; > -#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP) > if (cpu != smp_processor_id()) > - xics_wake_cpu(cpu); > -#endif > + kvmppc_ipi_thread(cpu); > } > > static void kvmppc_wait_for_nap(void) > diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c > b/arch/powerpc/kvm/book3s_hv_rm_xics.c > index 6dded8c..457a8b1 100644 > --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c > +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c > @@ -18,6 +18,7 @@ > #include > #include > #include > +#include > > #include "book3s_xics.h" > > @@ -83,6 +84,16 @@ static void icp_rm_set_vcpu_irq(struct kvm_vc
Re: [PATCH 12/23] KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT
On 20.03.15 10:39, Paul Mackerras wrote: > This creates a debugfs directory for each HV guest (assuming debugfs > is enabled in the kernel config), and within that directory, a file > by which the contents of the guest's HPT (hashed page table) can be > read. The directory is named vm, where is the PID of the > process that created the guest. The file is named "htab". This is > intended to help in debugging problems in the host's management > of guest memory. > > The contents of the file consist of a series of lines like this: > > 3f48 4000d032bf003505 000bd7ff1196 0003b5c71196 > > The first field is the index of the entry in the HPT, the second and > third are the HPT entry, so the third entry contains the real page > number that is mapped by the entry if the entry's valid bit is set. > The fourth field is the guest's view of the second doubleword of the > entry, so it contains the guest physical address. (The format of the > second through fourth fields are described in the Power ISA and also > in arch/powerpc/include/asm/mmu-hash64.h.) > > Signed-off-by: Paul Mackerras > --- > arch/powerpc/include/asm/kvm_book3s_64.h | 2 + > arch/powerpc/include/asm/kvm_host.h | 2 + > arch/powerpc/kvm/book3s_64_mmu_hv.c | 136 > +++ > arch/powerpc/kvm/book3s_hv.c | 12 +++ > virt/kvm/kvm_main.c | 1 + > 5 files changed, 153 insertions(+) > > diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h > b/arch/powerpc/include/asm/kvm_book3s_64.h > index 0789a0f..869c53f 100644 > --- a/arch/powerpc/include/asm/kvm_book3s_64.h > +++ b/arch/powerpc/include/asm/kvm_book3s_64.h > @@ -436,6 +436,8 @@ static inline struct kvm_memslots > *kvm_memslots_raw(struct kvm *kvm) > return rcu_dereference_raw_notrace(kvm->memslots); > } > > +extern void kvmppc_mmu_debugfs_init(struct kvm *kvm); > + > #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ > > #endif /* __ASM_KVM_BOOK3S_64_H__ */ > diff --git a/arch/powerpc/include/asm/kvm_host.h > b/arch/powerpc/include/asm/kvm_host.h > index 015773f..f1d0bbc 100644 > --- a/arch/powerpc/include/asm/kvm_host.h > +++ b/arch/powerpc/include/asm/kvm_host.h > @@ -238,6 +238,8 @@ struct kvm_arch { > atomic_t hpte_mod_interest; > cpumask_t need_tlb_flush; > int hpt_cma_alloc; > + struct dentry *debugfs_dir; > + struct dentry *htab_dentry; > #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ > #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE > struct mutex hpt_mutex; > diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c > b/arch/powerpc/kvm/book3s_64_mmu_hv.c > index 6c6825a..d6fe308 100644 > --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c > @@ -27,6 +27,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -1490,6 +1491,141 @@ int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct > kvm_get_htab_fd *ghf) > return ret; > } > > +struct debugfs_htab_state { > + struct kvm *kvm; > + struct mutexmutex; > + unsigned long hpt_index; > + int chars_left; > + int buf_index; > + charbuf[64]; > +}; > + > +static int debugfs_htab_open(struct inode *inode, struct file *file) > +{ > + struct kvm *kvm = inode->i_private; > + struct debugfs_htab_state *p; > + > + p = kzalloc(sizeof(*p), GFP_KERNEL); > + if (!p) > + return -ENOMEM; > + > + kvm_get_kvm(kvm); > + p->kvm = kvm; > + mutex_init(&p->mutex); > + file->private_data = p; > + > + return nonseekable_open(inode, file); > +} > + > +static int debugfs_htab_release(struct inode *inode, struct file *file) > +{ > + struct debugfs_htab_state *p = file->private_data; > + > + kvm_put_kvm(p->kvm); > + kfree(p); > + return 0; > +} > + 
> +static ssize_t debugfs_htab_read(struct file *file, char __user *buf, > + size_t len, loff_t *ppos) > +{ > + struct debugfs_htab_state *p = file->private_data; > + ssize_t ret, r; > + unsigned long i, n; > + unsigned long v, hr, gr; > + struct kvm *kvm; > + __be64 *hptp; > + > + ret = mutex_lock_interruptible(&p->mutex); > + if (ret) > + return ret; > + > + if (p->chars_left) { > + n = p->chars_left; > + if (n > len) > + n = len; > + r = copy_to_user(buf, p->buf + p->buf_index, n); > + n -= r; > + p->chars_left -= n; > + p->buf_index += n; > + buf += n; > + len -= n; > + ret = n; > + if (r) { > + if (!n) > + ret = -EFAULT; > + goto out; > + } > + } > + > + kvm = p->kvm; > + i = p->hpt_index; > + hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE)); > + for (; len != 0 && i < kvm->arch.
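With this patch in place each guest gets a per-VM directory named after the PID of the creating process, and the HPT can be dumped by simply reading the "htab" file in it. A small host-side reader as a usage sketch, assuming debugfs is mounted at /sys/kernel/debug and the directory lands under the kvm subdirectory there as vm<pid>:

    /* dump_htab.c -- usage sketch: ./dump_htab <pid-of-qemu> */
    #include <stdio.h>

    int main(int argc, char **argv)
    {
            char path[128], line[256];
            FILE *f;

            if (argc < 2)
                    return 1;
            snprintf(path, sizeof(path), "/sys/kernel/debug/kvm/vm%s/htab", argv[1]);
            f = fopen(path, "r");
            if (!f) {
                    perror(path);
                    return 1;
            }
            while (fgets(line, sizeof(line), f))
                    fputs(line, stdout);
            fclose(f);
            return 0;
    }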
Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
On 20.03.15 10:39, Paul Mackerras wrote: > This reads the timebase at various points in the real-mode guest > entry/exit code and uses that to accumulate total, minimum and > maximum time spent in those parts of the code. Currently these > times are accumulated per vcpu in 5 parts of the code: > > * rm_entry - time taken from the start of kvmppc_hv_entry() until > just before entering the guest. > * rm_intr - time from when we take a hypervisor interrupt in the > guest until we either re-enter the guest or decide to exit to the > host. This includes time spent handling hcalls in real mode. > * rm_exit - time from when we decide to exit the guest until the > return from kvmppc_hv_entry(). > * guest - time spend in the guest > * cede - time spent napping in real mode due to an H_CEDE hcall > while other threads in the same vcore are active. > > These times are exposed in debugfs in a directory per vcpu that > contains a file called "timings". This file contains one line for > each of the 5 timings above, with the name followed by a colon and > 4 numbers, which are the count (number of times the code has been > executed), the total time, the minimum time, and the maximum time, > all in nanoseconds. > > Signed-off-by: Paul Mackerras Have you measure the additional overhead this brings? > --- > arch/powerpc/include/asm/kvm_host.h | 19 + > arch/powerpc/include/asm/time.h | 3 + > arch/powerpc/kernel/asm-offsets.c | 11 +++ > arch/powerpc/kernel/time.c | 6 ++ > arch/powerpc/kvm/book3s_hv.c| 135 > > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 105 - > 6 files changed, 276 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_host.h > b/arch/powerpc/include/asm/kvm_host.h > index f1d0bbc..286c0ce 100644 > --- a/arch/powerpc/include/asm/kvm_host.h > +++ b/arch/powerpc/include/asm/kvm_host.h > @@ -369,6 +369,14 @@ struct kvmppc_slb { > u8 base_page_size; /* MMU_PAGE_xxx */ > }; > > +/* Struct used to accumulate timing information in HV real mode code */ > +struct kvmhv_tb_accumulator { > + u64 seqcount; /* used to synchronize access, also count * 2 */ > + u64 tb_total; /* total time in timebase ticks */ > + u64 tb_min; /* min time */ > + u64 tb_max; /* max time */ > +}; > + > # ifdef CONFIG_PPC_FSL_BOOK3E > #define KVMPPC_BOOKE_IAC_NUM 2 > #define KVMPPC_BOOKE_DAC_NUM 2 > @@ -656,6 +664,17 @@ struct kvm_vcpu_arch { > u64 busy_preempt; > > u32 emul_inst; > + > + struct kvmhv_tb_accumulator *cur_activity; /* What we're timing */ > + u64 cur_tb_start; /* when it started */ > + struct kvmhv_tb_accumulator rm_entry; /* real-mode entry code */ > + struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */ > + struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */ > + struct kvmhv_tb_accumulator guest_time; /* guest execution */ > + struct kvmhv_tb_accumulator cede_time; /* time napping inside guest */ > + > + struct dentry *debugfs_dir; > + struct dentry *debugfs_timings; > #endif > }; > > diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h > index 03cbada..10fc784 100644 > --- a/arch/powerpc/include/asm/time.h > +++ b/arch/powerpc/include/asm/time.h > @@ -211,5 +211,8 @@ extern void secondary_cpu_time_init(void); > > DECLARE_PER_CPU(u64, decrementers_next_tb); > > +/* Convert timebase ticks to nanoseconds */ > +unsigned long long tb_to_ns(unsigned long long tb_ticks); > + > #endif /* __KERNEL__ */ > #endif /* __POWERPC_TIME_H */ > diff --git a/arch/powerpc/kernel/asm-offsets.c > b/arch/powerpc/kernel/asm-offsets.c > index 4717859..ec9f59c 100644 > 
--- a/arch/powerpc/kernel/asm-offsets.c > +++ b/arch/powerpc/kernel/asm-offsets.c > @@ -458,6 +458,17 @@ int main(void) > DEFINE(VCPU_SPRG1, offsetof(struct kvm_vcpu, arch.shregs.sprg1)); > DEFINE(VCPU_SPRG2, offsetof(struct kvm_vcpu, arch.shregs.sprg2)); > DEFINE(VCPU_SPRG3, offsetof(struct kvm_vcpu, arch.shregs.sprg3)); > + DEFINE(VCPU_TB_RMENTRY, offsetof(struct kvm_vcpu, arch.rm_entry)); > + DEFINE(VCPU_TB_RMINTR, offsetof(struct kvm_vcpu, arch.rm_intr)); > + DEFINE(VCPU_TB_RMEXIT, offsetof(struct kvm_vcpu, arch.rm_exit)); > + DEFINE(VCPU_TB_GUEST, offsetof(struct kvm_vcpu, arch.guest_time)); > + DEFINE(VCPU_TB_CEDE, offsetof(struct kvm_vcpu, arch.cede_time)); > + DEFINE(VCPU_CUR_ACTIVITY, offsetof(struct kvm_vcpu, arch.cur_activity)); > + DEFINE(VCPU_ACTIVITY_START, offsetof(struct kvm_vcpu, > arch.cur_tb_start)); > + DEFINE(TAS_SEQCOUNT, offsetof(struct kvmhv_tb_accumulator, seqcount)); > + DEFINE(TAS_TOTAL, offsetof(struct kvmhv_tb_accumulator, tb_total)); > + DEFINE(TAS_MIN, offsetof(struct kvmhv_tb_accumulator, tb_min)); > + DEFINE(TAS_MAX
Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
On 20.03.15 10:39, Paul Mackerras wrote: > From: Bharata B Rao > > Since KVM isn't equipped to handle closure of vcpu fd from userspace(QEMU) > correctly, certain work arounds have to be employed to allow reuse of > vcpu array slot in KVM during cpu hot plug/unplug from guest. One such > proposed workaround is to park the vcpu fd in userspace during cpu unplug > and reuse it later during next hotplug. > > More details can be found here: > KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html > QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html > > In order to support this workaround with PowerPC KVM, don't create or > initialize ICP if the vCPU is found to be already associated with an ICP. > > Signed-off-by: Bharata B Rao > Signed-off-by: Paul Mackerras This probably makes some sense, but please make sure that user space has some way to figure out whether hotplug works at all. Also Paul, for patches that you pick up from others, I'd prefer if they send the patches to the ML themselves first and you pick them up from there then. That way we give everyone the same treatment. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/23] Bug fixes and improvements for HV KVM
On 20.03.15 10:39, Paul Mackerras wrote: > This is my current patch queue for HV KVM on PPC. This series is > based on the "queue" branch of the KVM tree, i.e. roughly v4.0-rc3 > plus a set of recent KVM changes which don't intersect with the > changes in this series. On top of that, in my testing I have some > patches which are not KVM-related but are needed to boot and run a > recent upstream kernel successfully: > > tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop > tick/hotplug: Handover time related duties before cpu offline > powerpc/powernv: Check image loaded or not before calling flash > powerpc/powernv: Fixes for hypervisor doorbell handling > powerpc/powernv: Fix return value from power7_nap() et al. > powerpc: Export __spin_yield > > These patches have been posted by their authors and are on their way > upstream via various trees. They are not included in this series. > > The first three patches are bug fixes that should go into v4.0 if > possible. Thanks, applied the first 3 to my for-4.0 branch which is going through autotest now. If everything runs fine, I'll send it to Paolo for upstream merge. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv3] kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM
On 16.03.15 21:41, David Gibson wrote: > On Thu, Feb 05, 2015 at 01:57:11AM +0100, Alexander Graf wrote: >> >> >> On 05.02.15 01:53, David Gibson wrote: >>> On POWER, storage caching is usually configured via the MMU - attributes >>> such as cache-inhibited are stored in the TLB and the hashed page table. >>> >>> This makes correctly performing cache inhibited IO accesses awkward when >>> the MMU is turned off (real mode). Some CPU models provide special >>> registers to control the cache attributes of real mode load and stores but >>> this is not at all consistent. This is a problem in particular for SLOF, >>> the firmware used on KVM guests, which runs entirely in real mode, but >>> which needs to do IO to load the kernel. >>> >>> To simplify this qemu implements two special hypercalls, H_LOGICAL_CI_LOAD >>> and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to >>> a logical address (aka guest physical address). SLOF uses these for IO. >>> >>> However, because these are implemented within qemu, not the host kernel, >>> these bypass any IO devices emulated within KVM itself. The simplest way >>> to see this problem is to attempt to boot a KVM guest from a virtio-blk >>> device with iothread / dataplane enabled. The iothread code relies on an >>> in kernel implementation of the virtio queue notification, which is not >>> triggered by the IO hcalls, and so the guest will stall in SLOF unable to >>> load the guest OS. >>> >>> This patch addresses this by providing in-kernel implementations of the >>> 2 hypercalls, which correctly scan the KVM IO bus. Any access to an >>> address not handled by the KVM IO bus will cause a VM exit, hitting the >>> qemu implementation as before. >>> >>> Note that a userspace change is also required, in order to enable these >>> new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL. >>> >>> Signed-off-by: David Gibson >> >> Thanks, applied to kvm-ppc-queue. > > Any news on when this might go up to mainline? I'm aiming for 4.1. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: H_CLEAR_REF and H_CLEAR_MOD
> On 18.02.2015 at 07:12, Nathan Whitehorn wrote: > > It seems like KVM doesn't implement the H_CLEAR_REF and H_CLEAR_MOD > hypervisor calls, which are absolutely critical for memory management in the > FreeBSD kernel (and are marked "mandatory" in the PAPR manual). It seems some > patches have been contributed already in > https://lists.ozlabs.org/pipermail/linuxppc-dev/2011-December/095013.html, so > it would be fantastic if these could end up upstream. Paul, I guess we never included this because there was no user. If FreeBSD does use it though, I think it makes a lot of sense to resend it for inclusion. > > I'm going to try to get some kind of workaround in the meantime so we can at > least run on existing kernels. Please don't add hacks in FreeBSD only because kvm is missing a feature. Let's just get this done properly :). Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
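For context, what these two hcalls do is small: PAPR defines H_CLEAR_REF(flags, pte-index) to clear the Reference (R) bit of the addressed hashed page table entry and hand back the previous R/C bits, and H_CLEAR_MOD to do the same for the Change (C) bit. A very reduced sketch of the H_CLEAR_REF side, ignoring the HPTE locking, reverse-map bookkeeping and TLB details the real patches have to handle (hpt_virt/hpt_npte are the HPT fields of this era's struct kvm_arch; this is not the actual implementation):

    static long h_clear_ref_sketch(struct kvm_vcpu *vcpu, unsigned long flags,
                                   unsigned long pte_index)
    {
            struct kvm *kvm = vcpu->kvm;
            __be64 *hptep;
            u64 r;

            if (pte_index >= kvm->arch.hpt_npte)
                    return H_PARAMETER;

            /* each HPTE is 16 bytes */
            hptep = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
            r = be64_to_cpu(hptep[1]);

            /* return the old R/C bits, then clear R */
            kvmppc_set_gpr(vcpu, 4, r & (HPTE_R_R | HPTE_R_C));
            if (r & HPTE_R_R)
                    hptep[1] = cpu_to_be64(r & ~HPTE_R_R);
            return H_SUCCESS;
    }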
Re: [PATCHv3] kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM
On 05.02.15 01:53, David Gibson wrote: > On POWER, storage caching is usually configured via the MMU - attributes > such as cache-inhibited are stored in the TLB and the hashed page table. > > This makes correctly performing cache inhibited IO accesses awkward when > the MMU is turned off (real mode). Some CPU models provide special > registers to control the cache attributes of real mode load and stores but > this is not at all consistent. This is a problem in particular for SLOF, > the firmware used on KVM guests, which runs entirely in real mode, but > which needs to do IO to load the kernel. > > To simplify this qemu implements two special hypercalls, H_LOGICAL_CI_LOAD > and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to > a logical address (aka guest physical address). SLOF uses these for IO. > > However, because these are implemented within qemu, not the host kernel, > these bypass any IO devices emulated within KVM itself. The simplest way > to see this problem is to attempt to boot a KVM guest from a virtio-blk > device with iothread / dataplane enabled. The iothread code relies on an > in kernel implementation of the virtio queue notification, which is not > triggered by the IO hcalls, and so the guest will stall in SLOF unable to > load the guest OS. > > This patch addresses this by providing in-kernel implementations of the > 2 hypercalls, which correctly scan the KVM IO bus. Any access to an > address not handled by the KVM IO bus will cause a VM exit, hitting the > qemu implementation as before. > > Note that a userspace change is also required, in order to enable these > new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL. > > Signed-off-by: David Gibson Thanks, applied to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM
On 03.02.15 06:44, David Gibson wrote: > On POWER, storage caching is usually configured via the MMU - attributes > such as cache-inhibited are stored in the TLB and the hashed page table. > > This makes correctly performing cache inhibited IO accesses awkward when > the MMU is turned off (real mode). Some CPU models provide special > registers to control the cache attributes of real mode load and stores but > this is not at all consistent. This is a problem in particular for SLOF, > the firmware used on KVM guests, which runs entirely in real mode, but > which needs to do IO to load the kernel. > > To simplify this qemu implements two special hypercalls, H_LOGICAL_CI_LOAD > and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to > a logical address (aka guest physical address). SLOF uses these for IO. > > However, because these are implemented within qemu, not the host kernel, > these bypass any IO devices emulated within KVM itself. The simplest way > to see this problem is to attempt to boot a KVM guest from a virtio-blk > device with iothread / dataplane enabled. The iothread code relies on an > in kernel implementation of the virtio queue notification, which is not > triggered by the IO hcalls, and so the guest will stall in SLOF unable to > load the guest OS. > > This patch addresses this by providing in-kernel implementations of the > 2 hypercalls, which correctly scan the KVM IO bus. Any access to an > address not handled by the KVM IO bus will cause a VM exit, hitting the > qemu implementation as before. > > Note that a userspace change is also required, in order to enable these > new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL. > > Signed-off-by: David Gibson > --- > arch/powerpc/include/asm/kvm_book3s.h | 3 ++ > arch/powerpc/kvm/book3s.c | 76 > +++ > arch/powerpc/kvm/book3s_hv.c | 12 ++ > arch/powerpc/kvm/book3s_pr_papr.c | 28 + > 4 files changed, 119 insertions(+) > > v2: > - Removed some debugging printk()s that were accidentally left in > - Fix endianness; like all PAPR hypercalls, these should always act > big-endian, even if the guest is little-endian (in practice this > makes no difference, since the only user is SLOF, which is always > big-endian) > > diff --git a/arch/powerpc/include/asm/kvm_book3s.h > b/arch/powerpc/include/asm/kvm_book3s.h > index 942c7b1..578e550 100644 > --- a/arch/powerpc/include/asm/kvm_book3s.h > +++ b/arch/powerpc/include/asm/kvm_book3s.h > @@ -292,6 +292,9 @@ static inline bool kvmppc_supports_magic_page(struct > kvm_vcpu *vcpu) > return !is_kvmppc_hv_enabled(vcpu->kvm); > } > > +extern int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu); > +extern int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu); > + > /* Magic register values loaded into r3 and r4 before the 'sc' assembly > * instruction for the OSI hypercalls */ > #define OSI_SC_MAGIC_R3 0x113724FA > diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c > index 888bf46..7b51492 100644 > --- a/arch/powerpc/kvm/book3s.c > +++ b/arch/powerpc/kvm/book3s.c > @@ -820,6 +820,82 @@ void kvmppc_core_destroy_vm(struct kvm *kvm) > #endif > } > > +int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu) > +{ > + unsigned long size = kvmppc_get_gpr(vcpu, 4); > + unsigned long addr = kvmppc_get_gpr(vcpu, 5); > + u64 buf; > + int ret; > + > + if (!is_power_of_2(size) || (size > sizeof(buf))) > + return H_TOO_HARD; > + > + ret = kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, addr, size, &buf); > + if (ret != 0) > + return H_TOO_HARD; > + > + switch (size) { > + case 1: > + 
kvmppc_set_gpr(vcpu, 4, *(u8 *)&buf); > + break; > + > + case 2: > + kvmppc_set_gpr(vcpu, 4, be16_to_cpu(*(u16 *)&buf)); > + break; > + > + case 4: > + kvmppc_set_gpr(vcpu, 4, be32_to_cpu(*(u32 *)&buf)); > + break; > + > + case 8: > + kvmppc_set_gpr(vcpu, 4, be64_to_cpu(*(u64 *)&buf)); Shouldn't these casts be __be types? > + break; > + > + default: > + BUG(); > + } > + > + return H_SUCCESS; > +} > +EXPORT_SYMBOL_GPL(kvmppc_h_logical_ci_load); /* For use by the kvm-pr module > */ No need for the comment. > + > +int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu) > +{ > + unsigned long size = kvmppc_get_gpr(vcpu, 4); > + unsigned long addr = kvmppc_get_gpr(vcpu, 5); > + unsigned long val = kvmppc_get_gpr(vcpu, 6); > + u64 buf; > + int ret; > + > + switch (size) { > + case 1: > + *(u8 *)&buf = val; > + break; > + > + case 2: > + *(u16 *)&buf = cpu_to_be16(val); > + break; > + > + case 4: > + *(u32 *)&buf = cpu_to_be32(val); > + break; > + > + case 8: > + *(u64 *
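Concretely, the cast question above is about keeping sparse aware that the intermediate buffer holds big-endian data; with __be types the load-side switch would read along these lines (a sketch of the suggested change, not the final patch):

            switch (size) {
            case 1:
                    kvmppc_set_gpr(vcpu, 4, *(u8 *)&buf);
                    break;
            case 2:
                    kvmppc_set_gpr(vcpu, 4, be16_to_cpu(*(__be16 *)&buf));
                    break;
            case 4:
                    kvmppc_set_gpr(vcpu, 4, be32_to_cpu(*(__be32 *)&buf));
                    break;
            case 8:
                    kvmppc_set_gpr(vcpu, 4, be64_to_cpu(*(__be64 *)&buf));
                    break;
            default:
                    BUG();
            }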
Re: [PATCH 0/8] current ACCESS_ONCE patch queue
On 15.01.15 09:58, Christian Borntraeger wrote: > Folks, > > fyi, this is my current patch queue for the next merge window. It > does contain a patch that will disallow ACCESS_ONCE on non-scalar > types. > > The tree is part of linux-next and can be found at > git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux.git linux-next KVM PPC bits are: Acked-by: Alexander Graf Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
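For anyone wondering what "disallow ACCESS_ONCE on non-scalar types" means in practice for code like the PPC bits: on configurations where types such as pte_t are wrapped in a struct, the old idiom stops building once the check is in, and READ_ONCE() is the replacement that also copes with aggregate sizes. An illustrative fragment, not taken from the series:

    static pte_t read_pte_example(pte_t *ptep)
    {
            /* old idiom: breaks the build where pte_t is a struct, once the check is in */
            /* pte_t pte = ACCESS_ONCE(*ptep); */

            /* replacement that handles scalar and non-scalar types */
            pte_t pte = READ_ONCE(*ptep);
            return pte;
    }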
Re: [PATCH] powerpc: powernv: Return to cpu offline loop when finished in KVM guest
On 21.12.14 15:13, Andreas Schwab wrote: > arch/powerpc/kvm/built-in.o: In function `kvm_no_guest': > arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x724): undefined reference to > `power7_wakeup_loss' Ugh. We just removed support for 970 HV mode, but that obviously doesn't mean you can't compile in support for HV mode without enabling p7. Paul, what would you think of a patch that makes BOOK3S_HV depend on PPC_POWERNV? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
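The dependency change Alex floats here would be a one-line Kconfig tweak against arch/powerpc/kvm/Kconfig, where the HV option is KVM_BOOK3S_64_HV. Sketched below; the prompt text is illustrative and whether this is the route actually taken is a separate question:

    config KVM_BOOK3S_64_HV
            tristate "KVM for POWER7 and later using hypervisor mode in host"
            depends on KVM_BOOK3S_64 && PPC_POWERNV
            # remaining select/help lines unchanged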
[PULL 16/18] KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register
From: Paul Mackerras There are two ways in which a guest instruction can be obtained from the guest in the guest exit code in book3s_hv_rmhandlers.S. If the exit was caused by a Hypervisor Emulation interrupt (i.e. an illegal instruction), the offending instruction is in the HEIR register (Hypervisor Emulation Instruction Register). If the exit was caused by a load or store to an emulated MMIO device, we load the instruction from the guest by turning data relocation on and loading the instruction with an lwz instruction. Unfortunately, in the case where the guest has opposite endianness to the host, these two methods give results of different endianness, but both get put into vcpu->arch.last_inst. The HEIR value has been loaded using guest endianness, whereas the lwz will load the instruction using host endianness. The rest of the code that uses vcpu->arch.last_inst assumes it was loaded using host endianness. To fix this, we define a new vcpu field to store the HEIR value. Then, in kvmppc_handle_exit_hv(), we transfer the value from this new field to vcpu->arch.last_inst, doing a byte-swap if the guest and host endianness differ. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 2 ++ arch/powerpc/kernel/asm-offsets.c | 1 + arch/powerpc/kvm/book3s_hv.c| 4 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 4 ++-- 4 files changed, 9 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 5686a42..6544187 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -651,6 +651,8 @@ struct kvm_vcpu_arch { spinlock_t tbacct_lock; u64 busy_stolen; u64 busy_preempt; + + u32 emul_inst; #endif }; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 815212e..b14716b 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -498,6 +498,7 @@ int main(void) DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar)); DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr)); DEFINE(VCPU_VPA_DIRTY, offsetof(struct kvm_vcpu, arch.vpa.dirty)); + DEFINE(VCPU_HEIR, offsetof(struct kvm_vcpu, arch.emul_inst)); #endif #ifdef CONFIG_PPC_BOOK3S DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id)); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 1ee4e9e..299351e 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -831,6 +831,10 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu, * Accordingly return to Guest or Host. */ case BOOK3S_INTERRUPT_H_EMUL_ASSIST: + if (vcpu->arch.emul_inst != KVM_INST_FETCH_FAILED) + vcpu->arch.last_inst = kvmppc_need_byteswap(vcpu) ? 
+ swab32(vcpu->arch.emul_inst) : + vcpu->arch.emul_inst; if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) { r = kvmppc_emulate_debug_inst(run, vcpu); } else { diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index c0f9e68..26a5b8d 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -983,13 +983,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) stw r12,VCPU_TRAP(r9) - /* Save HEIR (HV emulation assist reg) in last_inst + /* Save HEIR (HV emulation assist reg) in emul_inst if this is an HEI (HV emulation interrupt, e40) */ li r3,KVM_INST_FETCH_FAILED cmpwi r12,BOOK3S_INTERRUPT_H_EMUL_ASSIST bne 11f mfspr r3,SPRN_HEIR -11:stw r3,VCPU_LAST_INST(r9) +11:stw r3,VCPU_HEIR(r9) /* these are volatile across C function calls */ mfctr r3 -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 14/18] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
From: "Suresh E. Warrier" This patch adds trace points in the guest entry and exit code and also for exceptions handled by the host in kernel mode - hypercalls and page faults. The new events are added to /sys/kernel/debug/tracing/events under a new subsystem called kvm_hv. Acked-by: Paul Mackerras Signed-off-by: Suresh Warrier Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 13 +- arch/powerpc/kvm/book3s_hv.c| 19 ++ arch/powerpc/kvm/trace_book3s.h | 32 +++ arch/powerpc/kvm/trace_hv.h | 477 arch/powerpc/kvm/trace_pr.h | 25 +- 5 files changed, 539 insertions(+), 27 deletions(-) create mode 100644 arch/powerpc/kvm/trace_book3s.h create mode 100644 arch/powerpc/kvm/trace_hv.h diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 59425f1..311e4a3 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -37,6 +37,8 @@ #include #include +#include "trace_hv.h" + /* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */ #define MAX_LPID_970 63 @@ -622,6 +624,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, gfn = gpa >> PAGE_SHIFT; memslot = gfn_to_memslot(kvm, gfn); + trace_kvm_page_fault_enter(vcpu, hpte, memslot, ea, dsisr); + /* No memslot means it's an emulated MMIO region */ if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID)) return kvmppc_hv_emulate_mmio(run, vcpu, gpa, ea, @@ -641,6 +645,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, mmu_seq = kvm->mmu_notifier_seq; smp_rmb(); + ret = -EFAULT; is_io = 0; pfn = 0; page = NULL; @@ -664,7 +669,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, } up_read(¤t->mm->mmap_sem); if (!pfn) - return -EFAULT; + goto out_put; } else { page = pages[0]; pfn = page_to_pfn(page); @@ -694,14 +699,14 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, } } - ret = -EFAULT; if (psize > pte_size) goto out_put; /* Check WIMG vs. the actual page we're accessing */ if (!hpte_cache_flags_ok(r, is_io)) { if (is_io) - return -EFAULT; + goto out_put; + /* * Allow guest to map emulated device memory as * uncacheable, but actually make it cacheable. 
@@ -765,6 +770,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, SetPageDirty(page); out_put: + trace_kvm_page_fault_exit(vcpu, hpte, ret); + if (page) { /* * We drop pages[0] here, not page because page might diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 74afa2d..325ed94 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -58,6 +58,9 @@ #include "book3s.h" +#define CREATE_TRACE_POINTS +#include "trace_hv.h" + /* #define EXIT_DEBUG */ /* #define EXIT_DEBUG_SIMPLE */ /* #define EXIT_DEBUG_INT */ @@ -1730,6 +1733,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) { kvmppc_start_thread(vcpu); kvmppc_create_dtl_entry(vcpu, vc); + trace_kvm_guest_enter(vcpu); } /* Set this explicitly in case thread 0 doesn't have a vcpu */ @@ -1738,6 +1742,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) vc->vcore_state = VCORE_RUNNING; preempt_disable(); + + trace_kvmppc_run_core(vc, 0); + spin_unlock(&vc->lock); kvm_guest_enter(); @@ -1783,6 +1790,8 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) kvmppc_core_pending_dec(vcpu)) kvmppc_core_dequeue_dec(vcpu); + trace_kvm_guest_exit(vcpu); + ret = RESUME_GUEST; if (vcpu->arch.trap) ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu, @@ -1808,6 +1817,8 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) wake_up(&vcpu->arch.cpu_run); } } + + trace_kvmppc_run_core(vc, 1); } /* @@ -1854,11 +1865,13 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) } vc->vcore_state = VCORE_SLEEPING; + trace_kvmppc_vcore_blocked(vc, 0); spin_unlock(&vc->lock); schedule(); finish_wait(&v
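The new events follow the usual TRACE_EVENT() pattern; as a flavour of what lands in trace_hv.h, here is a made-up, minimal event for illustration (the real definitions carry considerably more fields and need the standard trace header boilerplate around them):

    TRACE_EVENT(kvm_guest_enter_sketch,
            TP_PROTO(struct kvm_vcpu *vcpu),
            TP_ARGS(vcpu),

            TP_STRUCT__entry(
                    __field(int,            vcpu_id)
                    __field(unsigned long,  pc)
            ),

            TP_fast_assign(
                    __entry->vcpu_id = vcpu->vcpu_id;
                    __entry->pc      = kvmppc_get_pc(vcpu);
            ),

            TP_printk("VCPU %d: pc=0x%lx", __entry->vcpu_id, __entry->pc)
    );

Once built in, the events show up under /sys/kernel/debug/tracing/events/kvm_hv/ as described in the commit message and can be enabled per event or for the whole subsystem.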
[PULL 05/18] KVM: PPC: Book3S HV: Fix KSM memory corruption
From: Paul Mackerras Testing with KSM active in the host showed occasional corruption of guest memory. Typically a page that should have contained zeroes would contain values that look like the contents of a user process stack (values such as 0x_3fff__xxx). Code inspection in kvmppc_h_protect revealed that there was a race condition with the possibility of granting write access to a page which is read-only in the host page tables. The code attempts to keep the host mapping read-only if the host userspace PTE is read-only, but if that PTE had been temporarily made invalid for any reason, the read-only check would not trigger and the host HPTE could end up read-write. Examination of the guest HPT in the failure situation revealed that there were indeed shared pages which should have been read-only that were mapped read-write. To close this race, we don't let a page go from being read-only to being read-write, as far as the real HPTE mapping the page is concerned (the guest view can go to read-write, but the actual mapping stays read-only). When the guest tries to write to the page, we take an HDSI and let kvmppc_book3s_hv_page_fault take care of providing a writable HPTE for the page. This eliminates the occasional corruption of shared pages that was previously seen with KSM active. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 44 ++--- 1 file changed, 17 insertions(+), 27 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 084ad54..411720f 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -667,40 +667,30 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags, rev->guest_rpte = r; note_hpte_modification(kvm, rev); } - r = (be64_to_cpu(hpte[1]) & ~mask) | bits; /* Update HPTE */ if (v & HPTE_V_VALID) { - rb = compute_tlbie_rb(v, r, pte_index); - hpte[0] = cpu_to_be64(v & ~HPTE_V_VALID); - do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true); /* -* If the host has this page as readonly but the guest -* wants to make it read/write, reduce the permissions. -* Checking the host permissions involves finding the -* memslot and then the Linux PTE for the page. +* If the page is valid, don't let it transition from +* readonly to writable. If it should be writable, we'll +* take a trap and let the page fault code sort it out. 
*/ - if (hpte_is_writable(r) && kvm->arch.using_mmu_notifiers) { - unsigned long psize, gfn, hva; - struct kvm_memory_slot *memslot; - pgd_t *pgdir = vcpu->arch.pgdir; - pte_t pte; - - psize = hpte_page_size(v, r); - gfn = ((r & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT; - memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn); - if (memslot) { - hva = __gfn_to_hva_memslot(memslot, gfn); - pte = lookup_linux_pte_and_update(pgdir, hva, - 1, &psize); - if (pte_present(pte) && !pte_write(pte)) - r = hpte_make_readonly(r); - } + pte = be64_to_cpu(hpte[1]); + r = (pte & ~mask) | bits; + if (hpte_is_writable(r) && kvm->arch.using_mmu_notifiers && + !hpte_is_writable(pte)) + r = hpte_make_readonly(r); + /* If the PTE is changing, invalidate it first */ + if (r != pte) { + rb = compute_tlbie_rb(v, r, pte_index); + hpte[0] = cpu_to_be64((v & ~HPTE_V_VALID) | + HPTE_V_ABSENT); + do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), + true); + hpte[1] = cpu_to_be64(r); } } - hpte[1] = cpu_to_be64(r); - eieio(); - hpte[0] = cpu_to_be64(v & ~HPTE_V_HVLOCK); + unlock_hpte(hpte, v & ~HPTE_V_HVLOCK); asm volatile("ptesync" : : : "memory"); return H_SUCCESS; } -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
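Condensed, the rule the new code enforces reads as the following sketch (a restatement of the hunk above for readability, not the literal kernel code):

/*
 * Sketch: compute the new second doubleword for H_PROTECT without ever
 * upgrading the real mapping from read-only to read-write.
 */
static unsigned long h_protect_new_r(struct kvm *kvm, unsigned long old_r,
				     unsigned long mask, unsigned long bits)
{
	unsigned long new_r = (old_r & ~mask) | bits;

	/*
	 * If the guest asks for write access on a page currently mapped
	 * read-only, keep it read-only here.  The first store takes an
	 * HDSI, and kvmppc_book3s_hv_page_fault() can then check the
	 * host PTE and install a writable HPTE if that is safe.
	 */
	if (hpte_is_writable(new_r) && kvm->arch.using_mmu_notifiers &&
	    !hpte_is_writable(old_r))
		new_r = hpte_make_readonly(new_r);

	return new_r;
}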
[PULL 17/18] KVM: PPC: Book3S HV: Improve H_CONFER implementation
From: Sam Bobroff Currently the H_CONFER hcall is implemented in kernel virtual mode, meaning that whenever a guest thread does an H_CONFER, all the threads in that virtual core have to exit the guest. This is bad for performance because it interrupts the other threads even if they are doing useful work. The H_CONFER hcall is called by a guest VCPU when it is spinning on a spinlock and it detects that the spinlock is held by a guest VCPU that is currently not running on a physical CPU. The idea is to give this VCPU's time slice to the holder VCPU so that it can make progress towards releasing the lock. To avoid having the other threads exit the guest unnecessarily, we add a real-mode implementation of H_CONFER that checks whether the other threads are doing anything. If all the other threads are idle (i.e. in H_CEDE) or trying to confer (i.e. in H_CONFER), it returns H_TOO_HARD which causes a guest exit and allows the H_CONFER to be handled in virtual mode. Otherwise it spins for a short time (up to 10 microseconds) to give other threads the chance to observe that this thread is trying to confer. The spin loop also terminates when any thread exits the guest or when all other threads are idle or trying to confer. If the timeout is reached, the H_CONFER returns H_SUCCESS. In this case the guest VCPU will recheck the spinlock word and most likely call H_CONFER again. This also improves the implementation of the H_CONFER virtual mode handler. If the VCPU is part of a virtual core (vcore) which is runnable, there will be a 'runner' VCPU which has taken responsibility for running the vcore. In this case we yield to the runner VCPU rather than the target VCPU. We also introduce a check on the target VCPU's yield count: if it differs from the yield count passed to H_CONFER, the target VCPU has run since H_CONFER was called and may have already released the lock. This check is required by PAPR. Signed-off-by: Sam Bobroff Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 1 + arch/powerpc/kvm/book3s_hv.c| 41 - arch/powerpc/kvm/book3s_hv_builtin.c| 32 + arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +- 4 files changed, 74 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 6544187..7efd666a 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -295,6 +295,7 @@ struct kvmppc_vcore { ulong dpdes;/* doorbell state (POWER8) */ void *mpp_buffer; /* Micro Partition Prefetch buffer */ bool mpp_buffer_is_valid; + ulong conferring_threads; }; #define VCORE_ENTRY_COUNT(vc) ((vc)->entry_exit_count & 0xff) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 299351e..de4018a 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -607,10 +607,45 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags, } } +static int kvm_arch_vcpu_yield_to(struct kvm_vcpu *target) +{ + struct kvmppc_vcore *vcore = target->arch.vcore; + + /* +* We expect to have been called by the real mode handler +* (kvmppc_rm_h_confer()) which would have directly returned +* H_SUCCESS if the source vcore wasn't idle (e.g. if it may +* have useful work to do and should not confer) so we don't +* recheck that here. 
+*/ + + spin_lock(&vcore->lock); + if (target->arch.state == KVMPPC_VCPU_RUNNABLE && + vcore->vcore_state != VCORE_INACTIVE) + target = vcore->runner; + spin_unlock(&vcore->lock); + + return kvm_vcpu_yield_to(target); +} + +static int kvmppc_get_yield_count(struct kvm_vcpu *vcpu) +{ + int yield_count = 0; + struct lppaca *lppaca; + + spin_lock(&vcpu->arch.vpa_update_lock); + lppaca = (struct lppaca *)vcpu->arch.vpa.pinned_addr; + if (lppaca) + yield_count = lppaca->yield_count; + spin_unlock(&vcpu->arch.vpa_update_lock); + return yield_count; +} + int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) { unsigned long req = kvmppc_get_gpr(vcpu, 3); unsigned long target, ret = H_SUCCESS; + int yield_count; struct kvm_vcpu *tvcpu; int idx, rc; @@ -646,7 +681,10 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) ret = H_PARAMETER; break; } - kvm_vcpu_yield_to(tvcpu); + yield_count = kvmppc_get_gpr(vcpu, 5); + if (kvmppc_get_yield_count(tvcpu) != yield_count) + break; + kvm_arch_vcpu_yield_to(tvcpu); break;
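The hunk for the real-mode side in book3s_hv_builtin.c is not visible in this excerpt, so here is a rough sketch of the logic the commit message describes, i.e. the spin of up to 10 microseconds with early exit; helper and field names are illustrative, not a quote of the patch:

/*
 * Sketch of the real-mode confer path, per the description above.
 * (target/yield_count are validated by the callers; unused here.)
 */
long kvmppc_rm_h_confer_sketch(struct kvm_vcpu *vcpu, int target,
			       unsigned int yield_count)
{
	struct kvmppc_vcore *vc = vcpu->arch.vcore;
	u64 stop = get_tb() + 10 * tb_ticks_per_usec;
	long ret = H_SUCCESS;		/* guest rechecks the lock and retries */

	set_bit(vcpu->arch.ptid, &vc->conferring_threads);
	while (get_tb() < stop && VCORE_EXIT_COUNT(vc) == 0) {
		int running    = VCORE_ENTRY_COUNT(vc);
		int ceded      = hweight32(vc->napping_threads);
		int conferring = hweight_long(vc->conferring_threads);

		/*
		 * If every thread in the vcore is either idle or also
		 * trying to confer, spinning cannot help; exit to virtual
		 * mode so the full handler can yield the whole vcore.
		 */
		if (ceded + conferring >= running) {
			ret = H_TOO_HARD;
			break;
		}
	}
	clear_bit(vcpu->arch.ptid, &vc->conferring_threads);
	return ret;
}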
[PULL 11/18] arch: powerpc: kvm: book3s_pr.c: Remove unused function
From: Rickard Strandqvist Remove the function get_fpr_index() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_pr.c | 5 - 1 file changed, 5 deletions(-) diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index cf2eb16..f573839 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -644,11 +644,6 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, return r; } -static inline int get_fpr_index(int i) -{ - return i * TS_FPRWIDTH; -} - /* Give up external provider (FPU, Altivec, VSX) */ void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr) { -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 09/18] arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function
From: Rickard Strandqvist Remove the function sr_nx() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_32_mmu.c | 5 - 1 file changed, 5 deletions(-) diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c index cd0b073..a2eb6d3 100644 --- a/arch/powerpc/kvm/book3s_32_mmu.c +++ b/arch/powerpc/kvm/book3s_32_mmu.c @@ -78,11 +78,6 @@ static inline bool sr_kp(u32 sr_raw) return (sr_raw & 0x2000) ? true: false; } -static inline bool sr_nx(u32 sr_raw) -{ - return (sr_raw & 0x1000) ? true: false; -} - static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *pte, bool data, bool iswrite); -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 15/18] KVM: PPC: Book3S HV: Remove code for PPC970 processors
From: Paul Mackerras This removes the code that was added to enable HV KVM to work on PPC970 processors. The PPC970 is an old CPU that doesn't support virtualizing guest memory. Removing PPC970 support also lets us remove the code for allocating and managing contiguous real-mode areas, the code for the !kvm->arch.using_mmu_notifiers case, the code for pinning pages of guest memory when first accessed and keeping track of which pages have been pinned, and the code for handling H_ENTER hypercalls in virtual mode. Book3S HV KVM is now supported only on POWER7 and POWER8 processors. The KVM_CAP_PPC_RMA capability now always returns 0. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s.h| 2 - arch/powerpc/include/asm/kvm_book3s_64.h | 1 - arch/powerpc/include/asm/kvm_host.h | 14 -- arch/powerpc/include/asm/kvm_ppc.h | 2 - arch/powerpc/kernel/asm-offsets.c| 1 - arch/powerpc/kvm/book3s_64_mmu_hv.c | 200 ++--- arch/powerpc/kvm/book3s_hv.c | 292 +++ arch/powerpc/kvm/book3s_hv_builtin.c | 104 +-- arch/powerpc/kvm/book3s_hv_interrupts.S | 39 + arch/powerpc/kvm/book3s_hv_ras.c | 5 +- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 110 ++-- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 245 +- arch/powerpc/kvm/powerpc.c | 10 +- 13 files changed, 70 insertions(+), 955 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 6acf0c2..942c7b1 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -170,8 +170,6 @@ extern void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long addr, unsigned long *nb_ret); extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr, unsigned long gpa, bool dirty); -extern long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags, - long pte_index, unsigned long pteh, unsigned long ptel); extern long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags, long pte_index, unsigned long pteh, unsigned long ptel, pgd_t *pgdir, bool realmode, unsigned long *idx_ret); diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index a37f1a4..2d81e20 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -37,7 +37,6 @@ static inline void svcpu_put(struct kvmppc_book3s_shadow_vcpu *svcpu) #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE #define KVM_DEFAULT_HPT_ORDER 24 /* 16MB HPT by default */ -extern unsigned long kvm_rma_pages; #endif #define VRMA_VSID 0x1ffUL /* 1TB VSID reserved for VRMA */ diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 7cf94a5..5686a42 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -180,11 +180,6 @@ struct kvmppc_spapr_tce_table { struct page *pages[0]; }; -struct kvm_rma_info { - atomic_t use_count; - unsigned long base_pfn; -}; - /* XICS components, defined in book3s_xics.c */ struct kvmppc_xics; struct kvmppc_icp; @@ -214,16 +209,9 @@ struct revmap_entry { #define KVMPPC_RMAP_PRESENT0x1ul #define KVMPPC_RMAP_INDEX 0xul -/* Low-order bits in memslot->arch.slot_phys[] */ -#define KVMPPC_PAGE_ORDER_MASK 0x1f -#define KVMPPC_PAGE_NO_CACHE HPTE_R_I/* 0x20 */ -#define KVMPPC_PAGE_WRITETHRU HPTE_R_W/* 0x40 */ -#define KVMPPC_GOT_PAGE0x80 - struct kvm_arch_memory_slot { #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE unsigned long *rmap; - unsigned long *slot_phys; #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ }; @@ -242,14 +230,12 @@ struct kvm_arch { struct 
kvm_rma_info *rma; unsigned long vrma_slb_v; int rma_setup_done; - int using_mmu_notifiers; u32 hpt_order; atomic_t vcpus_running; u32 online_vcores; unsigned long hpt_npte; unsigned long hpt_mask; atomic_t hpte_mod_interest; - spinlock_t slot_phys_lock; cpumask_t need_tlb_flush; int hpt_cma_alloc; #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index a6dcdb6..46bf652 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -170,8 +170,6 @@ extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, unsigned long ioba, unsigned long tce); extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn, unsigned long ioba); -extern struct kvm_rma_info *kvm_alloc_rma(void); -
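One user-visible consequence is the last sentence of the commit message: KVM_CAP_PPC_RMA now always reports 0, so userspace no longer has to set up a real-mode area before creating a guest. A minimal (hypothetical) probe showing what a tool such as QEMU now sees:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm_fd = open("/dev/kvm", O_RDWR);

	if (kvm_fd < 0) {
		perror("open /dev/kvm");
		return 1;
	}

	/* Expected to print 0 on kernels with PPC970 HV support removed. */
	printf("KVM_CAP_PPC_RMA = %d\n",
	       ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_RMA));
	return 0;
}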
[PULL 00/18] ppc patch queue 2014-12-18
Hi Paolo,

This is my current patch queue for ppc. Please pull.

After the merge with Linus' tree, e500v2 compilation will be broken because commit 69111bac42f5 broke it upstream. Could you please take care to apply the fix I CC'ed you on for it?

Thanks!

Alex

The following changes since commit e08e833616f7eefebdacfd1d743d80ff3c3b2585:

  KVM: cpuid: recompute CPUID 0xD.0:EBX,ECX (2014-12-05 13:57:49 +0100)

are available in the git repository at:

  git://github.com/agraf/linux-2.6.git tags/signed-kvm-ppc-next

for you to fetch changes up to 476ce5ef09b21a76e74d07ff9d723ba0de49b53b:

  KVM: PPC: Book3S: Enable in-kernel XICS emulation by default (2014-12-17 22:23:22 +0100)

Patch queue for ppc - 2014-12-18

Highlights this time around:

  - Removal of HV support for 970. It became a maintenance burden and
    received practically no testing. POWER8 with HV is available now,
    so just grab one of those boxes if PR isn't enough for you.
  - Some bug fixes and performance improvements
  - Tracepoints for book3s_hv

----

Alexander Graf (1):
      KVM: PPC: BookE: Improve irq inject tracepoint

Aneesh Kumar K.V (1):
      KVM: PPC: Book3S HV: Add missing HPTE unlock

Anton Blanchard (1):
      KVM: PPC: Book3S: Enable in-kernel XICS emulation by default

Cédric Le Goater (1):
      KVM: PPC: Book3S HV: ptes are big endian

Mahesh Salgaonkar (1):
      KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI

Paul Mackerras (5):
      KVM: PPC: Book3S HV: Fix computation of tlbie operand
      KVM: PPC: Book3S HV: Fix KSM memory corruption
      KVM: PPC: Book3S HV: Simplify locking around stolen time calculations
      KVM: PPC: Book3S HV: Remove code for PPC970 processors
      KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register

Rickard Strandqvist (4):
      arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function
      arch: powerpc: kvm: book3s.c: Remove some unused functions
      arch: powerpc: kvm: book3s_pr.c: Remove unused function
      arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function

Sam Bobroff (1):
      KVM: PPC: Book3S HV: Improve H_CONFER implementation

Suresh E. Warrier (3):
      KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI
      KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
      KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions

 arch/powerpc/include/asm/kvm_book3s.h    |   2 -
 arch/powerpc/include/asm/kvm_book3s_64.h |   3 +-
 arch/powerpc/include/asm/kvm_host.h      |  18 +-
 arch/powerpc/include/asm/kvm_ppc.h       |   2 -
 arch/powerpc/kernel/asm-offsets.c        |   2 +-
 arch/powerpc/kvm/Kconfig                 |   1 +
 arch/powerpc/kvm/book3s.c                |   8 -
 arch/powerpc/kvm/book3s_32_mmu.c         |   5 -
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 224 +++
 arch/powerpc/kvm/book3s_hv.c             | 438 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c     | 136 +++--
 arch/powerpc/kvm/book3s_hv_interrupts.S  |  39 +--
 arch/powerpc/kvm/book3s_hv_ras.c         |   5 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 150 +++---
 arch/powerpc/kvm/book3s_hv_rm_xics.c     |  36 ++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 251 +---
 arch/powerpc/kvm/book3s_paired_singles.c |   8 -
 arch/powerpc/kvm/book3s_pr.c             |   5 -
 arch/powerpc/kvm/book3s_xics.c           |  30 +-
 arch/powerpc/kvm/book3s_xics.h           |   1 +
 arch/powerpc/kvm/e500.c                  |   8 -
 arch/powerpc/kvm/powerpc.c               |  10 +-
 arch/powerpc/kvm/trace_book3s.h          |  32 +++
 arch/powerpc/kvm/trace_booke.h           |  47 ++-
 arch/powerpc/kvm/trace_hv.h              | 477 +++
 arch/powerpc/kvm/trace_pr.h              |  25 +-
 26 files changed, 870 insertions(+), 1093 deletions(-)
 create mode 100644 arch/powerpc/kvm/trace_book3s.h
 create mode 100644 arch/powerpc/kvm/trace_hv.h

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 13/18] KVM: PPC: Book3S HV: Simplify locking around stolen time calculations
From: Paul Mackerras Currently the calculations of stolen time for PPC Book3S HV guests uses fields in both the vcpu struct and the kvmppc_vcore struct. The fields in the kvmppc_vcore struct are protected by the vcpu->arch.tbacct_lock of the vcpu that has taken responsibility for running the virtual core. This works correctly but confuses lockdep, because it sees that the code takes the tbacct_lock for a vcpu in kvmppc_remove_runnable() and then takes another vcpu's tbacct_lock in vcore_stolen_time(), and it thinks there is a possibility of deadlock, causing it to print reports like this: = [ INFO: possible recursive locking detected ] 3.18.0-rc7-kvm-00016-g8db4bc6 #89 Not tainted - qemu-system-ppc/6188 is trying to acquire lock: (&(&vcpu->arch.tbacct_lock)->rlock){..}, at: [] .vcore_stolen_time+0x48/0xd0 [kvm_hv] but task is already holding lock: (&(&vcpu->arch.tbacct_lock)->rlock){..}, at: [] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv] other info that might help us debug this: Possible unsafe locking scenario: CPU0 lock(&(&vcpu->arch.tbacct_lock)->rlock); lock(&(&vcpu->arch.tbacct_lock)->rlock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by qemu-system-ppc/6188: #0: (&vcpu->mutex){+.+.+.}, at: [] .vcpu_load+0x28/0xe0 [kvm] #1: (&(&vcore->lock)->rlock){+.+...}, at: [] .kvmppc_vcpu_run_hv+0x530/0x1530 [kvm_hv] #2: (&(&vcpu->arch.tbacct_lock)->rlock){..}, at: [] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv] stack backtrace: CPU: 40 PID: 6188 Comm: qemu-system-ppc Not tainted 3.18.0-rc7-kvm-00016-g8db4bc6 #89 Call Trace: [c00b2754f3f0] [c0b31b6c] .dump_stack+0x88/0xb4 (unreliable) [c00b2754f470] [c00faeb8] .__lock_acquire+0x1878/0x2190 [c00b2754f600] [c00fbf0c] .lock_acquire+0xcc/0x1a0 [c00b2754f6d0] [c0b2954c] ._raw_spin_lock_irq+0x4c/0x70 [c00b2754f760] [decb1fe8] .vcore_stolen_time+0x48/0xd0 [kvm_hv] [c00b2754f7f0] [decb25b4] .kvmppc_remove_runnable.part.3+0x44/0xd0 [kvm_hv] [c00b2754f880] [decb43ec] .kvmppc_vcpu_run_hv+0x76c/0x1530 [kvm_hv] [c00b2754f9f0] [deb9f46c] .kvmppc_vcpu_run+0x2c/0x40 [kvm] [c00b2754fa60] [deb9c9a4] .kvm_arch_vcpu_ioctl_run+0x54/0x160 [kvm] [c00b2754faf0] [deb94538] .kvm_vcpu_ioctl+0x498/0x760 [kvm] [c00b2754fcb0] [c0267eb4] .do_vfs_ioctl+0x444/0x770 [c00b2754fd90] [c02682a4] .SyS_ioctl+0xc4/0xe0 [c00b2754fe30] [c00092e4] syscall_exit+0x0/0x98 In order to make the locking easier to analyse, we change the code to use a spinlock in the kvmppc_vcore struct to protect the stolen_tb and preempt_tb fields. This lock needs to be an irq-safe lock since it is used in the kvmppc_core_vcpu_load_hv() and kvmppc_core_vcpu_put_hv() functions, which are called with the scheduler rq lock held, which is an irq-safe lock. 
Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 1 + arch/powerpc/kvm/book3s_hv.c| 60 +++-- 2 files changed, 32 insertions(+), 29 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 0478556..7cf94a5 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -297,6 +297,7 @@ struct kvmppc_vcore { struct list_head runnable_threads; spinlock_t lock; wait_queue_head_t wq; + spinlock_t stoltb_lock; /* protects stolen_tb and preempt_tb */ u64 stolen_tb; u64 preempt_tb; struct kvm_vcpu *runner; diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 1a7a281..74afa2d 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -135,11 +135,10 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu) * stolen. * * Updates to busy_stolen are protected by arch.tbacct_lock; - * updates to vc->stolen_tb are protected by the arch.tbacct_lock - * of the vcpu that has taken responsibility for running the vcore - * (i.e. vc->runner). The stolen times are measured in units of - * timebase ticks. (Note that the != TB_NIL checks below are - * purely defensive; they should never fail.) + * updates to vc->stolen_tb are protected by the vcore->stoltb_lock + * lock. The stolen times are measured in units of timebase ticks. + * (Note that the != TB_NIL checks below are purely defensive; + * they should never fail.) */ static void kvmppc_core_vcpu_load_hv(struct kvm_vcpu *vcpu, int cpu) @@ -147,12 +146,21 @@ static void kvmppc_core_vcpu_load_hv(struct kvm_vcpu *vcpu, int cpu) struct kvmppc_vcore *vc = vcpu->arch.vcore; unsi
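The effect of the new lock is easiest to see in the stolen-time read path, condensed here into a sketch (illustrative, not the literal patch):

/* Sketch: reading a vcore's stolen time under the new stoltb_lock. */
static u64 vcore_stolen_time_sketch(struct kvmppc_vcore *vc, u64 now)
{
	unsigned long flags;
	u64 p;

	/*
	 * irq-safe locking, because the vcpu load/put paths update
	 * stolen_tb/preempt_tb while holding the scheduler rq lock.
	 */
	spin_lock_irqsave(&vc->stoltb_lock, flags);
	p = vc->stolen_tb;
	if (vc->vcore_state != VCORE_INACTIVE && vc->preempt_tb != TB_NIL)
		p += now - vc->preempt_tb;
	spin_unlock_irqrestore(&vc->stoltb_lock, flags);
	return p;
}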
[PULL 12/18] arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function
From: Rickard Strandqvist Remove the function inst_set_field() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_paired_singles.c | 8 1 file changed, 8 deletions(-) diff --git a/arch/powerpc/kvm/book3s_paired_singles.c b/arch/powerpc/kvm/book3s_paired_singles.c index bfb8035..bd6ab16 100644 --- a/arch/powerpc/kvm/book3s_paired_singles.c +++ b/arch/powerpc/kvm/book3s_paired_singles.c @@ -352,14 +352,6 @@ static inline u32 inst_get_field(u32 inst, int msb, int lsb) return kvmppc_get_field(inst, msb + 32, lsb + 32); } -/* - * Replaces inst bits with ordering according to spec. - */ -static inline u32 inst_set_field(u32 inst, int msb, int lsb, int value) -{ - return kvmppc_set_field(inst, msb + 32, lsb + 32, value); -} - bool kvmppc_inst_is_paired_single(struct kvm_vcpu *vcpu, u32 inst) { if (!(vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE)) -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 02/18] KVM: PPC: Book3S HV: Add missing HPTE unlock
From: "Aneesh Kumar K.V" In kvm_test_clear_dirty(), if we find an invalid HPTE we move on to the next HPTE without unlocking the invalid one. In fact we should never find an invalid and unlocked HPTE in the rmap chain, but for robustness we should unlock it. This adds the missing unlock. Reported-by: Benjamin Herrenschmidt Signed-off-by: Aneesh Kumar K.V Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index d407702..41f96c5 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -1117,8 +1117,11 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp) } /* Now check and modify the HPTE */ - if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) + if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) { + /* unlock and continue */ + hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK); continue; + } /* need to make it temporarily absent so C is stable */ hptep[0] |= cpu_to_be64(HPTE_V_ABSENT); -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 04/18] KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
From: Mahesh Salgaonkar When we get an HMI (hypervisor maintenance interrupt) while in a guest, we see that guest enters into paused state. The reason is, in kvmppc_handle_exit_hv it falls through default path and returns to host instead of resuming guest. This causes guest to enter into paused state. HMI is a hypervisor only interrupt and it is safe to resume the guest since the host has handled it already. This patch adds a switch case to resume the guest. Without this patch we see guest entering into paused state with following console messages: [ 3003.329351] Severe Hypervisor Maintenance interrupt [Recovered] [ 3003.329356] Error detail: Timer facility experienced an error [ 3003.329359] HMER: 0840 [ 3003.329360] TFMR: 4a12000980a84000 [ 3003.329366] vcpu c007c35094c0 (40): [ 3003.329368] pc = c00c2ba0 msr = 80009032 trap = e60 [ 3003.329370] r 0 = c021ddc0 r16 = 0046 [ 3003.329372] r 1 = c0007a02bbd0 r17 = 327d5d98 [ 3003.329375] r 2 = c10980b8 r18 = 1fc9a0b0 [ 3003.329377] r 3 = c142d6b8 r19 = c142d6b8 [ 3003.329379] r 4 = 0002 r20 = [ 3003.329381] r 5 = c524a110 r21 = [ 3003.329383] r 6 = 0001 r22 = [ 3003.329386] r 7 = r23 = c524a110 [ 3003.329388] r 8 = r24 = 0001 [ 3003.329391] r 9 = 0001 r25 = c0007c31da38 [ 3003.329393] r10 = c14280b8 r26 = 0002 [ 3003.329395] r11 = 746f6f6c2f68656c r27 = c524a110 [ 3003.329397] r12 = 28004484 r28 = c0007c31da38 [ 3003.329399] r13 = cfe01400 r29 = 0002 [ 3003.329401] r14 = 0046 r30 = c3011e00 [ 3003.329403] r15 = ffba r31 = 0002 [ 3003.329404] ctr = c041a670 lr = c0272520 [ 3003.329405] srr0 = c007e8d8 srr1 = 90001002 [ 3003.329406] sprg0 = sprg1 = cfe01400 [ 3003.329407] sprg2 = cfe01400 sprg3 = 0005 [ 3003.329408] cr = 48004482 xer = 2000 dsisr = 4200 [ 3003.329409] dar = 010015020048 [ 3003.329410] fault dar = 010015020048 dsisr = 4200 [ 3003.329411] SLB (8 entries): [ 3003.329412] ESID = c800 VSID = 40016e7779000510 [ 3003.329413] ESID = d801 VSID = 400142add1000510 [ 3003.329414] ESID = f804 VSID = 4000eb1a81000510 [ 3003.329415] ESID = 1f00080b VSID = 40004fda0a000d90 [ 3003.329416] ESID = 3f00080c VSID = 400039f536000d90 [ 3003.329417] ESID = 180d VSID = 0001251b35150d90 [ 3003.329417] ESID = 0100080e VSID = 4001e4609d90 [ 3003.329418] ESID = d8000819 VSID = 40013d349c000400 [ 3003.329419] lpcr = c04881847001 sdr1 = 001b1906 last_inst = [ 3003.329421] trap=0xe60 | pc=0xc00c2ba0 | msr=0x80009032 [ 3003.329524] Severe Hypervisor Maintenance interrupt [Recovered] [ 3003.329526] Error detail: Timer facility experienced an error [ 3003.329527] HMER: 0840 [ 3003.329527] TFMR: 4a12000980a94000 [ 3006.359786] Severe Hypervisor Maintenance interrupt [Recovered] [ 3006.359792] Error detail: Timer facility experienced an error [ 3006.359795] HMER: 0840 [ 3006.359797] TFMR: 4a12000980a84000 IdName State 2 guest2 running 3 guest3 paused 4 guest4 running Signed-off-by: Mahesh Salgaonkar Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index e63587d..cd7e030 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -769,6 +769,8 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu, vcpu->stat.ext_intr_exits++; r = RESUME_GUEST; break; + /* HMI is hypervisor interrupt and host has handled it. 
Resume guest.*/ + case BOOK3S_INTERRUPT_HMI: case BOOK3S_INTERRUPT_PERFMON: r = RESUME_GUEST; break; -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 08/18] KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
From: "Suresh E. Warrier" The kvmppc_vcore_blocked() code does not check for the wait condition after putting the process on the wait queue. This means that it is possible for an external interrupt to become pending, but the vcpu to remain asleep until the next decrementer interrupt. The fix is to make one last check for pending exceptions and ceded state before calling schedule(). Signed-off-by: Suresh Warrier Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv.c | 20 1 file changed, 20 insertions(+) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index cd7e030..1a7a281 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1828,9 +1828,29 @@ static void kvmppc_wait_for_exec(struct kvm_vcpu *vcpu, int wait_state) */ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) { + struct kvm_vcpu *vcpu; + int do_sleep = 1; + DEFINE_WAIT(wait); prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE); + + /* +* Check one last time for pending exceptions and ceded state after +* we put ourselves on the wait queue +*/ + list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) { + if (vcpu->arch.pending_exceptions || !vcpu->arch.ceded) { + do_sleep = 0; + break; + } + } + + if (!do_sleep) { + finish_wait(&vc->wq, &wait); + return; + } + vc->vcore_state = VCORE_SLEEPING; spin_unlock(&vc->lock); schedule(); -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html