Re: [GIT PULL 3/3] KVM: s390: use simple switch statement as multiplexer
> Am 29.10.2015 um 16:08 schrieb Christian Borntraeger:
>
> We currently do some magic shifting (by exploiting that exit codes
> are always a multiple of 4) and a table lookup to jump into the
> exit handlers. This causes some calculations and checks, just to
> do a potentially expensive function call.
>
> Changing that to a switch statement gives the compiler the chance
> to inline and dynamically decide between jump tables or inline
> compare and branches. In addition it makes the code more readable.
>
> bloat-o-meter gives me a small reduction in code size:
>
> add/remove: 0/7 grow/shrink: 1/1 up/down: 986/-1334 (-348)
> function                     old   new   delta
> kvm_handle_sie_intercept      72  1058    +986
> handle_prog                  704   696      -8
> handle_noop                   54     -     -54
> handle_partial_execution      60     -     -60
> intercept_funcs              120     -    -120
> handle_instruction           198     -    -198
> handle_validity              210     -    -210
> handle_stop                  316     -    -316
> handle_external_interrupt    368     -    -368
>
> Right now my gcc does conditional branches instead of jump tables.
> The inlining seems to give us enough cycles as some micro-benchmarking
> shows minimal improvements, but still in noise.

Awesome. I ended up with the same conclusions on switch vs table lookups
in the ppc code back in the day.
> Signed-off-by: Christian Borntraeger
> Reviewed-by: Cornelia Huck
> ---
>  arch/s390/kvm/intercept.c | 42 +++++++++++++++++++++---------------------
>  1 file changed, 21 insertions(+), 21 deletions(-)
>
> diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
> index 7365e8a..b4a5aa1 100644
> --- a/arch/s390/kvm/intercept.c
> +++ b/arch/s390/kvm/intercept.c
> @@ -336,28 +336,28 @@ static int handle_partial_execution(struct kvm_vcpu *vcpu)
>  	return -EOPNOTSUPP;
>  }
>
> -static const intercept_handler_t intercept_funcs[] = {
> -	[0x00 >> 2] = handle_noop,
> -	[0x04 >> 2] = handle_instruction,
> -	[0x08 >> 2] = handle_prog,
> -	[0x10 >> 2] = handle_noop,
> -	[0x14 >> 2] = handle_external_interrupt,
> -	[0x18 >> 2] = handle_noop,
> -	[0x1C >> 2] = kvm_s390_handle_wait,
> -	[0x20 >> 2] = handle_validity,
> -	[0x28 >> 2] = handle_stop,
> -	[0x38 >> 2] = handle_partial_execution,
> -};
> -
>  int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
>  {
> -	intercept_handler_t func;
> -	u8 code = vcpu->arch.sie_block->icptcode;
> -
> -	if (code & 3 || (code >> 2) >= ARRAY_SIZE(intercept_funcs))
> +	switch (vcpu->arch.sie_block->icptcode) {
> +	case 0x00:
> +	case 0x10:
> +	case 0x18:

... if you could convert these magic numbers to something more telling
however, I think readability would improve even more! That can easily be
a follow up patch though.

Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD
> Am 12.09.2015 um 18:47 schrieb Nathan Whitehorn:
>
>> On 09/06/15 16:52, Paul Mackerras wrote:
>>> On Sun, Sep 06, 2015 at 12:47:12PM -0700, Nathan Whitehorn wrote:
>>> Anything I can do to help move these along? It's a big performance
>>> improvement for FreeBSD guests.
>>
>> These patches are in Paolo's kvm-ppc-next branch and should go into
>> Linus' tree in the next couple of days.
>>
>> Paul.
>
> One additional question. What is your preferred way to enable these? Since
> these are part of the mandatory part of the PAPR spec, I think there's an
> argument to add them to the default_hcall_list? Otherwise, they should be
> enabled by default in QEMU (I can take care of sending that patch if you
> prefer this route).

The default hcall list just describes which hcalls were implicitly enabled
at the point in time we made them enableable by user space. IMHO no new
hcalls should get added there.

So yes, please send a patch to qemu :).

Alex
Re: [Qemu-ppc] KVM memory slots limit on powerpc
On 04.09.15 11:59, Christian Borntraeger wrote:
> Am 04.09.2015 um 11:35 schrieb Thomas Huth:
>>
>> Hi all,
>>
>> now that we get memory hotplugging for the spapr machine on qemu-ppc,
>> too, it seems like we easily can hit the amount of KVM-internal memory
>> slots now ("#define KVM_USER_MEM_SLOTS 32" in
>> arch/powerpc/include/asm/kvm_host.h). For example, start
>> qemu-system-ppc64 with a couple of "-device secondary-vga" and "-m
>> 4G,slots=32,maxmem=40G" and then try to hot-plug all 32 DIMMs ... and
>> you'll see that it aborts way earlier already.
>>
>> The x86 code already increased the amount of KVM_USER_MEM_SLOTS to 509
>> (+3 internal slots = 512) ... maybe we should now increase the
>> amount of slots on powerpc, too? Since we don't use internal slots on
>> POWER, would 512 be a good value? Or would less be sufficient, too?
>
> When you are at it, the s390 value should also be increased I guess.

That constant defines the array size for the memslot array in struct kvm,
which in turn gets allocated by kzalloc, so it's pinned kernel memory that
is physically contiguous. Big allocations like that can turn into problems
at runtime.

So maybe there is another way? Can we extend the memslot array size
dynamically somehow? Allocate it separately? How much memory does the
memslot array use up with 512 entries?

Alex
Re: [PATCH] KVM: ppc: Fix size of the PSPB register
> Am 02.09.2015 um 09:26 schrieb Thomas Huth:
>
>> On 02/09/15 00:55, Benjamin Herrenschmidt wrote:
>>> On Wed, 2015-09-02 at 08:45 +1000, Paul Mackerras wrote:
>>>> On Wed, Sep 02, 2015 at 08:25:05AM +1000, Benjamin Herrenschmidt wrote:
>>>>> On Tue, 2015-09-01 at 23:41 +0200, Thomas Huth wrote:
>>>>>> The size of the Problem State Priority Boost Register is only
>>>>>> 32 bits, so let's change the type of the corresponding variable
>>>>>> accordingly to avoid future trouble.
>>>>>
>>>>> It's not future trouble, it's broken today for LE and this should
>>>>> fix it BUT
>>>>
>>>> No, it's broken today for BE hosts, which will always see 0 for the
>>>> PSPB register value. LE hosts are fine.
>
> Right ... I just meant that nobody has really experienced trouble with
> this yet, but of course the bug is already present today.

Sounds like a great candidate for kvm-unit-tests then, no? ;)

Alex
Re: [PATCH] vfio: Enable VFIO device for powerpc
On 13.08.15 03:15, David Gibson wrote:
> ec53500f "kvm: Add VFIO device" added a special KVM pseudo-device which is
> used to handle any necessary interactions between KVM and VFIO.
>
> Currently that device is built on x86 and ARM, but not powerpc, although
> powerpc does support both KVM and VFIO. This makes things awkward in
> userspace.
>
> Currently qemu prints an alarming error message if you attempt to use VFIO
> and it can't initialize the KVM VFIO device. We don't want to remove the
> warning, because lack of the KVM VFIO device could mean coherency problems
> on x86. On powerpc, however, the error is harmless but looks disturbing,
> and a test based on host architecture in qemu would be ugly, and break if
> we do need the KVM VFIO device for something important in future.
>
> There's nothing preventing the KVM VFIO device from being built for
> powerpc, so this patch turns it on. It won't actually do anything, since
> we don't define any of the arch_*() hooks, but it will make qemu happy and
> we can extend it in future if we need to.
>
> Signed-off-by: David Gibson
> Reviewed-by: Eric Auger

Paul is going to take care of the kvm-ppc tree for 4.3. Also, ppc kvm
patches should get CC on the kvm-ppc@vger mailing list ;).

Paul, could you please pick this one up?

Thanks!

Alex
Re: [PULL 00/12] ppc patch queue 2015-08-22
On 22.08.15 15:32, Paolo Bonzini wrote:
>
> On 22/08/2015 02:21, Alexander Graf wrote:
>> Hi Paolo,
>>
>> This is my current patch queue for ppc. Please pull.
>
> Done, but this queue has not been in linux-next. Please push to
> kvm-ppc-next on your github Linux tree as well; please keep an eye on

Ah, sorry. I pushed to kvm-ppc-next in parallel to sending the request.

> Stephen Rothwell's messages in the next few days, and I'll send the pull
> request sometime next week via webmail if everything goes fine.

Nothing exciting came in so far, so I hope we're good :).

Alex
[PULL 08/12] KVM: PPC: Book3S HV: Fix bug in dirty page tracking
From: Paul Mackerras

This fixes a bug in the tracking of pages that get modified by the guest.
If the guest creates a large-page HPTE, writes to memory somewhere within
the large page, and then removes the HPTE, we only record the modified
state for the first normal page within the large page, when in fact the
guest might have modified some other normal page within the large page.

To fix this we use some unused bits in the rmap entry to record the order
(log base 2) of the size of the page that was modified, when removing an
HPTE. Then in kvm_test_clear_dirty_npages() we use that order to return
the correct number of modified pages.

The same thing could in principle happen when removing a HPTE at the
host's request, i.e. when paging out a page, except that we never page
out large pages, and the guest can only create large-page HPTEs if the
guest RAM is backed by large pages. However, we also fix this case for
the sake of future-proofing.

The reference bit is also subject to the same loss of information. We
don't make the same fix here for the reference bit because there isn't an
interface for userspace to find out which pages the guest has referenced,
whereas there is one for userspace to find out which pages the guest has
modified. Because of this loss of information, the kvm_age_hva_hv() and
kvm_test_age_hva_hv() functions might incorrectly say that a page has not
been referenced when it has, but that doesn't matter greatly because we
never page or swap out large pages.
Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s.h | 1 + arch/powerpc/include/asm/kvm_host.h | 2 ++ arch/powerpc/kvm/book3s_64_mmu_hv.c | 8 +++- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 17 + 4 files changed, 27 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index b91e74a..e6b2534 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -158,6 +158,7 @@ extern pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing, bool *writable); extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev, unsigned long *rmap, long pte_index, int realmode); +extern void kvmppc_update_rmap_change(unsigned long *rmap, unsigned long psize); extern void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep, unsigned long pte_index); void kvmppc_clear_ref_hpte(struct kvm *kvm, __be64 *hptep, diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 80eb29a..e187b6a 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -205,8 +205,10 @@ struct revmap_entry { */ #define KVMPPC_RMAP_LOCK_BIT 63 #define KVMPPC_RMAP_RC_SHIFT 32 +#define KVMPPC_RMAP_CHG_SHIFT 48 #define KVMPPC_RMAP_REFERENCED (HPTE_R_R << KVMPPC_RMAP_RC_SHIFT) #define KVMPPC_RMAP_CHANGED(HPTE_R_C << KVMPPC_RMAP_RC_SHIFT) +#define KVMPPC_RMAP_CHG_ORDER (0x3ful << KVMPPC_RMAP_CHG_SHIFT) #define KVMPPC_RMAP_PRESENT0x1ul #define KVMPPC_RMAP_INDEX 0xul diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index dab68b7..1f9c0a1 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -761,6 +761,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp, /* Harvest R and C */ rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C); *rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT; + if (rcbits 
& HPTE_R_C) + kvmppc_update_rmap_change(rmapp, psize); if (rcbits & ~rev[i].guest_rpte) { rev[i].guest_rpte = ptel | rcbits; note_hpte_modification(kvm, &rev[i]); @@ -927,8 +929,12 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp) retry: lock_rmap(rmapp); if (*rmapp & KVMPPC_RMAP_CHANGED) { - *rmapp &= ~KVMPPC_RMAP_CHANGED; + long change_order = (*rmapp & KVMPPC_RMAP_CHG_ORDER) + >> KVMPPC_RMAP_CHG_SHIFT; + *rmapp &= ~(KVMPPC_RMAP_CHANGED | KVMPPC_RMAP_CHG_ORDER); npages_dirty = 1; + if (change_order > PAGE_SHIFT) + npages_dirty = 1ul << (change_order - PAGE_SHIFT); } if (!(*rmapp & KVMPPC_RMAP_PRESENT)) { unlock_rmap(rmapp); diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index c6d601c..c7a3ab2 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/k
[PULL 03/12] KVM: PPC: Fix warnings from sparse
From: Thomas Huth When compiling the KVM code for POWER with "make C=1", sparse complains about functions missing proper prototypes and a 64-bit constant missing the ULL prefix. Let's fix this by making the functions static or by including the proper header with the prototypes, and by appending a ULL prefix to the constant PPC_MPPE_ADDRESS_MASK. Signed-off-by: Thomas Huth Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/ppc-opcode.h| 2 +- arch/powerpc/kvm/book3s.c| 3 ++- arch/powerpc/kvm/book3s_32_mmu_host.c| 1 + arch/powerpc/kvm/book3s_64_mmu_host.c| 1 + arch/powerpc/kvm/book3s_emulate.c| 1 + arch/powerpc/kvm/book3s_hv.c | 8 arch/powerpc/kvm/book3s_paired_singles.c | 2 +- arch/powerpc/kvm/powerpc.c | 2 +- 8 files changed, 12 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 8452335..790f5d1 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -287,7 +287,7 @@ /* POWER8 Micro Partition Prefetch (MPP) parameters */ /* Address mask is common for LOGMPP instruction and MPPR SPR */ -#define PPC_MPPE_ADDRESS_MASK 0xc000 +#define PPC_MPPE_ADDRESS_MASK 0xc000ULL /* Bits 60 and 61 of MPP SPR should be set to one of the following */ /* Aborting the fetch is indeed setting 00 in the table size bits */ diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 05ea8fc..53285d5 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -240,7 +240,8 @@ void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, ulong flags) kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_INST_STORAGE); } -int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority) +static int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, +unsigned int priority) { int deliver = 1; int vec = 0; diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c index 2035d16..d5c9bfe 100644 --- 
a/arch/powerpc/kvm/book3s_32_mmu_host.c +++ b/arch/powerpc/kvm/book3s_32_mmu_host.c @@ -26,6 +26,7 @@ #include #include #include +#include "book3s.h" /* #define DEBUG_MMU */ /* #define DEBUG_SR */ diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c index b982d92..79ad35a 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_host.c +++ b/arch/powerpc/kvm/book3s_64_mmu_host.c @@ -28,6 +28,7 @@ #include #include #include "trace_pr.h" +#include "book3s.h" #define PTE_SIZE 12 diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 5a2bc4b..2afdb9c 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -23,6 +23,7 @@ #include #include #include +#include "book3s.h" #define OP_19_XOP_RFID 18 #define OP_19_XOP_RFI 50 diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 68d067a..6e588ac 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -214,12 +214,12 @@ static void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr) kvmppc_end_cede(vcpu); } -void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr) +static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr) { vcpu->arch.pvr = pvr; } -int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat) +static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat) { unsigned long pcr = 0; struct kvmppc_vcore *vc = vcpu->arch.vcore; @@ -259,7 +259,7 @@ int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat) return 0; } -void kvmppc_dump_regs(struct kvm_vcpu *vcpu) +static void kvmppc_dump_regs(struct kvm_vcpu *vcpu) { int r; @@ -292,7 +292,7 @@ void kvmppc_dump_regs(struct kvm_vcpu *vcpu) vcpu->arch.last_inst); } -struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id) +static struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id) { int r; struct kvm_vcpu *v, *ret = NULL; diff --git a/arch/powerpc/kvm/book3s_paired_singles.c 
b/arch/powerpc/kvm/book3s_paired_singles.c index bd6ab16..a759d9a 100644 --- a/arch/powerpc/kvm/book3s_paired_singles.c +++ b/arch/powerpc/kvm/book3s_paired_singles.c @@ -352,7 +352,7 @@ static inline u32 inst_get_field(u32 inst, int msb, int lsb) return kvmppc_get_field(inst, msb + 32, lsb + 32); } -bool kvmppc_inst_is_paired_single(struct kvm_vcpu *vcpu, u32 inst) +static bool kvmppc_inst_is_paired_single(struct kvm_vcpu *vcpu, u32 inst) { if (!(vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE)) return false; diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index e5dd
[PULL 01/12] KVM: PPC: fix suspicious use of conditional operator
From: Tudor Laurentiu

This was signaled by a static code analysis tool.

Signed-off-by: Laurentiu Tudor
Reviewed-by: Scott Wood
Signed-off-by: Alexander Graf
---
 arch/powerpc/kvm/e500_mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
index 50860e9..29911a0 100644
--- a/arch/powerpc/kvm/e500_mmu.c
+++ b/arch/powerpc/kvm/e500_mmu.c
@@ -377,7 +377,7 @@ int kvmppc_e500_emul_tlbsx(struct kvm_vcpu *vcpu, gva_t ea)
 			| MAS0_NV(vcpu_e500->gtlb_nv[tlbsel]);
 		vcpu->arch.shared->mas1 =
 			  (vcpu->arch.shared->mas6 & MAS6_SPID0)
-			| (vcpu->arch.shared->mas6 & (MAS6_SAS ? MAS1_TS : 0))
+			| ((vcpu->arch.shared->mas6 & MAS6_SAS) ? MAS1_TS : 0)
 			| (vcpu->arch.shared->mas4 & MAS4_TSIZED(~0));
 		vcpu->arch.shared->mas2 &= MAS2_EPN;
 		vcpu->arch.shared->mas2 |= vcpu->arch.shared->mas4 &
--
1.8.1.4
[PULL 02/12] KVM: PPC: Remove PPC970 from KVM_BOOK3S_64_HV text in Kconfig
From: Thomas Huth

Since the PPC970 support has been removed from the kvm-hv kernel module
recently, we should also reflect this change in the help text of the
corresponding Kconfig option.

Signed-off-by: Thomas Huth
Signed-off-by: Alexander Graf
---
 arch/powerpc/kvm/Kconfig | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 3caec2c..c2024ac 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -74,14 +74,14 @@ config KVM_BOOK3S_64
 	  If unsure, say N.
 
 config KVM_BOOK3S_64_HV
-	tristate "KVM support for POWER7 and PPC970 using hypervisor mode in host"
+	tristate "KVM for POWER7 and later using hypervisor mode in host"
 	depends on KVM_BOOK3S_64 && PPC_POWERNV
 	select KVM_BOOK3S_HV_POSSIBLE
 	select MMU_NOTIFIER
 	select CMA
 	---help---
 	  Support running unmodified book3s_64 guest kernels in
-	  virtual machines on POWER7 and PPC970 processors that have
+	  virtual machines on POWER7 and newer processors that have
 	  hypervisor mode available to the host.
 
 	  If you say Y here, KVM will use the hardware virtualization
@@ -89,8 +89,8 @@ config KVM_BOOK3S_64_HV
 	  guest operating systems will run at full hardware speed
 	  using supervisor and user modes. However, this also means
 	  that KVM is not usable under PowerVM (pHyp), is only usable
-	  on POWER7 (or later) processors and PPC970-family processors,
-	  and cannot emulate a different processor from the host processor.
+	  on POWER7 or later processors, and cannot emulate a
+	  different processor from the host processor.
 
 	  If unsure, say N.
--
1.8.1.4
[PULL 05/12] KVM: PPC: Book3S HV: Make use of unused threads when running guests
From: Paul Mackerras

When running a virtual core of a guest that is configured with fewer
threads per core than the physical cores have, the extra physical threads
are currently unused. This makes it possible to use them to run one or
more other virtual cores from the same guest when certain conditions are
met. This applies on POWER7, and on POWER8 to guests with one thread per
virtual core. (It doesn't apply to POWER8 guests with multiple threads
per vcore because they require a 1-1 virtual to physical thread mapping
in order to be able to use msgsndp and the TIR.)

The idea is that we maintain a list of preempted vcores for each physical
cpu (i.e. each core, since the host runs single-threaded). Then, when a
vcore is about to run, it checks to see if there are any vcores on the
list for its physical cpu that could be piggybacked onto this vcore's
execution. If so, those additional vcores are put into state
VCORE_PIGGYBACK and their runnable VCPU threads are started as well as
the original vcore, which is called the master vcore. After the vcores
have exited the guest, the extra ones are put back onto the preempted
list if any of their VCPUs are still runnable and not idle.

This means that vcpu->arch.ptid is no longer necessarily the same as the
physical thread that the vcpu runs on. In order to make it easier for
code that wants to send an IPI to know which CPU to target, we now store
that in a new field in struct vcpu_arch, called thread_cpu.
Reviewed-by: David Gibson Tested-by: Laurent Vivier Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 19 +- arch/powerpc/kernel/asm-offsets.c | 2 + arch/powerpc/kvm/book3s_hv.c| 333 ++-- arch/powerpc/kvm/book3s_hv_builtin.c| 7 +- arch/powerpc/kvm/book3s_hv_rm_xics.c| 4 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 5 + 6 files changed, 298 insertions(+), 72 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index d91f65b..2b74490 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -278,7 +278,9 @@ struct kvmppc_vcore { u16 last_cpu; u8 vcore_state; u8 in_guest; + struct kvmppc_vcore *master_vcore; struct list_head runnable_threads; + struct list_head preempt_list; spinlock_t lock; wait_queue_head_t wq; spinlock_t stoltb_lock; /* protects stolen_tb and preempt_tb */ @@ -300,12 +302,18 @@ struct kvmppc_vcore { #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8) #define VCORE_IS_EXITING(vc) (VCORE_EXIT_MAP(vc) != 0) -/* Values for vcore_state */ +/* + * Values for vcore_state. + * Note that these are arranged such that lower values + * (< VCORE_SLEEPING) don't require stolen time accounting + * on load/unload, and higher values do. 
+ */ #define VCORE_INACTIVE 0 -#define VCORE_SLEEPING 1 -#define VCORE_PREEMPT 2 -#define VCORE_RUNNING 3 -#define VCORE_EXITING 4 +#define VCORE_PREEMPT 1 +#define VCORE_PIGGYBACK2 +#define VCORE_SLEEPING 3 +#define VCORE_RUNNING 4 +#define VCORE_EXITING 5 /* * Struct used to manage memory for a virtual processor area @@ -619,6 +627,7 @@ struct kvm_vcpu_arch { int trap; int state; int ptid; + int thread_cpu; bool timer_running; wait_queue_head_t cpu_run; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 9823057..a78cdbf 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -512,6 +512,8 @@ int main(void) DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr)); DEFINE(VCPU_VPA_DIRTY, offsetof(struct kvm_vcpu, arch.vpa.dirty)); DEFINE(VCPU_HEIR, offsetof(struct kvm_vcpu, arch.emul_inst)); + DEFINE(VCPU_CPU, offsetof(struct kvm_vcpu, cpu)); + DEFINE(VCPU_THREAD_CPU, offsetof(struct kvm_vcpu, arch.thread_cpu)); #endif #ifdef CONFIG_PPC_BOOK3S DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id)); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 6e588ac..0173ce2 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -81,6 +81,9 @@ static DECLARE_BITMAP(default_enabled_hcalls, MAX_HCALL_OPCODE/4 + 1); #define MPP_BUFFER_ORDER 3 #endif +static int target_smt_mode; +module_param(target_smt_mode, int, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(target_smt_mode, "Target threads per core (0 = max)"); static void kvmppc_end_cede(struct kvm_vcpu *vcpu); static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu); @@ -114,7 +117,7 @@ static bool kvmppc_ipi_thread(int cpu) static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu) { - int cpu = vcpu->cpu; + int cpu; wait_queue_head_t *wqp; wqp = kvm_arch_vcpu_
[PULL 04/12] KVM: PPC: add missing pt_regs initialization
From: Tudor Laurentiu

On this switch branch the regs initialization doesn't happen, so add it.
This was found with the help of a static code analysis tool.

Signed-off-by: Laurentiu Tudor
Signed-off-by: Alexander Graf
---
 arch/powerpc/kvm/booke.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index cc58426..ae458f0 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -933,6 +933,7 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu,
 #endif
 		break;
 	case BOOKE_INTERRUPT_CRITICAL:
+		kvmppc_fill_pt_regs(&regs);
 		unknown_exception(&regs);
 		break;
 	case BOOKE_INTERRUPT_DEBUG:
--
1.8.1.4
[PULL 12/12] KVM: PPC: Book3S: correct width in XER handling
From: Sam bobroff In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64 bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is accessed as such. This patch corrects places where it is accessed as a 32 bit field by a 64 bit kernel. In some cases this is via a 32 bit load or store instruction which, depending on endianness, will cause either the lower or upper 32 bits to be missed. In another case it is cast as a u32, causing the upper 32 bits to be cleared. This patch corrects those places by extending the access methods to 64 bits. Signed-off-by: Sam Bobroff Reviewed-by: Laurent Vivier Reviewed-by: Thomas Huth Tested-by: Thomas Huth Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s.h | 4 ++-- arch/powerpc/include/asm/kvm_book3s_asm.h | 2 +- arch/powerpc/include/asm/kvm_booke.h | 4 ++-- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 6 +++--- arch/powerpc/kvm/book3s_segment.S | 4 ++-- 5 files changed, 10 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index e6b2534..9fac01c 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -226,12 +226,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) return vcpu->arch.cr; } -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) { vcpu->arch.xer = val; } -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) { return vcpu->arch.xer; } diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h index 57d5dfe..72b6225 100644 --- a/arch/powerpc/include/asm/kvm_book3s_asm.h +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h @@ -132,7 +132,7 @@ struct kvmppc_book3s_shadow_vcpu { bool in_use; ulong gpr[14]; u32 cr; - u32 xer; + ulong xer; ulong ctr; ulong lr; ulong pc; diff --git 
a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h index 3286f0d..bc6e29e 100644 --- a/arch/powerpc/include/asm/kvm_booke.h +++ b/arch/powerpc/include/asm/kvm_booke.h @@ -54,12 +54,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) return vcpu->arch.cr; } -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) { vcpu->arch.xer = val; } -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) { return vcpu->arch.xer; } diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index e347766..472680f 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -944,7 +944,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) blt hdec_soon ld r6, VCPU_CTR(r4) - lwz r7, VCPU_XER(r4) + ld r7, VCPU_XER(r4) mtctr r6 mtxer r7 @@ -1181,7 +1181,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) mfctr r3 mfxer r4 std r3, VCPU_CTR(r9) - stw r4, VCPU_XER(r9) + std r4, VCPU_XER(r9) /* If this is a page table miss then see if it's theirs or ours */ cmpwi r12, BOOK3S_INTERRUPT_H_DATA_STORAGE @@ -1763,7 +1763,7 @@ kvmppc_hdsi: bl kvmppc_msr_interrupt fast_interrupt_c_return: 6: ld r7, VCPU_CTR(r9) - lwz r8, VCPU_XER(r9) + ld r8, VCPU_XER(r9) mtctr r7 mtxer r8 mr r4, r9 diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S index acee37c..ca8f174 100644 --- a/arch/powerpc/kvm/book3s_segment.S +++ b/arch/powerpc/kvm/book3s_segment.S @@ -123,7 +123,7 @@ no_dcbz32_on: PPC_LL r8, SVCPU_CTR(r3) PPC_LL r9, SVCPU_LR(r3) lwz r10, SVCPU_CR(r3) - lwz r11, SVCPU_XER(r3) + PPC_LL r11, SVCPU_XER(r3) mtctr r8 mtlrr9 @@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) mfctr r8 mflrr9 - stw r5, SVCPU_XER(r13) + PPC_STL r5, SVCPU_XER(r13) PPC_STL r6, SVCPU_FAULT_DAR(r13) stw r7, SVCPU_FAULT_DSISR(r13) PPC_STL r8, SVCPU_CTR(r13) -- 
1.8.1.4
[PULL 09/12] KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD
From: Paul Mackerras

This adds implementations for the H_CLEAR_REF (test and clear reference
bit) and H_CLEAR_MOD (test and clear changed bit) hypercalls.

When clearing the reference or change bit in the guest view of the HPTE,
we also have to clear it in the real HPTE so that we can detect future
references or changes. When we do so, we transfer the R or C bit value
to the rmap entry for the underlying host page so that kvm_age_hva_hv(),
kvm_test_age_hva_hv() and kvmppc_hv_get_dirty_log() know that the page
has been referenced and/or changed.

These hypercalls are not used by Linux guests. These implementations
have been tested using a FreeBSD guest.

Signed-off-by: Paul Mackerras
Signed-off-by: Alexander Graf
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c     | 126 ++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   4 +-
 2 files changed, 121 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c7a3ab2..c1df9bb 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -112,25 +112,38 @@ void kvmppc_update_rmap_change(unsigned long *rmap, unsigned long psize)
 }
 EXPORT_SYMBOL_GPL(kvmppc_update_rmap_change);
 
+/* Returns a pointer to the revmap entry for the page mapped by a HPTE */
+static unsigned long *revmap_for_hpte(struct kvm *kvm, unsigned long hpte_v,
+				      unsigned long hpte_gr)
+{
+	struct kvm_memory_slot *memslot;
+	unsigned long *rmap;
+	unsigned long gfn;
+
+	gfn = hpte_rpn(hpte_gr, hpte_page_size(hpte_v, hpte_gr));
+	memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
+	if (!memslot)
+		return NULL;
+
+	rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]);
+	return rmap;
+}
+
 /* Remove this HPTE from the chain for a real page */
 static void remove_revmap_chain(struct kvm *kvm, long pte_index,
 				struct revmap_entry *rev,
 				unsigned long hpte_v, unsigned long hpte_r)
 {
 	struct revmap_entry *next, *prev;
-	unsigned long gfn, ptel, head;
-	struct kvm_memory_slot *memslot;
+	unsigned long ptel, head;
 	unsigned long *rmap;
 	unsigned long rcbits;
 
 	rcbits = hpte_r & (HPTE_R_R | HPTE_R_C);
 	ptel = rev->guest_rpte |= rcbits;
-	gfn = hpte_rpn(ptel, hpte_page_size(hpte_v, ptel));
-	memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
-	if (!memslot)
+	rmap = revmap_for_hpte(kvm, hpte_v, ptel);
+	if (!rmap)
 		return;
-
-	rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]);
 
 	lock_rmap(rmap);
 	head = *rmap & KVMPPC_RMAP_INDEX;
@@ -678,6 +691,105 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 	return H_SUCCESS;
 }
 
+long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
+			unsigned long pte_index)
+{
+	struct kvm *kvm = vcpu->kvm;
+	__be64 *hpte;
+	unsigned long v, r, gr;
+	struct revmap_entry *rev;
+	unsigned long *rmap;
+	long ret = H_NOT_FOUND;
+
+	if (pte_index >= kvm->arch.hpt_npte)
+		return H_PARAMETER;
+
+	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+	hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
+		cpu_relax();
+	v = be64_to_cpu(hpte[0]);
+	r = be64_to_cpu(hpte[1]);
+	if (!(v & (HPTE_V_VALID | HPTE_V_ABSENT)))
+		goto out;
+
+	gr = rev->guest_rpte;
+	if (rev->guest_rpte & HPTE_R_R) {
+		rev->guest_rpte &= ~HPTE_R_R;
+		note_hpte_modification(kvm, rev);
+	}
+	if (v & HPTE_V_VALID) {
+		gr |= r & (HPTE_R_R | HPTE_R_C);
+		if (r & HPTE_R_R) {
+			kvmppc_clear_ref_hpte(kvm, hpte, pte_index);
+			rmap = revmap_for_hpte(kvm, v, gr);
+			if (rmap) {
+				lock_rmap(rmap);
+				*rmap |= KVMPPC_RMAP_REFERENCED;
+				unlock_rmap(rmap);
+			}
+		}
+	}
+	vcpu->arch.gpr[4] = gr;
+	ret = H_SUCCESS;
+ out:
+	unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
+	return ret;
+}
+
+long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
+			unsigned long pte_index)
+{
+	struct kvm *kvm = vcpu->kvm;
+	__be64 *hpte;
+	unsigned long v, r, gr;
+	struct revmap_entry *rev;
+	unsigned long *rmap;
+	long ret = H_NOT_FOUND;
+
+	if (pte_index >= kvm->arch.hpt_npte)
+		return H_
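The R/C handling above follows a generic test-and-clear pattern: report the old guest-visible value, clear the bit in both the guest view and the real entry, and transfer the information to the host's reverse map so host-side page aging still works. A stand-alone sketch of that pattern in plain C — the names and bit layout here are hypothetical simplifications, not the kernel's:

```c
#include <assert.h>
#include <stdint.h>

#define PTE_R 0x100UL           /* reference bit (hypothetical layout) */
#define PTE_C 0x080UL           /* change bit */
#define RMAP_REFERENCED 0x1UL

/* Test-and-clear the reference bit: return the old guest-visible value,
 * clear R in both the guest view and the real entry, and remember the
 * reference in the host's rmap entry. */
static uint64_t clear_ref(uint64_t *guest_pte, uint64_t *real_pte,
                          uint64_t *rmap)
{
    uint64_t old = *guest_pte;

    /* Hardware may have set R/C in the real entry since the guest view
     * was last synchronized; fold those bits into the reported value. */
    old |= *real_pte & (PTE_R | PTE_C);

    *guest_pte &= ~PTE_R;
    if (*real_pte & PTE_R) {
        *real_pte &= ~PTE_R;        /* so future references are detectable */
        *rmap |= RMAP_REFERENCED;   /* host aging still sees the access */
    }
    return old;
}
```

The same shape applies to the change bit, with the extra step of marking the page dirty for migration.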
[PULL 10/12] KVM: PPC: Book3S HV: Fix preempted vcore list locking
From: Paul Mackerras

When a vcore gets preempted, we put it on the preempted vcore list for
the current CPU. The runner task then calls schedule() and comes back
some time later and takes itself off the list. We need to be careful
to lock the list that it was put onto, which may not be the list for
the current CPU since the runner task may have moved to another CPU.

Signed-off-by: Paul Mackerras
Signed-off-by: Alexander Graf
---
 arch/powerpc/kvm/book3s_hv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6e3ef30..3d02276 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1962,10 +1962,11 @@ static void kvmppc_vcore_preempt(struct kvmppc_vcore *vc)
 
 static void kvmppc_vcore_end_preempt(struct kvmppc_vcore *vc)
 {
-	struct preempted_vcore_list *lp = this_cpu_ptr(&preempted_vcores);
+	struct preempted_vcore_list *lp;
 
 	kvmppc_core_end_stolen(vc);
 	if (!list_empty(&vc->preempt_list)) {
+		lp = &per_cpu(preempted_vcores, vc->pcpu);
 		spin_lock(&lp->lock);
 		list_del_init(&vc->preempt_list);
 		spin_unlock(&lp->lock);
-- 
1.8.1.4
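The bug class fixed here is generic: when an element is enqueued on a per-CPU list and later dequeued by a task that may have migrated, the lock protecting the list it was *put on* must be taken, not the current CPU's list lock. A minimal user-space sketch of the correct pattern — all names here are hypothetical, and an integer counter stands in for the spinlock:

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS 4

struct node { struct node *next; int cpu; /* list this node was enqueued on */ };
struct pcpu_list { struct node *head; int lock_count; /* spinlock stand-in */ };

static struct pcpu_list lists[NCPUS];

static void enqueue(struct node *n, int this_cpu)
{
    n->cpu = this_cpu;              /* remember which list we joined */
    n->next = lists[this_cpu].head;
    lists[this_cpu].head = n;
}

/* Correct dequeue: lock lists[n->cpu], not lists[current_cpu] —
 * the dequeuing task may be running on a different CPU by now. */
static void dequeue(struct node *n)
{
    struct pcpu_list *lp = &lists[n->cpu];
    struct node **pp;

    lp->lock_count++;               /* "spin_lock(&lp->lock)" */
    for (pp = &lp->head; *pp && *pp != n; pp = &(*pp)->next)
        ;
    if (*pp)
        *pp = n->next;
    lp->lock_count--;               /* "spin_unlock(&lp->lock)" */
}
```

The buggy version would compute `lp` from the dequeuer's current CPU, taking the wrong lock and racing with concurrent list walkers on the original CPU.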
[PULL 11/12] KVM: PPC: Book3S HV: Fix preempted vcore stolen time calculation
From: Paul Mackerras

Whenever a vcore state is VCORE_PREEMPT we need to be counting stolen
time for it. This currently isn't the case when we have a vcore that
no longer has any runnable threads in it but still has a runner task,
so we do an explicit call to kvmppc_core_start_stolen() in that case.

Signed-off-by: Paul Mackerras
Signed-off-by: Alexander Graf
---
 arch/powerpc/kvm/book3s_hv.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3d02276..fad52f2 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2283,9 +2283,14 @@ static void post_guest_process(struct kvmppc_vcore *vc, bool is_master)
 	}
 	list_del_init(&vc->preempt_list);
 	if (!is_master) {
-		vc->vcore_state = vc->runner ? VCORE_PREEMPT : VCORE_INACTIVE;
-		if (still_running > 0)
+		if (still_running > 0) {
 			kvmppc_vcore_preempt(vc);
+		} else if (vc->runner) {
+			vc->vcore_state = VCORE_PREEMPT;
+			kvmppc_core_start_stolen(vc);
+		} else {
+			vc->vcore_state = VCORE_INACTIVE;
+		}
 		if (vc->n_runnable > 0 && vc->runner == NULL) {
 			/* make sure there's a candidate runner awake */
 			vcpu = list_first_entry(&vc->runnable_threads,
-- 
1.8.1.4
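The invariant the patch establishes — any vcore left in the preempted state must be accumulating stolen time — can be captured as a small decision function. This is a sketch with hypothetical names, not the kernel code:

```c
#include <assert.h>
#include <stdbool.h>

enum vcore_state { VCORE_INACTIVE, VCORE_PREEMPT };

struct vcore {
    enum vcore_state state;
    bool has_runner;        /* a runner task still owns the vcore */
    bool counting_stolen;   /* stolen-time accounting active? */
};

/* Post-guest bookkeeping for a non-master vcore. Invariant: whenever
 * we leave the vcore in VCORE_PREEMPT, stolen-time counting is on. */
static void end_guest(struct vcore *vc, int still_running)
{
    if (still_running > 0) {
        /* "kvmppc_vcore_preempt()": sets state and starts counting */
        vc->state = VCORE_PREEMPT;
        vc->counting_stolen = true;
    } else if (vc->has_runner) {
        /* No runnable threads but a runner task remains: the case the
         * patch fixes - start stolen-time counting explicitly. */
        vc->state = VCORE_PREEMPT;
        vc->counting_stolen = true;
    } else {
        vc->state = VCORE_INACTIVE;
        vc->counting_stolen = false;
    }
}
```

Before the fix, the middle branch set the state to VCORE_PREEMPT without starting the accounting, violating the invariant.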
[PULL 06/12] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
From: Paul Mackerras This builds on the ability to run more than one vcore on a physical core by using the micro-threading (split-core) modes of the POWER8 chip. Previously, only vcores from the same VM could be run together, and (on POWER8) only if they had just one thread per core. With the ability to split the core on guest entry and unsplit it on guest exit, we can run up to 8 vcpu threads from up to 4 different VMs, and we can run multiple vcores with 2 or 4 vcpus per vcore. Dynamic micro-threading is only available if the static configuration of the cores is whole-core mode (unsplit), and only on POWER8. To manage this, we introduce a new kvm_split_mode struct which is shared across all of the subcores in the core, with a pointer in the paca on each thread. In addition we extend the core_info struct to have information on each subcore. When deciding whether to add a vcore to the set already on the core, we now have two possibilities: (a) piggyback the vcore onto an existing subcore, or (b) start a new subcore. Currently, when any vcpu needs to exit the guest and switch to host virtual mode, we interrupt all the threads in all subcores and switch the core back to whole-core mode. It may be possible in future to allow some of the subcores to keep executing in the guest while subcore 0 switches to the host, but that is not implemented in this patch. This adds a module parameter called dynamic_mt_modes which controls which micro-threading (split-core) modes the code will consider, as a bitmap. In other words, if it is 0, no micro-threading mode is considered; if it is 2, only 2-way micro-threading is considered; if it is 4, only 4-way, and if it is 6, both 2-way and 4-way micro-threading mode will be considered. The default is 6. With this, we now have secondary threads which are the primary thread for their subcore and therefore need to do the MMU switch. 
These threads will need to be started even if they have no vcpu to run, so we use the vcore pointer in the PACA rather than the vcpu pointer to trigger them. It is now possible for thread 0 to find that an exit has been requested before it gets to switch the subcore state to the guest. In that case we haven't added the guest's timebase offset to the timebase, so we need to be careful not to subtract the offset in the guest exit path. In fact we just skip the whole path that switches back to host context, since we haven't switched to the guest context. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s_asm.h | 20 ++ arch/powerpc/include/asm/kvm_host.h | 3 + arch/powerpc/kernel/asm-offsets.c | 7 + arch/powerpc/kvm/book3s_hv.c | 367 ++ arch/powerpc/kvm/book3s_hv_builtin.c | 25 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 113 +++-- 6 files changed, 473 insertions(+), 62 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h index 5bdfb5d..57d5dfe 100644 --- a/arch/powerpc/include/asm/kvm_book3s_asm.h +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h @@ -25,6 +25,12 @@ #define XICS_MFRR 0xc #define XICS_IPI 2 /* interrupt source # for IPIs */ +/* Maximum number of threads per physical core */ +#define MAX_SMT_THREADS8 + +/* Maximum number of subcores per physical core */ +#define MAX_SUBCORES 4 + #ifdef __ASSEMBLY__ #ifdef CONFIG_KVM_BOOK3S_HANDLER @@ -65,6 +71,19 @@ kvmppc_resume_\intno: #else /*__ASSEMBLY__ */ +struct kvmppc_vcore; + +/* Struct used for coordinating micro-threading (split-core) mode changes */ +struct kvm_split_mode { + unsigned long rpr; + unsigned long pmmar; + unsigned long ldbar; + u8 subcore_size; + u8 do_nap; + u8 napped[MAX_SMT_THREADS]; + struct kvmppc_vcore *master_vcs[MAX_SUBCORES]; +}; + /* * This struct goes in the PACA on 64-bit processors. 
It is used * to store host state that needs to be saved when we enter a guest @@ -100,6 +119,7 @@ struct kvmppc_host_state { u64 host_spurr; u64 host_dscr; u64 dec_expires; + struct kvm_split_mode *kvm_split_mode; #endif #ifdef CONFIG_PPC_BOOK3S_64 u64 cfar; diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 2b74490..80eb29a 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -302,6 +302,9 @@ struct kvmppc_vcore { #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8) #define VCORE_IS_EXITING(vc) (VCORE_EXIT_MAP(vc) != 0) +/* This bit is used when a vcore exit is triggered from outside the vcore */ +#define VCORE_EXIT_REQ 0x1 + /* * Values for vcore_state. * Note that these are arranged such that lower values diff --git a/arch/powerpc/kernel/asm-o
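The dynamic_mt_modes parameter described in the commit message is a plain bitmap: bit value 2 enables 2-way micro-threading and bit value 4 enables 4-way, so 6 enables both and 0 disables dynamic splitting entirely. A sketch of how such a bitmap gates the candidate split modes — the helper is hypothetical, not the kernel's code:

```c
#include <assert.h>
#include <stdbool.h>

/* dynamic_mt_modes semantics: n-way micro-threading may be considered
 * when bit value n is set in the bitmap. 0 means whole-core only. */
static bool mode_allowed(int dynamic_mt_modes, int n_subcores)
{
    if (n_subcores == 1)
        return true;                 /* whole-core mode is always legal */
    if (n_subcores != 2 && n_subcores != 4)
        return false;                /* POWER8 supports 2- and 4-way only */
    return (dynamic_mt_modes & n_subcores) != 0;
}
```

With the default of 6, both 2-way and 4-way splits pass the check; a module parameter of 2 or 4 restricts the scheduler to that single split mode.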
[PULL 00/12] ppc patch queue 2015-08-22
Hi Paolo,

This is my current patch queue for ppc. Please pull.

Alex

The following changes since commit 4d283ec908e617fa28bcb06bce310206f0655d67:

  x86/kvm: Rename VMX's segment access rights defines (2015-08-15 00:47:13 +0200)

are available in the git repository at:

  git://github.com/agraf/linux-2.6.git tags/signed-kvm-ppc-next

for you to fetch changes up to c63517c2e3810071359af926f621c1f784388c3f:

  KVM: PPC: Book3S: correct width in XER handling (2015-08-22 11:16:19 +0200)

Patch queue for ppc - 2015-08-22

Highlights for KVM PPC this time around:

  - Book3S: A few bug fixes
  - Book3S: Allow micro-threading on POWER8

Paul Mackerras (7):
      KVM: PPC: Book3S HV: Make use of unused threads when running guests
      KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
      KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE
      KVM: PPC: Book3S HV: Fix bug in dirty page tracking
      KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD
      KVM: PPC: Book3S HV: Fix preempted vcore list locking
      KVM: PPC: Book3S HV: Fix preempted vcore stolen time calculation

Sam Bobroff (1):
      KVM: PPC: Book3S: correct width in XER handling

Thomas Huth (2):
      KVM: PPC: Remove PPC970 from KVM_BOOK3S_64_HV text in Kconfig
      KVM: PPC: Fix warnings from sparse

Tudor Laurentiu (2):
      KVM: PPC: fix suspicious use of conditional operator
      KVM: PPC: add missing pt_regs initialization

 arch/powerpc/include/asm/kvm_book3s.h     |   5 +-
 arch/powerpc/include/asm/kvm_book3s_asm.h |  22 +-
 arch/powerpc/include/asm/kvm_booke.h      |   4 +-
 arch/powerpc/include/asm/kvm_host.h       |  24 +-
 arch/powerpc/include/asm/ppc-opcode.h     |   2 +-
 arch/powerpc/kernel/asm-offsets.c         |   9 +
 arch/powerpc/kvm/Kconfig                  |   8 +-
 arch/powerpc/kvm/book3s.c                 |   3 +-
 arch/powerpc/kvm/book3s_32_mmu_host.c     |   1 +
 arch/powerpc/kvm/book3s_64_mmu_host.c     |   1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c       |   8 +-
 arch/powerpc/kvm/book3s_emulate.c         |   1 +
 arch/powerpc/kvm/book3s_hv.c              | 660 ++
 arch/powerpc/kvm/book3s_hv_builtin.c      |  32 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c       | 161 +++-
 arch/powerpc/kvm/book3s_hv_rm_xics.c      |   4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 128 +-
 arch/powerpc/kvm/book3s_paired_singles.c  |   2 +-
 arch/powerpc/kvm/book3s_segment.S         |   4 +-
 arch/powerpc/kvm/booke.c                  |   1 +
 arch/powerpc/kvm/e500_mmu.c               |   2 +-
 arch/powerpc/kvm/powerpc.c                |   2 +-
 22 files changed, 938 insertions(+), 146 deletions(-)
[PULL 07/12] KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE
From: Paul Mackerras

The reference (R) and change (C) bits in a HPT entry can be set by
hardware at any time up until the HPTE is invalidated and the TLB
invalidation sequence has completed. This means that when removing
a HPTE, we need to read the HPTE after the invalidation sequence has
completed in order to obtain reliable values of R and C. The code in
kvmppc_do_h_remove() used to do this. However, commit 6f22bd3265fb
("KVM: PPC: Book3S HV: Make HTAB code LE host aware") removed the read
after invalidation as a side effect of other changes. This restores
the read of the HPTE after invalidation.

The user-visible effect of this bug would be that when migrating a
guest, there is a small probability that a page modified by the guest
and then unmapped by the guest might not get re-transmitted and thus
the destination might end up with a stale copy of the page.

Fixes: 6f22bd3265fb
Signed-off-by: Paul Mackerras
Signed-off-by: Alexander Graf
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index b027a89..c6d601c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -421,14 +421,20 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
 	v = pte & ~HPTE_V_HVLOCK;
 	if (v & HPTE_V_VALID) {
-		u64 pte1;
-
-		pte1 = be64_to_cpu(hpte[1]);
 		hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
-		rb = compute_tlbie_rb(v, pte1, pte_index);
+		rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
 		do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
-		/* Read PTE low word after tlbie to get final R/C values */
-		remove_revmap_chain(kvm, pte_index, rev, v, pte1);
+		/*
+		 * The reference (R) and change (C) bits in a HPT
+		 * entry can be set by hardware at any time up until
+		 * the HPTE is invalidated and the TLB invalidation
+		 * sequence has completed.  This means that when
+		 * removing a HPTE, we need to re-read the HPTE after
+		 * the invalidation sequence has completed in order to
+		 * obtain reliable values of R and C.
+		 */
+		remove_revmap_chain(kvm, pte_index, rev, v,
+				    be64_to_cpu(hpte[1]));
 	}
 	r = rev->guest_rpte & ~HPTE_GR_RESERVED;
 	note_hpte_modification(kvm, rev);
-- 
1.8.1.4
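The race in the commit message has an abstract statement: status bits that hardware may still set must be sampled only after the entry has been made unreachable. A compressed single-threaded illustration — names are hypothetical, and the "late hardware store" call stands in for an update that really happens concurrently, any time before the invalidation completes:

```c
#include <assert.h>
#include <stdint.h>

#define HW_R 0x1UL   /* reference bit */
#define HW_C 0x2UL   /* change bit */

struct hpte { uint64_t lo; };

/* Stand-in for hardware setting the change bit between an early read
 * and the completion of the invalidation sequence. */
static void hw_late_store(struct hpte *h) { h->lo |= HW_C; }

/* Buggy order: sample R/C, then invalidate - the late store is lost. */
static uint64_t remove_buggy(struct hpte *h)
{
    uint64_t rc = h->lo & (HW_R | HW_C);  /* read too early */
    hw_late_store(h);                     /* hardware can still write */
    h->lo = 0;                            /* "invalidate + tlbie" */
    return rc;
}

/* Fixed order: let the invalidation complete, then sample the final
 * R/C values - nothing can change them afterwards. */
static uint64_t remove_fixed(struct hpte *h)
{
    uint64_t rc;

    hw_late_store(h);                     /* before invalidation done */
    rc = h->lo & (HW_R | HW_C);           /* read after invalidation */
    h->lo = 0;
    return rc;
}
```

In the buggy ordering a dirty bit set late is never seen, which is exactly the stale-page-after-migration symptom described above.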
Re: [PATCH] kvm:powerpc:Fix incorrect return statement in the function mpic_set_default_irq_routing
On 12.08.15 21:06, nick wrote: > > > On 2015-08-12 03:05 PM, Alexander Graf wrote: >> >> >> On 07.08.15 17:54, Nicholas Krause wrote: >>> This fixes the incorrect return statement in the function >>> mpic_set_default_irq_routing from always returning zero >>> to signal success to this function's caller to instead >>> return the return value of kvm_set_irq_routing as this >>> function can fail and we need to correctly signal the >>> caller of mpic_set_default_irq_routing when the call >>> to this particular function has failed. >>> >>> Signed-off-by: Nicholas Krause >> >> I like the patch, but I don't see it on the kvm-ppc mailing list. It >> doesn't show up on patchwork or spinics. Did something go wrong while >> sending it out? >> >> >> Alex >> > Alex, > Ask Paolo about it as he would be able to explain it better then I. Well, whatever the reason, I can only apply patches that actually appeared on the public mailing list. Otherwise people may not get the chance to review them ;). Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm:powerpc:Fix return statements for wrapper functions in the file book3s_64_mmu_hv.c
On 10.08.15 17:27, Nicholas Krause wrote: > This fixes the wrapper functions kvm_umap_hva_hv and the function > kvm_unmap_hav_range_hv to return the return value of the function > kvm_handle_hva or kvm_handle_hva_range that they are wrapped to > call internally rather then always making the caller of these > wrapper functions think they always run successfully by returning > the value of zero directly. > > Signed-off-by: Nicholas Krause Paul, could you please take on this one? Thanks, Alex > --- > arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 ++ > 1 file changed, 2 insertions(+), 4 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c > b/arch/powerpc/kvm/book3s_64_mmu_hv.c > index dab68b7..0905c8f 100644 > --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c > @@ -774,14 +774,12 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned > long *rmapp, > > int kvm_unmap_hva_hv(struct kvm *kvm, unsigned long hva) > { > - kvm_handle_hva(kvm, hva, kvm_unmap_rmapp); > - return 0; > + return kvm_handle_hva(kvm, hva, kvm_unmap_rmapp); > } > > int kvm_unmap_hva_range_hv(struct kvm *kvm, unsigned long start, unsigned > long end) > { > - kvm_handle_hva_range(kvm, start, end, kvm_unmap_rmapp); > - return 0; > + return kvm_handle_hva_range(kvm, start, end, kvm_unmap_rmapp); > } > > void kvmppc_core_flush_memslot_hv(struct kvm *kvm, > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm:powerpc:Fix incorrect return statement in the function mpic_set_default_irq_routing
On 07.08.15 17:54, Nicholas Krause wrote: > This fixes the incorrect return statement in the function > mpic_set_default_irq_routing from always returning zero > to signal success to this function's caller to instead > return the return value of kvm_set_irq_routing as this > function can fail and we need to correctly signal the > caller of mpic_set_default_irq_routing when the call > to this particular function has failed. > > Signed-off-by: Nicholas Krause I like the patch, but I don't see it on the kvm-ppc mailing list. It doesn't show up on patchwork or spinics. Did something go wrong while sending it out? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvm-unit-tests PATCH 11/14] powerpc/ppc64: add rtas_power_off
On 03.08.15 19:02, Andrew Jones wrote: > On Mon, Aug 03, 2015 at 07:08:17PM +0200, Paolo Bonzini wrote: >> >> >> On 03/08/2015 16:41, Andrew Jones wrote: >>> Add enough RTAS support to support power-off, and apply it to >>> exit(). >>> >>> Signed-off-by: Andrew Jones >> >> Why not use virtio-mmio + testdev on ppc as well? Similar to how we're >> not using PSCI on ARM or ACPI on x86. > > I have some longer term plans to add minimal virtio-pci support to > kvm-unit-tests, and then we could plug virtio-serial+chr-testdev into > that. I didn't think I could use virtio-mmio directly with spapr, but > maybe I can? Actually, I sort of like this approach more in some You would need to add support for the dynamic sysbus device allocation in the spapr machine, but then I don't see why it wouldn't work. PCI however is the more natural choice on sPAPR if you want to do virtio. That said, if all you need is a chr transport, IIRC there should be a way to get you additional channels on the existing "serial port" - which really is just a simply hypercall interface. But David is the best person to guide you to the best path forward here. Alex > respects though, as it doesn't require a special testdev or virtio > support, keeping the unit test extra minimal. In fact, I was even > thinking about posting patches (which I've already written) that > allow chr-testdev to be optional for ARM too, now that it could use > the exitcode snooper. > > Thanks, > drew > >> >> Paolo >> -- >> To unsubscribe from this list: send the line "unsubscribe kvm" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Two fixes for dynamic micro-threading
On 20.07.15 08:49, David Gibson wrote: > On Thu, Jul 16, 2015 at 05:11:12PM +1000, Paul Mackerras wrote: >> This series contains two fixes for the new dynamic micro-threading >> code that was added recently for HV-mode KVM on Power servers. >> The patches are against Alex Graf's kvm-ppc-queue branch. Please >> apply. > > agraf, > > Any word on these? These appear to fix a really nasty host crash in > current upstream. I'd really like to see them merged ASAP. Thanks, applied to kvm-ppc-queue. The host crash should only occur with dynamic micro-threading enabled, which is not in Linus' tree, correct? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] PPC: Current patch queue for HV KVM
On 24.06.15 13:18, Paul Mackerras wrote: > This is my current queue of patches for HV KVM. This series is based > on the kvm next branch. They have all been posted 6 weeks ago or > more, though I have just added a 3-line fix to patch 2/5 to fix a bug > that we found in testing migration, and I expanded a comment (no code > change) in patch 3/5 following a suggestion by Aneesh. > > I'd like to see these go into 4.2 if possible. Thanks, applied all to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
On 06/24/15 13:18, Paul Mackerras wrote: This builds on the ability to run more than one vcore on a physical core by using the micro-threading (split-core) modes of the POWER8 chip. Previously, only vcores from the same VM could be run together, and (on POWER8) only if they had just one thread per core. With the ability to split the core on guest entry and unsplit it on guest exit, we can run up to 8 vcpu threads from up to 4 different VMs, and we can run multiple vcores with 2 or 4 vcpus per vcore. Dynamic micro-threading is only available if the static configuration of the cores is whole-core mode (unsplit), and only on POWER8. To manage this, we introduce a new kvm_split_mode struct which is shared across all of the subcores in the core, with a pointer in the paca on each thread. In addition we extend the core_info struct to have information on each subcore. When deciding whether to add a vcore to the set already on the core, we now have two possibilities: (a) piggyback the vcore onto an existing subcore, or (b) start a new subcore. Currently, when any vcpu needs to exit the guest and switch to host virtual mode, we interrupt all the threads in all subcores and switch the core back to whole-core mode. It may be possible in future to allow some of the subcores to keep executing in the guest while subcore 0 switches to the host, but that is not implemented in this patch. This adds a module parameter called dynamic_mt_modes which controls which micro-threading (split-core) modes the code will consider, as a bitmap. In other words, if it is 0, no micro-threading mode is considered; if it is 2, only 2-way micro-threading is considered; if it is 4, only 4-way, and if it is 6, both 2-way and 4-way micro-threading mode will be considered. The default is 6. With this, we now have secondary threads which are the primary thread for their subcore and therefore need to do the MMU switch. 
These threads will need to be started even if they have no vcpu to run, so we use the vcore pointer in the PACA rather than the vcpu pointer to trigger them. It is now possible for thread 0 to find that an exit has been requested before it gets to switch the subcore state to the guest. In that case we haven't added the guest's timebase offset to the timebase, so we need to be careful not to subtract the offset in the guest exit path. In fact we just skip the whole path that switches back to host context, since we haven't switched to the guest context. Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_book3s_asm.h | 20 ++ arch/powerpc/include/asm/kvm_host.h | 3 + arch/powerpc/kernel/asm-offsets.c | 7 + arch/powerpc/kvm/book3s_hv.c | 369 ++ arch/powerpc/kvm/book3s_hv_builtin.c | 25 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 113 +++-- 6 files changed, 475 insertions(+), 62 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h index 5bdfb5d..4024d24 100644 --- a/arch/powerpc/include/asm/kvm_book3s_asm.h +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h @@ -25,6 +25,12 @@ #define XICS_MFRR 0xc #define XICS_IPI 2 /* interrupt source # for IPIs */ +/* Maximum number of threads per physical core */ +#define MAX_THREADS8 + +/* Maximum number of subcores per physical core */ +#define MAX_SUBCORES 4 + #ifdef __ASSEMBLY__ #ifdef CONFIG_KVM_BOOK3S_HANDLER @@ -65,6 +71,19 @@ kvmppc_resume_\intno: #else /*__ASSEMBLY__ */ +struct kvmppc_vcore; + +/* Struct used for coordinating micro-threading (split-core) mode changes */ +struct kvm_split_mode { + unsigned long rpr; + unsigned long pmmar; + unsigned long ldbar; + u8 subcore_size; + u8 do_nap; + u8 napped[MAX_THREADS]; + struct kvmppc_vcore *master_vcs[MAX_SUBCORES]; +}; + /* * This struct goes in the PACA on 64-bit processors. 
It is used * to store host state that needs to be saved when we enter a guest @@ -100,6 +119,7 @@ struct kvmppc_host_state { u64 host_spurr; u64 host_dscr; u64 dec_expires; + struct kvm_split_mode *kvm_split_mode; #endif #ifdef CONFIG_PPC_BOOK3S_64 u64 cfar; diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 2b74490..80eb29a 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -302,6 +302,9 @@ struct kvmppc_vcore { #define VCORE_EXIT_MAP(vc)((vc)->entry_exit_map >> 8) #define VCORE_IS_EXITING(vc) (VCORE_EXIT_MAP(vc) != 0) +/* This bit is used when a vcore exit is triggered from outside the vcore */ +#define VCORE_EXIT_REQ 0x1 + /* * Values for vcore_state. * Note that these are arranged such that lower values diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/
Re: [PATCH 1/3] powerpc: implement barrier primitives
On 17.06.15 12:15, Will Deacon wrote: > On Wed, Jun 17, 2015 at 10:43:48AM +0100, Andre Przywara wrote: >> Instead of referring to the Linux header including the barrier >> macros, copy over the rather simple implementation for the PowerPC >> barrier instructions kvmtool uses. This fixes build for powerpc. >> >> Signed-off-by: Andre Przywara >> --- >> Hi, >> >> I just took what kvmtool seems to have used before, I actually have >> no idea if "sync" is the right instruction or "lwsync" would do. >> Would be nice if some people with PowerPC knowledge could comment. > > I *think* we can use lwsync for rmb and wmb, but would want confirmation > from a ppc guy before making that change! Also I'd prefer to play safe for now :) Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: add missing pt_regs initialization
On 18.05.15 14:44, Laurentiu Tudor wrote: > On this switch branch the regs initialization > doesn't happen so add it. > This was found with the help of a static > code analysis tool. > > Signed-off-by: Laurentiu Tudor > Cc: Scott Wood > Cc: Mihai Caraman Thanks, applied to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: check for lookup_linux_ptep() returning NULL
On 21.05.15 21:37, Scott Wood wrote: > On Thu, 2015-05-21 at 16:26 +0300, Laurentiu Tudor wrote: >> If passed a larger page size lookup_linux_ptep() >> may fail, so add a check for that and bail out >> if that's the case. >> This was found with the help of a static >> code analysis tool. >> >> Signed-off-by: Mihai Caraman >> Signed-off-by: Laurentiu Tudor >> Cc: Scott Wood >> --- >> based on https://github.com/agraf/linux-2.6.git kvm-ppc-next >> >> arch/powerpc/kvm/e500_mmu_host.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) > > Reviewed-by: Scott Wood Thanks, applied to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Remove PPC970 from KVM_BOOK3S_64_HV text in Kconfig
On 22.05.15 11:41, Thomas Huth wrote: > Since the PPC970 support has been removed from the kvm-hv kernel > module recently, we should also reflect this change in the help > text of the corresponding Kconfig option. > > Signed-off-by: Thomas Huth Thanks, applied to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Fix warnings from sparse
On 22.05.15 09:25, Thomas Huth wrote: > When compiling the KVM code for POWER with "make C=1", sparse > complains about functions missing proper prototypes and a 64-bit > constant missing the ULL prefix. Let's fix this by making the > functions static or by including the proper header with the > prototypes, and by appending a ULL prefix to the constant > PPC_MPPE_ADDRESS_MASK. > > Signed-off-by: Thomas Huth Thanks, applied to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: fix suspicious use of conditional operator
On 25.05.15 10:48, Laurentiu Tudor wrote: > This was signaled by a static code analysis tool. > > Signed-off-by: Laurentiu Tudor > Reviewed-by: Scott Wood Thanks, applied to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Book3S HV: Fix list traversal in error case
On 29.04.15 06:49, Paul Mackerras wrote: > This fixes a regression introduced in commit 25fedfca94cf, "KVM: PPC: > Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu", which > leads to a user-triggerable oops. > > In the case where we try to run a vcore on a physical core that is > not in single-threaded mode, or the vcore has too many threads for > the physical core, we iterate the list of runnable vcpus to make > each one return an EBUSY error to userspace. Since this involves > taking each vcpu off the runnable_threads list for the vcore, we > need to use list_for_each_entry_safe rather than list_for_each_entry > to traverse the list. Otherwise the kernel will crash with an oops > message like this: > > Unable to handle kernel paging request for data at address 0x000fff88 > Faulting instruction address: 0xd0001e635dc8 > Oops: Kernel access of bad area, sig: 11 [#2] > SMP NR_CPUS=1024 NUMA PowerNV > ... > CPU: 48 PID: 91256 Comm: qemu-system-ppc Tainted: G D3.18.0 #1 > task: c0274e507500 ti: c027d1924000 task.ti: c027d1924000 > NIP: d0001e635dc8 LR: d0001e635df8 CTR: c011ba50 > REGS: c027d19275b0 TRAP: 0300 Tainted: G D (3.18.0) > MSR: 90009033 CR: 22002824 XER: > CFAR: c0008468 DAR: 000fff88 DSISR: 4000 SOFTE: 1 > GPR00: d0001e635df8 c027d1927830 d0001e64c850 0001 > GPR04: 0001 0001 > GPR08: 00200200 d0001e63e588 > GPR12: 2200 c7dbc800 c00fc780 000a > GPR16: fffc c00fd5439690 c00fc7801c98 0001 > GPR20: 0003 c027d1927aa8 c00fd543b348 c00fd543b350 > GPR24: c00fa57f 0030 > GPR28: fff0 c00fd543b328 000fe468 c00fd543b300 > NIP [d0001e635dc8] kvmppc_run_core+0x198/0x17c0 [kvm_hv] > LR [d0001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv] > Call Trace: > [c027d1927830] [d0001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv] > (unreliable) > [c027d1927a30] [d0001e638350] kvmppc_vcpu_run_hv+0x5b0/0xdd0 [kvm_hv] > [c027d1927b70] [d0001e510504] kvmppc_vcpu_run+0x44/0x60 [kvm] > [c027d1927ba0] [d0001e50d4a4] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm] > [c027d1927be0] 
> [d0001e504be8] kvm_vcpu_ioctl+0x5e8/0x7a0 [kvm]
> [c027d1927d40] [c02d6720] do_vfs_ioctl+0x490/0x780
> [c027d1927de0] [c02d6ae4] SyS_ioctl+0xd4/0xf0
> [c027d1927e30] [c0009358] syscall_exit+0x0/0x98
> Instruction dump:
> 6000 6042 387e1b30 3883 38a1 38c0 480087d9 e8410018
> ebde1c98 7fbdf040 3bdee368 419e0048 <813e1b20> 939e1b18 2f890001 409effcc
> ---[ end trace 8cdf50251cca6680 ]---
>
> Fixes: 25fedfca94cf
> Signed-off-by: Paul Mackerras

Reviewed-by: Alexander Graf

Paolo, can you please take this patch into 4.1 directly?

Thanks a lot,

Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: s390: remove delayed reallocation of page tables for KVM
On 04/27/2015 03:57 PM, Martin Schwidefsky wrote:
> On Mon, 27 Apr 2015 15:48:42 +0200 Alexander Graf wrote:
>> On 04/23/2015 02:13 PM, Martin Schwidefsky wrote:
>>> On Thu, 23 Apr 2015 14:01:23 +0200 Alexander Graf wrote:
>>>> As far as alternative approaches go, I don't have a great idea otoh. We could have an ELF flag indicating that this process needs 4k page tables to limit the impact to a single process. In fact, could we maybe still limit the scope to non-global? A personality may work as well. Or ulimit?
>>> I tried the ELF flag approach, it does not work. The trouble is that allocate_mm() has to create the page tables with 4K tables if you want to change the page table layout later on. We have learned the hard way that the direction 2K to 4K does not work due to races in the mm. Now there are two major cases: 1) fork + execve and 2) fork only. The ELF flag can be used to reduce from 4K to 2K for 1) but not for 2). 2) is required for apps that fork a lot, e.g. database or web servers. The same goes for the approach with a personality flag or ulimit. We would have to distinguish the two cases for allocate_mm(): if the new mm is allocated for a fork, the current mm decides 2K vs. 4K; if the new mm is allocated by binfmt_elf, then start with 4K and do the downgrade after the ELF flag has been evaluated.
>> Well, you could also make it a personality flag, for example, no? Then every new process below a certain one always gets 4k page tables until it drops the personality, at which point each child would only get 2k page tables again. I'm mostly concerned that people will end up mixing VMs and other workloads on the same LPAR, so I don't think there's a one-size-fits-all solution.
> If I add an argument to mm_init() to indicate whether this context is for fork() or execve(), then the ELF header flag approach works.

So you don't need the sysctl?
Alex
Re: [PATCH] KVM: s390: remove delayed reallocation of page tables for KVM
On 04/23/2015 02:08 PM, Christian Borntraeger wrote:
> Am 23.04.2015 um 14:01 schrieb Alexander Graf:
>> Am 23.04.2015 um 13:43 schrieb Christian Borntraeger :
>>> Am 23.04.2015 um 13:37 schrieb Alexander Graf:
>>>> Am 23.04.2015 um 13:08 schrieb Christian Borntraeger :
>>>>> From: Martin Schwidefsky
>>>>>
>>>>> Replacing a 2K page table with a 4K page table while a VMA is active for the affected memory region is fundamentally broken. Rip out the page table reallocation code and replace it with a simple system control 'vm.allocate_pgste'. If the system control is set, the page tables for all processes are allocated as full 4K pages, even for processes that do not need it.
>>>>>
>>>>> Signed-off-by: Martin Schwidefsky
>>>>> Signed-off-by: Christian Borntraeger
>>>> Couldn't you make this a hidden kconfig option that gets automatically selected when kvm is enabled? Or is there a non-kvm case that needs it too?
>>> For things like RHEV the default could certainly be "enabled", but for normal distros like SLES/RHEL the idea was to NOT enable it by default, as the non-KVM case is more common and might suffer from the additional memory consumption of the page tables (big databases come to mind). We could think about having rpms like kvm provide a sysctl file that sets it, if we want to minimize the impact. Other ideas?
>> Oh, I'm sorry, I misread the ifdef. I don't think it makes sense to have a config option for the default value then; just rely on sysctl.conf for changed defaults. As far as mechanisms to change it go, every distribution has its own way of dealing with this. RH has a "profile" thing; we don't really have anything central, but individual sysctl.d files, for example, that a kvm package could provide. Either way, the default choosing shouldn't happen in .config ;).
> So you vote for getting rid of the Kconfig?
>> Also, please add some helpful error message in qemu to guide users to the sysctl.
> Yes, we will provide a qemu patch (cc stable) after this hits the kernel.
>> As far as alternative approaches go, I don't have a great idea otoh. We could have an ELF flag indicating that this process needs 4k page tables to limit the impact to a single process.
> This approach was actually Martin's first fix. The problem is that the decision takes place at execve, but we need an answer at fork time. So we always started with 4k page tables and freed the second half at execve. This did not work for processes that only fork (without execve).
>> In fact, could we maybe still limit the scope to non-global? A personality may work as well. Or ulimit?
> I think we will go for now with the sysctl and see if we can come up with some automatic way as an additional patch later on.

Sounds perfectly reasonable to me. You can, for example, also just set the sysctl bit in libvirtd :).

Alex
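For reference, the knob discussed above ends up as an ordinary sysctl, so a distribution (or a kvm/libvirt package) can enable it exactly as suggested, via a drop-in file. A sketch of such a sysctl.d fragment (the file path is illustrative; the control name vm.allocate_pgste is the one introduced by the patch):

```
# /etc/sysctl.d/90-kvm-s390.conf (illustrative path)
# Allocate full 4K page tables (with PGSTEs) for all processes,
# which is required to run KVM guests on s390. Costs extra memory
# for non-KVM workloads, hence off by default.
vm.allocate_pgste = 1
```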
Re: [PATCH] KVM: s390: remove delayed reallocation of page tables for KVM
On 04/23/2015 02:13 PM, Martin Schwidefsky wrote:
> On Thu, 23 Apr 2015 14:01:23 +0200 Alexander Graf wrote:
>> As far as alternative approaches go, I don't have a great idea otoh. We could have an ELF flag indicating that this process needs 4k page tables to limit the impact to a single process. In fact, could we maybe still limit the scope to non-global? A personality may work as well. Or ulimit?
> I tried the ELF flag approach, it does not work. The trouble is that allocate_mm() has to create the page tables with 4K tables if you want to change the page table layout later on. We have learned the hard way that the direction 2K to 4K does not work due to races in the mm. Now there are two major cases: 1) fork + execve and 2) fork only. The ELF flag can be used to reduce from 4K to 2K for 1) but not for 2). 2) is required for apps that fork a lot, e.g. database or web servers. The same goes for the approach with a personality flag or ulimit. We would have to distinguish the two cases for allocate_mm(): if the new mm is allocated for a fork, the current mm decides 2K vs. 4K; if the new mm is allocated by binfmt_elf, then start with 4K and do the downgrade after the ELF flag has been evaluated.

Well, you could also make it a personality flag, for example, no? Then every new process below a certain one always gets 4k page tables until it drops the personality, at which point each child would only get 2k page tables again.

I'm mostly concerned that people will end up mixing VMs and other workloads on the same LPAR, so I don't think there's a one-size-fits-all solution.

Alex
Re: [PATCH] KVM: s390: remove delayed reallocation of page tables for KVM
> Am 23.04.2015 um 13:43 schrieb Christian Borntraeger : > >> Am 23.04.2015 um 13:37 schrieb Alexander Graf: >> >> >>> Am 23.04.2015 um 13:08 schrieb Christian Borntraeger >>> : >>> >>> From: Martin Schwidefsky >>> >>> Replacing a 2K page table with a 4K page table while a VMA is active >>> for the affected memory region is fundamentally broken. Rip out the >>> page table reallocation code and replace it with a simple system >>> control 'vm.allocate_pgste'. If the system control is set the page >>> tables for all processes are allocated as full 4K pages, even for >>> processes that do not need it. >>> >>> Signed-off-by: Martin Schwidefsky >>> Signed-off-by: Christian Borntraeger >> >> Couldn't you make this a hidden kconfig option that gets automatically >> selected when kvm is enabled? Or is there a non-kvm case that needs it too? > > For things like RHEV the default could certainly be "enabled", but for normal > distros like SLES/RHEL, the idea was to NOT enable that by default, as the > non-KVM > case is more common and might suffer from the additional memory consumption of > the page tables. (big databases come to mind) > > We could think about having rpms like kvm to provide a sysctl file that sets > it if we > want to minimize the impact. Other ideas? Oh, I'm sorry, I misread the ifdef. I don't think it makes sense to have a config option for the default value then, just rely only on sysctl.conf for changed defaults. As far as mechanisms to change it go, every distribution has their own ways of dealing with this. RH has a "profile" thing, we don't really have anything central, but individual sysctl.d files for example that a kvm package could provide. Either way, the default choosing shouldn't happen in .config ;). Also, please add some helpful error message in qemu to guide users to the sysctl. As far as alternative approaches go, I don't have a great idea otoh. 
We could have an ELF flag indicating that this process needs 4k page tables to limit the impact to a single process. In fact, could we maybe still limit the scope to non-global? A personality may work as well. Or ulimit?

Alex
Re: [PATCH] KVM: s390: remove delayed reallocation of page tables for KVM
> Am 23.04.2015 um 13:08 schrieb Christian Borntraeger :
>
> From: Martin Schwidefsky
>
> Replacing a 2K page table with a 4K page table while a VMA is active
> for the affected memory region is fundamentally broken. Rip out the
> page table reallocation code and replace it with a simple system
> control 'vm.allocate_pgste'. If the system control is set the page
> tables for all processes are allocated as full 4K pages, even for
> processes that do not need it.
>
> Signed-off-by: Martin Schwidefsky
> Signed-off-by: Christian Borntraeger

Couldn't you make this a hidden kconfig option that gets automatically selected when kvm is enabled? Or is there a non-kvm case that needs it too?

Alex
[PULL 09/21] KVM: PPC: Book3S HV: Add ICP real mode counters
From: Suresh Warrier Add two counters to count how often we generate real-mode ICS resend and reject events. The counters provide some performance statistics that could be used in the future to consider if the real mode functions need further optimizing. The counters are displayed as part of IPC and ICP state provided by /sys/debug/kernel/powerpc/kvm* for each VM. Also added two counters that count (approximately) how many times we don't find an ICP or ICS we're looking for. These are not currently exposed through sysfs, but can be useful when debugging crashes. Signed-off-by: Suresh Warrier Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv_rm_xics.c | 7 +++ arch/powerpc/kvm/book3s_xics.c | 10 -- arch/powerpc/kvm/book3s_xics.h | 5 + 3 files changed, 20 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c index 73bbe92..6dded8c 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c @@ -227,6 +227,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, ics = kvmppc_xics_find_ics(xics, new_irq, &src); if (!ics) { /* Unsafe increment, but this does not need to be accurate */ + xics->err_noics++; return; } state = &ics->irq_state[src]; @@ -239,6 +240,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, icp = kvmppc_xics_find_server(xics->kvm, state->server); if (!icp) { /* Unsafe increment again*/ + xics->err_noicp++; goto out; } } @@ -383,6 +385,7 @@ static void icp_rm_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp, * separately here as well. 
*/ if (resend) { + icp->n_check_resend++; icp_rm_check_resend(xics, icp); } } @@ -500,11 +503,13 @@ int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server, /* Handle reject in real mode */ if (reject && reject != XICS_IPI) { + this_icp->n_reject++; icp_rm_deliver_irq(xics, icp, reject); } /* Handle resends in real mode */ if (resend) { + this_icp->n_check_resend++; icp_rm_check_resend(xics, icp); } @@ -566,6 +571,7 @@ int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr) * attempt (see comments in icp_rm_deliver_irq). */ if (reject && reject != XICS_IPI) { + icp->n_reject++; icp_rm_deliver_irq(xics, icp, reject); } bail: @@ -616,6 +622,7 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr) /* Still asserted, resend it */ if (state->asserted) { + icp->n_reject++; icp_rm_deliver_irq(xics, icp, irq); } diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c index 5f7beebd..8f3e6cc 100644 --- a/arch/powerpc/kvm/book3s_xics.c +++ b/arch/powerpc/kvm/book3s_xics.c @@ -901,6 +901,7 @@ static int xics_debug_show(struct seq_file *m, void *private) unsigned long flags; unsigned long t_rm_kick_vcpu, t_rm_check_resend; unsigned long t_rm_reject, t_rm_notify_eoi; + unsigned long t_reject, t_check_resend; if (!kvm) return 0; @@ -909,6 +910,8 @@ static int xics_debug_show(struct seq_file *m, void *private) t_rm_notify_eoi = 0; t_rm_check_resend = 0; t_rm_reject = 0; + t_check_resend = 0; + t_reject = 0; seq_printf(m, "=\nICP state\n=\n"); @@ -928,12 +931,15 @@ static int xics_debug_show(struct seq_file *m, void *private) t_rm_notify_eoi += icp->n_rm_notify_eoi; t_rm_check_resend += icp->n_rm_check_resend; t_rm_reject += icp->n_rm_reject; + t_check_resend += icp->n_check_resend; + t_reject += icp->n_reject; } - seq_puts(m, "ICP Guest Real Mode exit totals: "); - seq_printf(m, "\tkick_vcpu=%lu check_resend=%lu reject=%lu notify_eoi=%lu\n", + seq_printf(m, "ICP Guest->Host totals: kick_vcpu=%lu check_resend=%lu reject=%lu 
notify_eoi=%lu\n", t_rm_kick_vcpu, t_rm_check_resend, t_rm_reject, t_rm_notify_eoi); + seq_printf(m, "ICP Real Mode totals: check_resend=%lu resend=%lu\n", + t_check_resend, t_reject); for (icsid = 0; icsid <= KVMPPC_XICS_MAX_ICS_ID; icsid++) { struct kvmppc_ics *ics = xics->ics[icsid]; diff --g
[PULL 07/21] KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock
From: Suresh Warrier Replaces the ICS mutex lock with a spin lock since we will be porting these routines to real mode. Note that we need to disable interrupts before we take the lock in anticipation of the fact that on the guest side, we are running in the context of a hard irq and interrupts are disabled (EE bit off) when the lock is acquired. Again, because we will be acquiring the lock in hypervisor real mode, we need to use an arch_spinlock_t instead of a normal spinlock here as we want to avoid running any lockdep code (which may not be safe to execute in real mode). Signed-off-by: Suresh Warrier Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_xics.c | 68 +- arch/powerpc/kvm/book3s_xics.h | 2 +- 2 files changed, 48 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c index 60bdbac..5f7beebd 100644 --- a/arch/powerpc/kvm/book3s_xics.c +++ b/arch/powerpc/kvm/book3s_xics.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include @@ -39,7 +40,7 @@ * LOCKING * === * - * Each ICS has a mutex protecting the information about the IRQ + * Each ICS has a spin lock protecting the information about the IRQ * sources and avoiding simultaneous deliveries if the same interrupt. 
* * ICP operations are done via a single compare & swap transaction @@ -109,7 +110,10 @@ static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics, { int i; - mutex_lock(&ics->lock); + unsigned long flags; + + local_irq_save(flags); + arch_spin_lock(&ics->lock); for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) { struct ics_irq_state *state = &ics->irq_state[i]; @@ -120,12 +124,15 @@ static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics, XICS_DBG("resend %#x prio %#x\n", state->number, state->priority); - mutex_unlock(&ics->lock); + arch_spin_unlock(&ics->lock); + local_irq_restore(flags); icp_deliver_irq(xics, icp, state->number); - mutex_lock(&ics->lock); + local_irq_save(flags); + arch_spin_lock(&ics->lock); } - mutex_unlock(&ics->lock); + arch_spin_unlock(&ics->lock); + local_irq_restore(flags); } static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics, @@ -133,8 +140,10 @@ static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics, u32 server, u32 priority, u32 saved_priority) { bool deliver; + unsigned long flags; - mutex_lock(&ics->lock); + local_irq_save(flags); + arch_spin_lock(&ics->lock); state->server = server; state->priority = priority; @@ -145,7 +154,8 @@ static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics, deliver = true; } - mutex_unlock(&ics->lock); + arch_spin_unlock(&ics->lock); + local_irq_restore(flags); return deliver; } @@ -186,6 +196,7 @@ int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority) struct kvmppc_ics *ics; struct ics_irq_state *state; u16 src; + unsigned long flags; if (!xics) return -ENODEV; @@ -195,10 +206,12 @@ int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority) return -EINVAL; state = &ics->irq_state[src]; - mutex_lock(&ics->lock); + local_irq_save(flags); + arch_spin_lock(&ics->lock); *server = state->server; *priority = state->priority; - mutex_unlock(&ics->lock); + 
arch_spin_unlock(&ics->lock); + local_irq_restore(flags); return 0; } @@ -365,6 +378,7 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, struct kvmppc_ics *ics; u32 reject; u16 src; + unsigned long flags; /* * This is used both for initial delivery of an interrupt and @@ -391,7 +405,8 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, state = &ics->irq_state[src]; /* Get a lock on the ICS */ - mutex_lock(&ics->lock); + local_irq_save(flags); + arch_spin_lock(&ics->lock); /* Get our server */ if (!icp || state->server != icp->server_num) { @@ -434,7 +449,7 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, * * Note that if successful, the new delivery might have itself * rejected an interrupt th
[PULL 04/21] KVM: PPC: Book3S HV: Remove RMA-related variables from code
From: "Aneesh Kumar K.V" We don't support real-mode areas now that 970 support is removed. Remove the remaining details of rma from the code. Also rename rma_setup_done to hpte_setup_done to better reflect the changes. Signed-off-by: Aneesh Kumar K.V Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 3 +-- arch/powerpc/kvm/book3s_64_mmu_hv.c | 28 ++-- arch/powerpc/kvm/book3s_hv.c| 10 +- 3 files changed, 20 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 8ef0512..015773f 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -228,9 +228,8 @@ struct kvm_arch { int tlbie_lock; unsigned long lpcr; unsigned long rmor; - struct kvm_rma_info *rma; unsigned long vrma_slb_v; - int rma_setup_done; + int hpte_setup_done; u32 hpt_order; atomic_t vcpus_running; u32 online_vcores; diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 534acb3..dbf1271 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -116,12 +116,12 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp) long order; mutex_lock(&kvm->lock); - if (kvm->arch.rma_setup_done) { - kvm->arch.rma_setup_done = 0; - /* order rma_setup_done vs. vcpus_running */ + if (kvm->arch.hpte_setup_done) { + kvm->arch.hpte_setup_done = 0; + /* order hpte_setup_done vs. 
vcpus_running */ smp_mb(); if (atomic_read(&kvm->arch.vcpus_running)) { - kvm->arch.rma_setup_done = 1; + kvm->arch.hpte_setup_done = 1; goto out; } } @@ -1339,20 +1339,20 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf, unsigned long tmp[2]; ssize_t nb; long int err, ret; - int rma_setup; + int hpte_setup; if (!access_ok(VERIFY_READ, buf, count)) return -EFAULT; /* lock out vcpus from running while we're doing this */ mutex_lock(&kvm->lock); - rma_setup = kvm->arch.rma_setup_done; - if (rma_setup) { - kvm->arch.rma_setup_done = 0; /* temporarily */ - /* order rma_setup_done vs. vcpus_running */ + hpte_setup = kvm->arch.hpte_setup_done; + if (hpte_setup) { + kvm->arch.hpte_setup_done = 0; /* temporarily */ + /* order hpte_setup_done vs. vcpus_running */ smp_mb(); if (atomic_read(&kvm->arch.vcpus_running)) { - kvm->arch.rma_setup_done = 1; + kvm->arch.hpte_setup_done = 1; mutex_unlock(&kvm->lock); return -EBUSY; } @@ -1405,7 +1405,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf, "r=%lx\n", ret, i, v, r); goto out; } - if (!rma_setup && is_vrma_hpte(v)) { + if (!hpte_setup && is_vrma_hpte(v)) { unsigned long psize = hpte_base_page_size(v, r); unsigned long senc = slb_pgsize_encoding(psize); unsigned long lpcr; @@ -1414,7 +1414,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf, (VRMA_VSID << SLB_VSID_SHIFT_1T); lpcr = senc << (LPCR_VRMASD_SH - 4); kvmppc_update_lpcr(kvm, lpcr, LPCR_VRMASD); - rma_setup = 1; + hpte_setup = 1; } ++i; hptp += 2; @@ -1430,9 +1430,9 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf, } out: - /* Order HPTE updates vs. rma_setup_done */ + /* Order HPTE updates vs. 
hpte_setup_done */ smp_wmb(); - kvm->arch.rma_setup_done = rma_setup; + kvm->arch.hpte_setup_done = hpte_setup; mutex_unlock(&kvm->lock); if (err) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index b9c11a3..dde14fd 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -2044,11 +2044,11 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu) } atomic_inc(&vcpu->kvm->
[PULL 02/21] kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM
From: David Gibson

On POWER, storage caching is usually configured via the MMU - attributes such as cache-inhibited are stored in the TLB and the hashed page table. This makes correctly performing cache-inhibited IO accesses awkward when the MMU is turned off (real mode). Some CPU models provide special registers to control the cache attributes of real mode loads and stores, but this is not at all consistent. This is a problem in particular for SLOF, the firmware used on KVM guests, which runs entirely in real mode, but which needs to do IO to load the kernel.

To simplify this, qemu implements two special hypercalls, H_LOGICAL_CI_LOAD and H_LOGICAL_CI_STORE, which simulate a cache-inhibited load or store to a logical address (aka guest physical address). SLOF uses these for IO.

However, because these are implemented within qemu, not the host kernel, they bypass any IO devices emulated within KVM itself. The simplest way to see this problem is to attempt to boot a KVM guest from a virtio-blk device with iothread / dataplane enabled. The iothread code relies on an in-kernel implementation of the virtio queue notification, which is not triggered by the IO hcalls, and so the guest will stall in SLOF, unable to load the guest OS.

This patch addresses this by providing in-kernel implementations of the two hypercalls, which correctly scan the KVM IO bus. Any access to an address not handled by the KVM IO bus will cause a VM exit, hitting the qemu implementation as before.

Note that a userspace change is also required, in order to enable these new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL.
Signed-off-by: David Gibson [agraf: fix compilation] Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s.h | 3 ++ arch/powerpc/kvm/book3s.c | 76 +++ arch/powerpc/kvm/book3s_hv.c | 12 ++ arch/powerpc/kvm/book3s_pr_papr.c | 28 + 4 files changed, 119 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 942c7b1..578e550 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -292,6 +292,9 @@ static inline bool kvmppc_supports_magic_page(struct kvm_vcpu *vcpu) return !is_kvmppc_hv_enabled(vcpu->kvm); } +extern int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu); +extern int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu); + /* Magic register values loaded into r3 and r4 before the 'sc' assembly * instruction for the OSI hypercalls */ #define OSI_SC_MAGIC_R30x113724FA diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index cfbcdc6..453a8a4 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -821,6 +821,82 @@ void kvmppc_core_destroy_vm(struct kvm *kvm) #endif } +int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu) +{ + unsigned long size = kvmppc_get_gpr(vcpu, 4); + unsigned long addr = kvmppc_get_gpr(vcpu, 5); + u64 buf; + int ret; + + if (!is_power_of_2(size) || (size > sizeof(buf))) + return H_TOO_HARD; + + ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, size, &buf); + if (ret != 0) + return H_TOO_HARD; + + switch (size) { + case 1: + kvmppc_set_gpr(vcpu, 4, *(u8 *)&buf); + break; + + case 2: + kvmppc_set_gpr(vcpu, 4, be16_to_cpu(*(__be16 *)&buf)); + break; + + case 4: + kvmppc_set_gpr(vcpu, 4, be32_to_cpu(*(__be32 *)&buf)); + break; + + case 8: + kvmppc_set_gpr(vcpu, 4, be64_to_cpu(*(__be64 *)&buf)); + break; + + default: + BUG(); + } + + return H_SUCCESS; +} +EXPORT_SYMBOL_GPL(kvmppc_h_logical_ci_load); + +int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu) +{ + unsigned long size = kvmppc_get_gpr(vcpu, 4); + 
unsigned long addr = kvmppc_get_gpr(vcpu, 5); + unsigned long val = kvmppc_get_gpr(vcpu, 6); + u64 buf; + int ret; + + switch (size) { + case 1: + *(u8 *)&buf = val; + break; + + case 2: + *(__be16 *)&buf = cpu_to_be16(val); + break; + + case 4: + *(__be32 *)&buf = cpu_to_be32(val); + break; + + case 8: + *(__be64 *)&buf = cpu_to_be64(val); + break; + + default: + return H_TOO_HARD; + } + + ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, size, &buf); + if (ret != 0) + return H_TOO_HARD; + + return H_SUCCESS; +} +EXPORT_SYMBOL_GPL(kvmppc_h_logical_ci_store); + int kvmppc_core_check_processor_compat(void) { /* diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index de74756.
[PULL 14/21] KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu
From: Paul Mackerras Rather than calling cond_resched() in kvmppc_run_core() before doing the post-processing for the vcpus that we have just run (that is, calling kvmppc_handle_exit_hv(), kvmppc_set_timer(), etc.), we now do that post-processing before calling cond_resched(), and that post- processing is moved out into its own function, post_guest_process(). The reschedule point is now in kvmppc_run_vcpu() and we define a new vcore state, VCORE_PREEMPT, to indicate that that the vcore's runner task is runnable but not running. (Doing the reschedule with the vcore in VCORE_INACTIVE state would be bad because there are potentially other vcpus waiting for the runner in kvmppc_wait_for_exec() which then wouldn't get woken up.) Also, we make use of the handy cond_resched_lock() function, which unlocks and relocks vc->lock for us around the reschedule. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 5 +- arch/powerpc/kvm/book3s_hv.c| 92 + 2 files changed, 55 insertions(+), 42 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 3eecd88..83c4425 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -304,8 +304,9 @@ struct kvmppc_vcore { /* Values for vcore_state */ #define VCORE_INACTIVE 0 #define VCORE_SLEEPING 1 -#define VCORE_RUNNING 2 -#define VCORE_EXITING 3 +#define VCORE_PREEMPT 2 +#define VCORE_RUNNING 3 +#define VCORE_EXITING 4 /* * Struct used to manage memory for a virtual processor area diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index b38c10e..fb4f166 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1882,15 +1882,50 @@ static void prepare_threads(struct kvmppc_vcore *vc) } } +static void post_guest_process(struct kvmppc_vcore *vc) +{ + u64 now; + long ret; + struct kvm_vcpu *vcpu, *vnext; + + now = get_tb(); + list_for_each_entry_safe(vcpu, vnext, 
&vc->runnable_threads, +arch.run_list) { + /* cancel pending dec exception if dec is positive */ + if (now < vcpu->arch.dec_expires && + kvmppc_core_pending_dec(vcpu)) + kvmppc_core_dequeue_dec(vcpu); + + trace_kvm_guest_exit(vcpu); + + ret = RESUME_GUEST; + if (vcpu->arch.trap) + ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu, + vcpu->arch.run_task); + + vcpu->arch.ret = ret; + vcpu->arch.trap = 0; + + if (vcpu->arch.ceded) { + if (!is_kvmppc_resume_guest(ret)) + kvmppc_end_cede(vcpu); + else + kvmppc_set_timer(vcpu); + } + if (!is_kvmppc_resume_guest(vcpu->arch.ret)) { + kvmppc_remove_runnable(vc, vcpu); + wake_up(&vcpu->arch.cpu_run); + } + } +} + /* * Run a set of guest threads on a physical core. * Called with vc->lock held. */ static void kvmppc_run_core(struct kvmppc_vcore *vc) { - struct kvm_vcpu *vcpu, *vnext; - long ret; - u64 now; + struct kvm_vcpu *vcpu; int i; int srcu_idx; @@ -1922,8 +1957,11 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) */ if ((threads_per_core > 1) && ((vc->num_threads > threads_per_subcore) || !on_primary_thread())) { - list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) + list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) { vcpu->arch.ret = -EBUSY; + kvmppc_remove_runnable(vc, vcpu); + wake_up(&vcpu->arch.cpu_run); + } goto out; } @@ -1979,44 +2017,12 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) kvm_guest_exit(); preempt_enable(); - cond_resched(); spin_lock(&vc->lock); - now = get_tb(); - list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) { - /* cancel pending dec exception if dec is positive */ - if (now < vcpu->arch.dec_expires && - kvmppc_core_pending_dec(vcpu)) - kvmppc_core_dequeue_dec(vcpu); - - trace_kvm_guest_exit(vcpu); - - ret = RESUME_GUEST; - if (vcpu->arch.trap) - ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu, -
[PULL 20/21] KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C
From: Paul Mackerras This replaces the assembler code for kvmhv_commence_exit() with C code in book3s_hv_builtin.c. It also moves the IPI sending code that was in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function so it can be used by kvmhv_commence_exit() as well as icp_rm_set_vcpu_irq(). Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s_64.h | 2 + arch/powerpc/kvm/book3s_hv_builtin.c | 63 ++ arch/powerpc/kvm/book3s_hv_rm_xics.c | 12 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 66 4 files changed, 75 insertions(+), 68 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 869c53f..2b84e48 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -438,6 +438,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm) extern void kvmppc_mmu_debugfs_init(struct kvm *kvm); +extern void kvmhv_rm_send_ipi(int cpu); + #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ #endif /* __ASM_KVM_BOOK3S_64_H__ */ diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index 2754251..c42aa55 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -22,6 +22,7 @@ #include #include #include +#include #define KVM_CMA_CHUNK_ORDER18 @@ -184,3 +185,65 @@ long kvmppc_h_random(struct kvm_vcpu *vcpu) return H_HARDWARE; } + +static inline void rm_writeb(unsigned long paddr, u8 val) +{ + __asm__ __volatile__("stbcix %0,0,%1" + : : "r" (val), "r" (paddr) : "memory"); +} + +/* + * Send an interrupt to another CPU. + * This can only be called in real mode. + * The caller needs to include any barrier needed to order writes + * to memory vs. the IPI/message. 
+ */ +void kvmhv_rm_send_ipi(int cpu) +{ + unsigned long xics_phys; + + /* Poke the target */ + xics_phys = paca[cpu].kvm_hstate.xics_phys; + rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY); +} + +/* + * The following functions are called from the assembly code + * in book3s_hv_rmhandlers.S. + */ +static void kvmhv_interrupt_vcore(struct kvmppc_vcore *vc, int active) +{ + int cpu = vc->pcpu; + + /* Order setting of exit map vs. msgsnd/IPI */ + smp_mb(); + for (; active; active >>= 1, ++cpu) + if (active & 1) + kvmhv_rm_send_ipi(cpu); +} + +void kvmhv_commence_exit(int trap) +{ + struct kvmppc_vcore *vc = local_paca->kvm_hstate.kvm_vcore; + int ptid = local_paca->kvm_hstate.ptid; + int me, ee; + + /* Set our bit in the threads-exiting-guest map in the 0xff00 + bits of vcore->entry_exit_map */ + me = 0x100 << ptid; + do { + ee = vc->entry_exit_map; + } while (cmpxchg(&vc->entry_exit_map, ee, ee | me) != ee); + + /* Are we the first here? */ + if ((ee >> 8) != 0) + return; + + /* +* Trigger the other threads in this vcore to exit the guest. +* If this is a hypervisor decrementer interrupt then they +* will be already on their way out of the guest. 
+*/ + if (trap != BOOK3S_INTERRUPT_HV_DECREMENTER) + kvmhv_interrupt_vcore(vc, ee & ~(1 << ptid)); +} diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c index 6dded8c..00e45b6 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c @@ -26,12 +26,6 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, u32 new_irq); -static inline void rm_writeb(unsigned long paddr, u8 val) -{ - __asm__ __volatile__("sync; stbcix %0,0,%1" - : : "r" (val), "r" (paddr) : "memory"); -} - /* -- ICS routines -- */ static void ics_rm_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics, struct kvmppc_icp *icp) @@ -60,7 +54,6 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu, struct kvm_vcpu *this_vcpu) { struct kvmppc_icp *this_icp = this_vcpu->arch.icp; - unsigned long xics_phys; int cpu; /* Mark the target VCPU as having an interrupt pending */ @@ -83,9 +76,8 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu, /* In SMT cpu will always point to thread 0, we adjust it */ cpu += vcpu->arch.ptid; - /* Not too hard, then poke the target */ - xics_phys = paca[cpu].kvm_hstate.xics_phys; - rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY); + smp_mb(); + kvmhv_rm_send_ipi(cpu); } static void icp_rm_clr_v
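The exit-map update above relies on an atomic OR via a cmpxchg() loop: each exiting thread sets its bit in the high byte of vcore->entry_exit_map, and only the thread that observed an empty exit map carries on to interrupt the others. A minimal single-threaded C model of that logic (not the kernel code — `commence_exit_first` and the plain store standing in for cmpxchg() are illustrative only):

```c
#include <assert.h>

/* Hypothetical model of the entry_exit_map update in kvmhv_commence_exit():
 * low 8 bits = threads that entered, high 8 bits = threads exiting.
 * Each exiting thread ORs in (0x100 << ptid); the thread that saw an
 * empty exit map (ee >> 8 == 0) is "first out" and must signal the rest.
 * The real code performs this update with a cmpxchg() retry loop. */
static int commence_exit_first(unsigned int *entry_exit_map, int ptid)
{
        unsigned int me = 0x100u << ptid;
        unsigned int ee = *entry_exit_map;  /* stands in for the cmpxchg loop */

        *entry_exit_map = ee | me;
        return (ee >> 8) == 0;              /* were we the first to exit? */
}
```

With three threads entered (map 0x07), the first caller sees an empty exit byte and wins; later callers do not.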
[PULL 13/21] KVM: PPC: Book3S HV: Minor cleanups
From: Paul Mackerras * Remove unused kvmppc_vcore::n_busy field. * Remove setting of RMOR, since it was only used on PPC970 and the PPC970 KVM support has been removed. * Don't use r1 or r2 in setting the runlatch since they are conventionally reserved for other things; use r0 instead. * Streamline the code a little and remove the ext_interrupt_to_host label. * Add some comments about register usage. * hcall_try_real_mode doesn't need to be global, and can't be called from C code anyway. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 2 -- arch/powerpc/kernel/asm-offsets.c | 1 - arch/powerpc/kvm/book3s_hv_rmhandlers.S | 44 ++--- 3 files changed, 19 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 2f339ff..3eecd88 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -227,7 +227,6 @@ struct kvm_arch { unsigned long host_sdr1; int tlbie_lock; unsigned long lpcr; - unsigned long rmor; unsigned long vrma_slb_v; int hpte_setup_done; u32 hpt_order; @@ -271,7 +270,6 @@ struct kvm_arch { */ struct kvmppc_vcore { int n_runnable; - int n_busy; int num_threads; int entry_exit_count; int n_woken; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 3fea721..92ec3fc 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -505,7 +505,6 @@ int main(void) DEFINE(KVM_NEED_FLUSH, offsetof(struct kvm, arch.need_tlb_flush.bits)); DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls)); DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr)); - DEFINE(KVM_RMOR, offsetof(struct kvm, arch.rmor)); DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v)); DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr)); DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar)); diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index b06fe53..f8267e5 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -245,9 +245,9 @@ kvm_novcpu_exit: kvm_start_guest: /* Set runlatch bit the minute you wake up from nap */ - mfspr r1, SPRN_CTRLF - ori r1, r1, 1 - mtspr SPRN_CTRLT, r1 + mfspr r0, SPRN_CTRLF + ori r0, r0, 1 + mtspr SPRN_CTRLT, r0 ld r2,PACATOC(r13) @@ -493,11 +493,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) cmpwi r0,0 beq 20b - /* Set LPCR and RMOR. */ + /* Set LPCR. */ 10:ld r8,VCORE_LPCR(r5) mtspr SPRN_LPCR,r8 - ld r8,KVM_RMOR(r9) - mtspr SPRN_RMOR,r8 isync /* Check if HDEC expires soon */ @@ -1075,7 +1073,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) bne 2f mfspr r3,SPRN_HDEC cmpwi r3,0 - bge ignore_hdec + mr r4,r9 + bge fast_guest_return 2: /* See if this is an hcall we can handle in real mode */ cmpwi r12,BOOK3S_INTERRUPT_SYSCALL @@ -1083,26 +1082,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) /* External interrupt ? */ cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL - bne+ext_interrupt_to_host + bne+guest_exit_cont /* External interrupt, first check for host_ipi. If this is * set, we know the host wants us out so let's do it now */ bl kvmppc_read_intr cmpdi r3, 0 - bgt ext_interrupt_to_host + bgt guest_exit_cont /* Check if any CPU is heading out to the host, if so head out too */ ld r5, HSTATE_KVM_VCORE(r13) lwz r0, VCORE_ENTRY_EXIT(r5) cmpwi r0, 0x100 - bge ext_interrupt_to_host - - /* Return to guest after delivering any pending interrupt */ mr r4, r9 - b deliver_guest_interrupt - -ext_interrupt_to_host: + blt deliver_guest_interrupt guest_exit_cont: /* r9 = vcpu, r12 = trap, r13 = paca */ /* Save more register state */ @@ -1763,8 +1757,10 @@ kvmppc_hisi: * Returns to the guest if we handle it, or continues on up to * the kernel if we can't (i.e. if we don't have a handler for * it, or if the handler returns H_TOO_HARD). 
+ * + * r5 - r8 contain hcall args, + * r9 = vcpu, r10 = pc, r11 = msr, r12 = trap, r13 = paca */ - .globl hcall_try_real_mode hcall_try_real_mode: ld r3,VCPU_GPR(R3)(r9) andi. r0,r11,MSR_PR @@ -2024,10 +2020,6 @@ hcall_real_table: .globl hcall_real_table_end hcall_real_table_end: -ignore_hdec: -
[PULL 15/21] KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken
From: Paul Mackerras We can tell when a secondary thread has finished running a guest by the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there is no real need for the nap_count field in the kvmppc_vcore struct. This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu pointers of the secondary threads rather than polling vc->nap_count. Besides reducing the size of the kvmppc_vcore struct by 8 bytes, this also means that we can tell which secondary threads have got stuck and thus print a more informative error message. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 2 -- arch/powerpc/kernel/asm-offsets.c | 1 - arch/powerpc/kvm/book3s_hv.c| 47 +++-- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 19 + 4 files changed, 34 insertions(+), 35 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 83c4425..1517faa 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -272,8 +272,6 @@ struct kvmppc_vcore { int n_runnable; int num_threads; int entry_exit_count; - int n_woken; - int nap_count; int napping_threads; int first_vcpuid; u16 pcpu; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 92ec3fc..8aa8246 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -563,7 +563,6 @@ int main(void) DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort)); DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1)); DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_count)); - DEFINE(VCORE_NAP_COUNT, offsetof(struct kvmppc_vcore, nap_count)); DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest)); DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, napping_threads)); DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm)); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 
fb4f166..7c1335d 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1729,8 +1729,10 @@ static int kvmppc_grab_hwthread(int cpu) tpaca = &paca[cpu]; /* Ensure the thread won't go into the kernel if it wakes */ - tpaca->kvm_hstate.hwthread_req = 1; tpaca->kvm_hstate.kvm_vcpu = NULL; + tpaca->kvm_hstate.napping = 0; + smp_wmb(); + tpaca->kvm_hstate.hwthread_req = 1; /* * If the thread is already executing in the kernel (e.g. handling @@ -1773,35 +1775,43 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu) } cpu = vc->pcpu + vcpu->arch.ptid; tpaca = &paca[cpu]; - tpaca->kvm_hstate.kvm_vcpu = vcpu; tpaca->kvm_hstate.kvm_vcore = vc; tpaca->kvm_hstate.ptid = vcpu->arch.ptid; vcpu->cpu = vc->pcpu; + /* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */ smp_wmb(); + tpaca->kvm_hstate.kvm_vcpu = vcpu; #if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP) - if (cpu != smp_processor_id()) { + if (cpu != smp_processor_id()) xics_wake_cpu(cpu); - if (vcpu->arch.ptid) - ++vc->n_woken; - } #endif } -static void kvmppc_wait_for_nap(struct kvmppc_vcore *vc) +static void kvmppc_wait_for_nap(void) { - int i; + int cpu = smp_processor_id(); + int i, loops; - HMT_low(); - i = 0; - while (vc->nap_count < vc->n_woken) { - if (++i >= 100) { - pr_err("kvmppc_wait_for_nap timeout %d %d\n", - vc->nap_count, vc->n_woken); - break; + for (loops = 0; loops < 100; ++loops) { + /* +* Check if all threads are finished. +* We set the vcpu pointer when starting a thread +* and the thread clears it when finished, so we look +* for any threads that still have a non-NULL vcpu ptr. 
+*/ + for (i = 1; i < threads_per_subcore; ++i) + if (paca[cpu + i].kvm_hstate.kvm_vcpu) + break; + if (i == threads_per_subcore) { + HMT_medium(); + return; } - cpu_relax(); + HMT_low(); } HMT_medium(); + for (i = 1; i < threads_per_subcore; ++i) + if (paca[cpu + i].kvm_hstate.kvm_vcpu) + pr_err("KVM: CPU %d seems to be stuck\n", cpu + i); } /* @@ -1942,8 +1952,6 @@ static void kvmppc_run
[PULL 00/21] ppc patch queue 2015-04-21 for 4.1
Hi Paolo / Marcelo, This is my current patch queue for ppc. Please pull. Alex The following changes since commit b79013b2449c23f1f505bdf39c5a6c330338b244: Merge tag 'staging-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging (2015-04-13 17:37:33 -0700) are available in the git repository at: git://github.com/agraf/linux-2.6.git tags/signed-kvm-ppc-queue for you to fetch changes up to 66feed61cdf6ee65fd551d3460b1efba6bee55b8: KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 (2015-04-21 15:21:34 +0200) Patch queue for ppc - 2015-04-21 This is the latest queue for KVM on PowerPC changes. Highlights this time around: - Book3S HV: Debugging aids - Book3S HV: Minor performance improvements - Book3S HV: Cleanups Aneesh Kumar K.V (2): KVM: PPC: Book3S HV: Remove RMA-related variables from code KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte David Gibson (1): kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM Michael Ellerman (1): KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation. Paul Mackerras (12): KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT KVM: PPC: Book3S HV: Accumulate timing information for real-mode code KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update KVM: PPC: Book3S HV: Minor cleanups KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI KVM: PPC: Book3S HV: Use decrementer to wake napping threads KVM: PPC: Book3S HV: Use bitmap of active threads rather than count KVM: PPC: Book3S HV: Streamline guest entry and exit KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 Suresh E. 
Warrier (2): powerpc: Export __spin_yield KVM: PPC: Book3S HV: Add guest->host real mode completion counters Suresh Warrier (3): KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode KVM: PPC: Book3S HV: Add ICP real mode counters Documentation/virtual/kvm/api.txt| 17 + arch/powerpc/include/asm/archrandom.h| 11 +- arch/powerpc/include/asm/kvm_book3s.h| 3 + arch/powerpc/include/asm/kvm_book3s_64.h | 18 + arch/powerpc/include/asm/kvm_host.h | 47 ++- arch/powerpc/include/asm/kvm_ppc.h | 2 + arch/powerpc/include/asm/time.h | 3 + arch/powerpc/kernel/asm-offsets.c| 20 +- arch/powerpc/kernel/time.c | 6 + arch/powerpc/kvm/Kconfig | 14 + arch/powerpc/kvm/book3s.c| 76 + arch/powerpc/kvm/book3s_64_mmu_hv.c | 189 +-- arch/powerpc/kvm/book3s_hv.c | 435 ++-- arch/powerpc/kvm/book3s_hv_builtin.c | 100 +- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 25 +- arch/powerpc/kvm/book3s_hv_rm_xics.c | 238 +++-- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 559 +++ arch/powerpc/kvm/book3s_pr_papr.c| 28 ++ arch/powerpc/kvm/book3s_xics.c | 105 -- arch/powerpc/kvm/book3s_xics.h | 13 +- arch/powerpc/kvm/powerpc.c | 3 + arch/powerpc/lib/locks.c | 1 + arch/powerpc/platforms/powernv/rng.c | 29 ++ include/uapi/linux/kvm.h | 1 + virt/kvm/kvm_main.c | 1 + 25 files changed, 1580 insertions(+), 364 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 16/21] KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI
From: Paul Mackerras When running a multi-threaded guest and vcpu 0 in a virtual core is not running in the guest (i.e. it is busy elsewhere in the host), thread 0 of the physical core will switch the MMU to the guest and then go to nap mode in the code at kvm_do_nap. If the guest sends an IPI to thread 0 using the msgsndp instruction, that will wake up thread 0 and cause all the threads in the guest to exit to the host unnecessarily. To avoid the unnecessary exit, this arranges for the PECEDP bit to be cleared in this situation. When napping due to a H_CEDE from the guest, we still set PECEDP so that the thread will wake up on an IPI sent using msgsndp. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 6716db3..12d7e4c 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -191,6 +191,7 @@ kvmppc_primary_no_guest: li r3, NAPPING_NOVCPU stb r3, HSTATE_NAPPING(r13) + li r3, 0 /* Don't wake on privileged (OS) doorbell */ b kvm_do_nap kvm_novcpu_wakeup: @@ -2129,10 +2130,13 @@ _GLOBAL(kvmppc_h_cede) /* r3 = vcpu pointer, r11 = msr, r13 = paca */ bl kvmhv_accumulate_time #endif + lis r3, LPCR_PECEDP@h /* Do wake on privileged doorbell */ + /* * Take a nap until a decrementer or external or doobell interrupt -* occurs, with PECE1, PECE0 and PECEDP set in LPCR. Also clear the -* runlatch bit before napping. +* occurs, with PECE1 and PECE0 set in LPCR. +* On POWER8, if we are ceding, also set PECEDP. +* Also clear the runlatch bit before napping. 
*/ kvm_do_nap: mfspr r0, SPRN_CTRLF @@ -2144,7 +2148,7 @@ kvm_do_nap: mfspr r5,SPRN_LPCR ori r5,r5,LPCR_PECE0 | LPCR_PECE1 BEGIN_FTR_SECTION - oris r5,r5,LPCR_PECEDP@h + rlwimi r5, r3, 0, LPCR_PECEDP END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) mtspr SPRN_LPCR,r5 isync -- 1.8.1.4
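The rlwimi trick in the hunk above lets both nap paths share one LPCR update: kvmppc_h_cede loads r3 with the PECEDP bit set (wake on doorbell) while kvmppc_primary_no_guest loads r3 = 0, and rlwimi with shift 0 inserts just the masked bit of r3 into r5, leaving every other LPCR bit alone. A C model of that bit-insert (the `LPCR_PECEDP_BIT` value here is a stand-in, not the real LPCR layout):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in bit position for illustration; not the architected LPCR value. */
#define LPCR_PECEDP_BIT 0x10000u

/* Models "rlwimi dst, src, 0, mask": copy src's bits under mask into dst,
 * preserving all of dst's other bits. */
static uint32_t rlwimi_mask(uint32_t dst, uint32_t src, uint32_t mask)
{
        return (dst & ~mask) | (src & mask);
}
```

So the same kvm_do_nap code path either sets or clears PECEDP depending purely on what the caller left in r3.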
[PULL 19/21] KVM: PPC: Book3S HV: Streamline guest entry and exit
From: Paul Mackerras On entry to the guest, secondary threads now wait for the primary to switch the MMU after loading up most of their state, rather than before. This means that the secondary threads get into the guest sooner, in the common case where the secondary threads get to kvmppc_hv_entry before the primary thread. On exit, the first thread out increments the exit count and interrupts the other threads (to get them out of the guest) before saving most of its state, rather than after. That means that the other threads exit sooner and means that the first thread doesn't spend so much time waiting for the other threads at the point where the MMU gets switched back to the host. This pulls out the code that increments the exit count and interrupts other threads into a separate function, kvmhv_commence_exit(). This also makes sure that r12 and vcpu->arch.trap are set correctly in some corner cases. Statistics from /sys/kernel/debug/kvm/vm*/vcpu*/timings show the improvement. Aggregating across vcpus for a guest with 32 vcpus, 8 threads/vcore, running on a POWER8, gives this before the change: rm_entry: avg 4537.3ns (222 - 48444, 1068878 samples) rm_exit: avg 4787.6ns (152 - 165490, 1010717 samples) rm_intr: avg 1673.6ns (12 - 341304, 3818691 samples) and this after the change: rm_entry: avg 3427.7ns (232 - 68150, 1118921 samples) rm_exit: avg 4716.0ns (12 - 150720, 1119477 samples) rm_intr: avg 1614.8ns (12 - 522436, 3850432 samples) showing a substantial reduction in the time spent per guest entry in the real-mode guest entry code, and smaller reductions in the real mode guest exit and interrupt handling times. (The test was to start the guest and boot Fedora 20 big-endian to the login prompt.) 
Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 212 +++- 1 file changed, 126 insertions(+), 86 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 245f5c9..3f6fd78 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -175,6 +175,19 @@ kvmppc_primary_no_guest: /* put the HDEC into the DEC, since HDEC interrupts don't wake us */ mfspr r3, SPRN_HDEC mtspr SPRN_DEC, r3 + /* +* Make sure the primary has finished the MMU switch. +* We should never get here on a secondary thread, but +* check it for robustness' sake. +*/ + ld r5, HSTATE_KVM_VCORE(r13) +65:lbz r0, VCORE_IN_GUEST(r5) + cmpwi r0, 0 + beq 65b + /* Set LPCR. */ + ld r8,VCORE_LPCR(r5) + mtspr SPRN_LPCR,r8 + isync /* set our bit in napping_threads */ ld r5, HSTATE_KVM_VCORE(r13) lbz r7, HSTATE_PTID(r13) @@ -206,7 +219,7 @@ kvm_novcpu_wakeup: /* check the wake reason */ bl kvmppc_check_wake_reason - + /* see if any other thread is already exiting */ lwz r0, VCORE_ENTRY_EXIT(r5) cmpwi r0, 0x100 @@ -244,7 +257,15 @@ kvm_novcpu_wakeup: b kvmppc_got_guest kvm_novcpu_exit: - b hdec_soon +#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING + ld r4, HSTATE_KVM_VCPU(r13) + cmpdi r4, 0 + beq 13f + addir3, r4, VCPU_TB_RMEXIT + bl kvmhv_accumulate_time +#endif +13:bl kvmhv_commence_exit + b kvmhv_switch_to_host /* * We come in here when wakened from nap mode. @@ -422,7 +443,7 @@ kvmppc_hv_entry: /* Primary thread switches to guest partition. 
*/ ld r9,VCORE_KVM(r5)/* pointer to struct kvm */ cmpwi r6,0 - bne 20f + bne 10f ld r6,KVM_SDR1(r9) lwz r7,KVM_LPID(r9) li r0,LPID_RSVD/* switch to reserved LPID */ @@ -493,26 +514,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) li r0,1 stb r0,VCORE_IN_GUEST(r5) /* signal secondaries to continue */ - b 10f - - /* Secondary threads wait for primary to have done partition switch */ -20:lbz r0,VCORE_IN_GUEST(r5) - cmpwi r0,0 - beq 20b - - /* Set LPCR. */ -10:ld r8,VCORE_LPCR(r5) - mtspr SPRN_LPCR,r8 - isync - - /* Check if HDEC expires soon */ - mfspr r3,SPRN_HDEC - cmpwi r3,512 /* 1 microsecond */ - li r12,BOOK3S_INTERRUPT_HV_DECREMENTER - blt hdec_soon /* Do we have a guest vcpu to run? */ - cmpdi r4, 0 +10:cmpdi r4, 0 beq kvmppc_primary_no_guest kvmppc_got_guest: @@ -837,6 +841,30 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) clrrdi r6,r6,1 mtspr SPRN_CTRLT,r6 4: + /* Secondary threads wait for primary to have done partition switch */
[PULL 17/21] KVM: PPC: Book3S HV: Use decrementer to wake napping threads
From: Paul Mackerras This arranges for threads that are napping due to their vcpu having ceded or due to not having a vcpu to wake up at the end of the guest's timeslice without having to be poked with an IPI. We do that by arranging for the decrementer to contain a value no greater than the number of timebase ticks remaining until the end of the timeslice. In the case of a thread with no vcpu, this number is in the hypervisor decrementer already. In the case of a ceded vcpu, we use the smaller of the HDEC value and the DEC value. Using the DEC like this when ceded means we need to save and restore the guest decrementer value around the nap. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 43 +++-- 1 file changed, 41 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 12d7e4c..16719af 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -172,6 +172,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) kvmppc_primary_no_guest: /* We handle this much like a ceded vcpu */ + /* put the HDEC into the DEC, since HDEC interrupts don't wake us */ + mfspr r3, SPRN_HDEC + mtspr SPRN_DEC, r3 /* set our bit in napping_threads */ ld r5, HSTATE_KVM_VCORE(r13) lbz r7, HSTATE_PTID(r13) @@ -223,6 +226,12 @@ kvm_novcpu_wakeup: cmpdi r3, 0 bge kvm_novcpu_exit + /* See if our timeslice has expired (HDEC is negative) */ + mfspr r0, SPRN_HDEC + li r12, BOOK3S_INTERRUPT_HV_DECREMENTER + cmpwi r0, 0 + blt kvm_novcpu_exit + /* Got an IPI but other vcpus aren't yet exiting, must be a latecomer */ ld r4, HSTATE_KVM_VCPU(r13) cmpdi r4, 0 @@ -1493,10 +1502,10 @@ kvmhv_do_exit: /* r12 = trap, r13 = paca */ cmpwi r3,0x100/* Are we the first here? 
*/ bge 43f cmpwi r12,BOOK3S_INTERRUPT_HV_DECREMENTER - beq 40f + beq 43f li r0,0 mtspr SPRN_HDEC,r0 -40: + /* * Send an IPI to any napping threads, since an HDEC interrupt * doesn't wake CPUs up from nap. @@ -2124,6 +2133,27 @@ _GLOBAL(kvmppc_h_cede) /* r3 = vcpu pointer, r11 = msr, r13 = paca */ /* save FP state */ bl kvmppc_save_fp + /* +* Set DEC to the smaller of DEC and HDEC, so that we wake +* no later than the end of our timeslice (HDEC interrupts +* don't wake us from nap). +*/ + mfspr r3, SPRN_DEC + mfspr r4, SPRN_HDEC + mftb r5 + cmpw r3, r4 + ble 67f + mtspr SPRN_DEC, r4 +67: + /* save expiry time of guest decrementer */ + extsw r3, r3 + add r3, r3, r5 + ld r4, HSTATE_KVM_VCPU(r13) + ld r5, HSTATE_KVM_VCORE(r13) + ld r6, VCORE_TB_OFFSET(r5) + subf r3, r6, r3 /* convert to host TB value */ + std r3, VCPU_DEC_EXPIRES(r4) + #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING ld r4, HSTATE_KVM_VCPU(r13) addi r3, r4, VCPU_TB_CEDE @@ -2181,6 +2211,15 @@ kvm_end_cede: /* load up FP state */ bl kvmppc_load_fp + /* Restore guest decrementer */ + ld r3, VCPU_DEC_EXPIRES(r4) + ld r5, HSTATE_KVM_VCORE(r13) + ld r6, VCORE_TB_OFFSET(r5) + add r3, r3, r6 /* convert host TB to guest TB value */ + mftb r7 + subf r3, r7, r3 + mtspr SPRN_DEC, r3 + /* Load NV GPRS */ ld r14, VCPU_GPR(R14)(r4) ld r15, VCPU_GPR(R15)(r4) -- 1.8.1.4
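The decrementer bookkeeping in the cede path above does two things: clamp DEC to HDEC so the nap ends no later than the timeslice, and save the guest decrementer's expiry as a host timebase value (by sign-extending DEC, adding the current timebase, and subtracting the vcore's timebase offset). A C sketch of that arithmetic (illustrative only; function names are made up for this sketch):

```c
#include <assert.h>
#include <stdint.h>

/* DEC = min(DEC, HDEC): HDEC interrupts don't wake us from nap, so the
 * (signed) decrementer must fire no later than the hypervisor one. */
static int64_t clamp_dec(int64_t dec, int64_t hdec)
{
        return dec <= hdec ? dec : hdec;
}

/* Mirrors "extsw r3,r3; add r3,r3,r5; subf r3,r6,r3" from the patch:
 * expiry (host TB) = sign-extended DEC + current TB - vcore tb_offset. */
static int64_t guest_dec_expires_host_tb(int32_t dec, uint64_t now_tb,
                                         int64_t tb_offset)
{
        return (int64_t)dec + (int64_t)now_tb - tb_offset;
}
```

On wakeup the conversion runs in reverse: add the offset back and subtract the new timebase to reload DEC.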
[PULL 18/21] KVM: PPC: Book3S HV: Use bitmap of active threads rather than count
From: Paul Mackerras Currently, the entry_exit_count field in the kvmppc_vcore struct contains two 8-bit counts, one of the threads that have started entering the guest, and one of the threads that have started exiting the guest. This changes it to an entry_exit_map field which contains two bitmaps of 8 bits each. The advantage of doing this is that it gives us a bitmap of which threads need to be signalled when exiting the guest. That means that we no longer need to use the trick of setting the HDEC to 0 to pull the other threads out of the guest, which led in some cases to a spurious HDEC interrupt on the next guest entry. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 15 arch/powerpc/kernel/asm-offsets.c | 2 +- arch/powerpc/kvm/book3s_hv.c| 5 ++- arch/powerpc/kvm/book3s_hv_builtin.c| 10 +++--- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 61 +++-- 5 files changed, 44 insertions(+), 49 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 1517faa..d67a838 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -263,15 +263,15 @@ struct kvm_arch { /* * Struct for a virtual core. - * Note: entry_exit_count combines an entry count in the bottom 8 bits - * and an exit count in the next 8 bits. This is so that we can - * atomically increment the entry count iff the exit count is 0 - * without taking the lock. + * Note: entry_exit_map combines a bitmap of threads that have entered + * in the bottom 8 bits and a bitmap of threads that have exited in the + * next 8 bits. This is so that we can atomically set the entry bit + * iff the exit map is 0 without taking a lock. 
*/ struct kvmppc_vcore { int n_runnable; int num_threads; - int entry_exit_count; + int entry_exit_map; int napping_threads; int first_vcpuid; u16 pcpu; @@ -296,8 +296,9 @@ struct kvmppc_vcore { ulong conferring_threads; }; -#define VCORE_ENTRY_COUNT(vc) ((vc)->entry_exit_count & 0xff) -#define VCORE_EXIT_COUNT(vc) ((vc)->entry_exit_count >> 8) +#define VCORE_ENTRY_MAP(vc)((vc)->entry_exit_map & 0xff) +#define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8) +#define VCORE_IS_EXITING(vc) (VCORE_EXIT_MAP(vc) != 0) /* Values for vcore_state */ #define VCORE_INACTIVE 0 diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 8aa8246..0d07efb 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -562,7 +562,7 @@ int main(void) DEFINE(VCPU_ACOP, offsetof(struct kvm_vcpu, arch.acop)); DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort)); DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1)); - DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_count)); + DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_map)); DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest)); DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, napping_threads)); DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm)); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 7c1335d..ea1600f 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1952,7 +1952,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) /* * Initialize *vc. */ - vc->entry_exit_count = 0; + vc->entry_exit_map = 0; vc->preempt_tb = TB_NIL; vc->in_guest = 0; vc->napping_threads = 0; @@ -2119,8 +2119,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) * this thread straight away and have it join in. 
*/ if (!signal_pending(current)) { - if (vc->vcore_state == VCORE_RUNNING && - VCORE_EXIT_COUNT(vc) == 0) { + if (vc->vcore_state == VCORE_RUNNING && !VCORE_IS_EXITING(vc)) { kvmppc_create_dtl_entry(vcpu, vc); kvmppc_start_thread(vcpu); trace_kvm_guest_enter(vcpu); diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index 1954a1c..2754251 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -115,11 +115,11 @@ long int kvmppc_rm_h_confer(struct kvm_vcpu *vcpu, int target, int rv = H_SUCCESS; /* => don't yield */ set_bit(vcpu->arch.ptid, &vc->conferring_threads); - while ((get_tb() < stop) && (VCORE_EXIT_COUNT(vc) == 0)) { - threads_running = VCORE_ENTRY_COUNT(vc); - threads_ceded = hweight32(vc->napping
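The entry_exit_map encoding that this patch introduces — entry bitmap in the low byte, exit bitmap in the high byte — can be checked with a tiny C model of the VCORE_* macros from the diff (the map values below are made-up examples):

```c
#include <assert.h>

/* Mirrors the macros added by the patch: which threads have entered the
 * guest (low byte) and which have started exiting (high byte).  The exit
 * map doubles as the set of threads that need to be signalled. */
#define VCORE_ENTRY_MAP(m)  ((m) & 0xff)
#define VCORE_EXIT_MAP(m)   ((m) >> 8)
#define VCORE_IS_EXITING(m) (VCORE_EXIT_MAP(m) != 0)
```

The advantage over the old two-counter packing is exactly this last macro's operand: a count only said *how many* threads were leaving, while the map says *which* ones, so the HDEC=0 trick is no longer needed to flush stragglers.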
[PULL 03/21] KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation.
From: Michael Ellerman Some PowerNV systems include a hardware random-number generator. This HWRNG is present on POWER7+ and POWER8 chips and is capable of generating one 64-bit random number every microsecond. The random numbers are produced by sampling a set of 64 unstable high-frequency oscillators and are almost completely entropic. PAPR defines an H_RANDOM hypercall which guests can use to obtain one 64-bit random sample from the HWRNG. This adds a real-mode implementation of the H_RANDOM hypercall. This hypercall was implemented in real mode because the latency of reading the HWRNG is generally small compared to the latency of a guest exit and entry for all the threads in the same virtual core. Userspace can detect the presence of the HWRNG and the H_RANDOM implementation by querying the KVM_CAP_PPC_HWRNG capability. The H_RANDOM hypercall implementation will only be invoked when the guest does an H_RANDOM hypercall if userspace first enables the in-kernel H_RANDOM implementation using the KVM_CAP_PPC_ENABLE_HCALL capability. Signed-off-by: Michael Ellerman Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- Documentation/virtual/kvm/api.txt | 17 + arch/powerpc/include/asm/archrandom.h | 11 ++- arch/powerpc/include/asm/kvm_ppc.h | 2 + arch/powerpc/kvm/book3s_hv_builtin.c| 15 + arch/powerpc/kvm/book3s_hv_rmhandlers.S | 115 arch/powerpc/kvm/powerpc.c | 3 + arch/powerpc/platforms/powernv/rng.c| 29 include/uapi/linux/kvm.h| 1 + 8 files changed, 191 insertions(+), 2 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index bc9f6fe..9fa2bf8 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -3573,3 +3573,20 @@ struct { @ar - access register number KVM handlers should exit to userspace with rc = -EREMOTE. + + +8. Other capabilities. +-- + +This section lists capabilities that give information about other +features of the KVM implementation. 
+ +8.1 KVM_CAP_PPC_HWRNG + +Architectures: ppc + +This capability, if KVM_CHECK_EXTENSION indicates that it is +available, means that that the kernel has an implementation of the +H_RANDOM hypercall backed by a hardware random-number generator. +If present, the kernel H_RANDOM handler can be enabled for guest use +with the KVM_CAP_PPC_ENABLE_HCALL capability. diff --git a/arch/powerpc/include/asm/archrandom.h b/arch/powerpc/include/asm/archrandom.h index bde5311..0cc6eed 100644 --- a/arch/powerpc/include/asm/archrandom.h +++ b/arch/powerpc/include/asm/archrandom.h @@ -30,8 +30,6 @@ static inline int arch_has_random(void) return !!ppc_md.get_random_long; } -int powernv_get_random_long(unsigned long *v); - static inline int arch_get_random_seed_long(unsigned long *v) { return 0; @@ -47,4 +45,13 @@ static inline int arch_has_random_seed(void) #endif /* CONFIG_ARCH_RANDOM */ +#ifdef CONFIG_PPC_POWERNV +int powernv_hwrng_present(void); +int powernv_get_random_long(unsigned long *v); +int powernv_get_random_real_mode(unsigned long *v); +#else +static inline int powernv_hwrng_present(void) { return 0; } +static inline int powernv_get_random_real_mode(unsigned long *v) { return 0; } +#endif + #endif /* _ASM_POWERPC_ARCHRANDOM_H */ diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 46bf652..b8475da 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -302,6 +302,8 @@ static inline bool is_kvmppc_hv_enabled(struct kvm *kvm) return kvm->arch.kvm_ops == kvmppc_hv_ops; } +extern int kvmppc_hwrng_present(void); + /* * Cuts out inst bits with ordering according to spec. * That means the leftmost bit is zero. All given bits are included. 
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index 1f083ff..1954a1c 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -21,6 +21,7 @@ #include #include #include +#include #define KVM_CMA_CHUNK_ORDER18 @@ -169,3 +170,17 @@ int kvmppc_hcall_impl_hv_realmode(unsigned long cmd) return 0; } EXPORT_SYMBOL_GPL(kvmppc_hcall_impl_hv_realmode); + +int kvmppc_hwrng_present(void) +{ + return powernv_hwrng_present(); +} +EXPORT_SYMBOL_GPL(kvmppc_hwrng_present); + +long kvmppc_h_random(struct kvm_vcpu *vcpu) +{ + if (powernv_get_random_real_mode(&vcpu->arch.gpr[4])) + return H_SUCCESS; + + return H_HARDWARE; +} diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 6cbf163..0814ca1 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -1839,6 +1839,121 @@ hcall_real_table: .long 0 /* 0x12c */ .long 0
[PULL 06/21] KVM: PPC: Book3S HV: Add guest->host real mode completion counters
From: "Suresh E. Warrier" Add counters to track the number of times we switch from guest real mode to host virtual mode during an interrupt-related hypercall because the hypercall requires actions that cannot be completed in real mode. This will help when making optimizations that reduce guest-host transitions. It is safe to use an ordinary increment rather than an atomic operation because there is one ICP per virtual CPU and kvmppc_xics_rm_complete() only works on the ICP for the current VCPU. The counters are displayed as part of ICS and ICP state provided by /sys/kernel/debug/powerpc/kvm* for each VM. Signed-off-by: Suresh Warrier Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_xics.c | 31 +++ arch/powerpc/kvm/book3s_xics.h | 6 ++ 2 files changed, 33 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c index a4a8d9f..60bdbac 100644 --- a/arch/powerpc/kvm/book3s_xics.c +++ b/arch/powerpc/kvm/book3s_xics.c @@ -802,14 +802,22 @@ static noinline int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall) XICS_DBG("XICS_RM: H_%x completing, act: %x state: %lx tgt: %p\n", hcall, icp->rm_action, icp->rm_dbgstate.raw, icp->rm_dbgtgt); - if (icp->rm_action & XICS_RM_KICK_VCPU) + if (icp->rm_action & XICS_RM_KICK_VCPU) { + icp->n_rm_kick_vcpu++; kvmppc_fast_vcpu_kick(icp->rm_kick_target); - if (icp->rm_action & XICS_RM_CHECK_RESEND) + } + if (icp->rm_action & XICS_RM_CHECK_RESEND) { + icp->n_rm_check_resend++; icp_check_resend(xics, icp->rm_resend_icp); - if (icp->rm_action & XICS_RM_REJECT) + } + if (icp->rm_action & XICS_RM_REJECT) { + icp->n_rm_reject++; icp_deliver_irq(xics, icp, icp->rm_reject); - if (icp->rm_action & XICS_RM_NOTIFY_EOI) + } + if (icp->rm_action & XICS_RM_NOTIFY_EOI) { + icp->n_rm_notify_eoi++; kvm_notify_acked_irq(vcpu->kvm, 0, icp->rm_eoied_irq); + } icp->rm_action = 0; @@ -872,10 +880,17 @@ static int xics_debug_show(struct seq_file *m, void *private)
struct kvm *kvm = xics->kvm; struct kvm_vcpu *vcpu; int icsid, i; + unsigned long t_rm_kick_vcpu, t_rm_check_resend; + unsigned long t_rm_reject, t_rm_notify_eoi; if (!kvm) return 0; + t_rm_kick_vcpu = 0; + t_rm_notify_eoi = 0; + t_rm_check_resend = 0; + t_rm_reject = 0; + seq_printf(m, "=\nICP state\n=\n"); kvm_for_each_vcpu(i, vcpu, kvm) { @@ -890,8 +905,16 @@ static int xics_debug_show(struct seq_file *m, void *private) icp->server_num, state.xisr, state.pending_pri, state.cppr, state.mfrr, state.out_ee, state.need_resend); + t_rm_kick_vcpu += icp->n_rm_kick_vcpu; + t_rm_notify_eoi += icp->n_rm_notify_eoi; + t_rm_check_resend += icp->n_rm_check_resend; + t_rm_reject += icp->n_rm_reject; } + seq_puts(m, "ICP Guest Real Mode exit totals: "); + seq_printf(m, "\tkick_vcpu=%lu check_resend=%lu reject=%lu notify_eoi=%lu\n", + t_rm_kick_vcpu, t_rm_check_resend, + t_rm_reject, t_rm_notify_eoi); for (icsid = 0; icsid <= KVMPPC_XICS_MAX_ICS_ID; icsid++) { struct kvmppc_ics *ics = xics->ics[icsid]; diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h index 73f0f27..de970ec 100644 --- a/arch/powerpc/kvm/book3s_xics.h +++ b/arch/powerpc/kvm/book3s_xics.h @@ -78,6 +78,12 @@ struct kvmppc_icp { u32 rm_reject; u32 rm_eoied_irq; + /* Counters for each reason we exited real mode */ + unsigned long n_rm_kick_vcpu; + unsigned long n_rm_check_resend; + unsigned long n_rm_reject; + unsigned long n_rm_notify_eoi; + /* Debug stuff for real mode */ union kvmppc_icp_state rm_dbgstate; struct kvm_vcpu *rm_dbgtgt; -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 12/21] KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update
From: Paul Mackerras Previously, if kvmppc_run_core() was running a VCPU that needed a VPA update (i.e. one of its 3 virtual processor areas needed to be pinned in memory so the host real mode code can update it on guest entry and exit), we would drop the vcore lock and do the update there and then. Future changes will make it inconvenient to drop the lock, so instead we now remove such a VCPU from the list of runnable VCPUs and wake up its VCPU task. This will have the effect that the VCPU task will exit kvmppc_run_vcpu(), go around the do loop in kvmppc_vcpu_run_hv(), and re-enter kvmppc_run_vcpu(), whereupon it will do the necessary call to kvmppc_update_vpas() and then rejoin the vcore. The one complication is that the runner VCPU (whose VCPU task is the current task) might be one of the ones that gets removed from the runnable list. In that case we just return from kvmppc_run_core() and let the code in kvmppc_run_vcpu() wake up another VCPU task to be the runner if necessary. This all means that the VCORE_STARTING state is no longer used, so we remove it.
Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 5 ++-- arch/powerpc/kvm/book3s_hv.c| 56 - 2 files changed, 32 insertions(+), 29 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index d2068bb..2f339ff 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -306,9 +306,8 @@ struct kvmppc_vcore { /* Values for vcore_state */ #define VCORE_INACTIVE 0 #define VCORE_SLEEPING 1 -#define VCORE_STARTING 2 -#define VCORE_RUNNING 3 -#define VCORE_EXITING 4 +#define VCORE_RUNNING 2 +#define VCORE_EXITING 3 /* * Struct used to manage memory for a virtual processor area diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 64a02d4..b38c10e 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1863,6 +1863,25 @@ static void kvmppc_start_restoring_l2_cache(const struct kvmppc_vcore *vc) mtspr(SPRN_MPPR, mpp_addr | PPC_MPPR_FETCH_WHOLE_TABLE); } +static void prepare_threads(struct kvmppc_vcore *vc) +{ + struct kvm_vcpu *vcpu, *vnext; + + list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads, +arch.run_list) { + if (signal_pending(vcpu->arch.run_task)) + vcpu->arch.ret = -EINTR; + else if (vcpu->arch.vpa.update_pending || +vcpu->arch.slb_shadow.update_pending || +vcpu->arch.dtl.update_pending) + vcpu->arch.ret = RESUME_GUEST; + else + continue; + kvmppc_remove_runnable(vc, vcpu); + wake_up(&vcpu->arch.cpu_run); + } +} + /* * Run a set of guest threads on a physical core. * Called with vc->lock held. 
@@ -1872,46 +1891,31 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) struct kvm_vcpu *vcpu, *vnext; long ret; u64 now; - int i, need_vpa_update; + int i; int srcu_idx; - struct kvm_vcpu *vcpus_to_update[threads_per_core]; - /* don't start if any threads have a signal pending */ - need_vpa_update = 0; - list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) { - if (signal_pending(vcpu->arch.run_task)) - return; - if (vcpu->arch.vpa.update_pending || - vcpu->arch.slb_shadow.update_pending || - vcpu->arch.dtl.update_pending) - vcpus_to_update[need_vpa_update++] = vcpu; - } + /* +* Remove from the list any threads that have a signal pending +* or need a VPA update done +*/ + prepare_threads(vc); + + /* if the runner is no longer runnable, let the caller pick a new one */ + if (vc->runner->arch.state != KVMPPC_VCPU_RUNNABLE) + return; /* -* Initialize *vc, in particular vc->vcore_state, so we can -* drop the vcore lock if necessary. +* Initialize *vc. */ vc->n_woken = 0; vc->nap_count = 0; vc->entry_exit_count = 0; vc->preempt_tb = TB_NIL; - vc->vcore_state = VCORE_STARTING; vc->in_guest = 0; vc->napping_threads = 0; vc->conferring_threads = 0; /* -* Updating any of the vpas requires calling kvmppc_pin_guest_page, -* which can't be called with any spinlocks held. -*/ - if (need_vpa_update) { - spin_unlock(&vc->lock); - for (i = 0; i < need_vpa_update; ++i) - kvmppc_update_vpas(vcpus_to_update[i]); - sp
[PULL 01/21] powerpc: Export __spin_yield
From: "Suresh E. Warrier" Export __spin_yield so that the arch_spin_unlock() function can be invoked from a module. This will be required for modules where we want to take a lock that is also acquired in hypervisor real mode. Because we want to avoid running any lockdep code (which may not be safe in real mode), this lock needs to be an arch_spinlock_t instead of a normal spinlock. Signed-off-by: Suresh Warrier Acked-by: Paul Mackerras Acked-by: Michael Ellerman Signed-off-by: Alexander Graf --- arch/powerpc/lib/locks.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c index 170a034..f7deebd 100644 --- a/arch/powerpc/lib/locks.c +++ b/arch/powerpc/lib/locks.c @@ -41,6 +41,7 @@ void __spin_yield(arch_spinlock_t *lock) plpar_hcall_norets(H_CONFER, get_hard_smp_processor_id(holder_cpu), yield_count); } +EXPORT_SYMBOL_GPL(__spin_yield); /* * Waiting for a read lock or a write lock on a rwlock... -- 1.8.1.4
[PULL 11/21] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
From: Paul Mackerras This reads the timebase at various points in the real-mode guest entry/exit code and uses that to accumulate total, minimum and maximum time spent in those parts of the code. Currently these times are accumulated per vcpu in 5 parts of the code: * rm_entry - time taken from the start of kvmppc_hv_entry() until just before entering the guest. * rm_intr - time from when we take a hypervisor interrupt in the guest until we either re-enter the guest or decide to exit to the host. This includes time spent handling hcalls in real mode. * rm_exit - time from when we decide to exit the guest until the return from kvmppc_hv_entry(). * guest - time spent in the guest. * cede - time spent napping in real mode due to an H_CEDE hcall while other threads in the same vcore are active. These times are exposed in debugfs in a directory per vcpu that contains a file called "timings". This file contains one line for each of the 5 timings above, with the name followed by a colon and 4 numbers, which are the count (number of times the code has been executed), the total time, the minimum time, and the maximum time, all in nanoseconds. The overhead of the extra code amounts to about 30ns for an hcall that is handled in real mode (e.g. H_SET_DABR), which is about 25%. Since production environments may not wish to incur this overhead, the new code is conditional on a new config symbol, CONFIG_KVM_BOOK3S_HV_EXIT_TIMING.
Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_host.h | 21 + arch/powerpc/include/asm/time.h | 3 + arch/powerpc/kernel/asm-offsets.c | 13 +++ arch/powerpc/kernel/time.c | 6 ++ arch/powerpc/kvm/Kconfig| 14 +++ arch/powerpc/kvm/book3s_hv.c| 150 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 141 +- 7 files changed, 346 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index f1d0bbc..d2068bb 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -369,6 +369,14 @@ struct kvmppc_slb { u8 base_page_size; /* MMU_PAGE_xxx */ }; +/* Struct used to accumulate timing information in HV real mode code */ +struct kvmhv_tb_accumulator { + u64 seqcount; /* used to synchronize access, also count * 2 */ + u64 tb_total; /* total time in timebase ticks */ + u64 tb_min; /* min time */ + u64 tb_max; /* max time */ +}; + # ifdef CONFIG_PPC_FSL_BOOK3E #define KVMPPC_BOOKE_IAC_NUM 2 #define KVMPPC_BOOKE_DAC_NUM 2 @@ -657,6 +665,19 @@ struct kvm_vcpu_arch { u32 emul_inst; #endif + +#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING + struct kvmhv_tb_accumulator *cur_activity; /* What we're timing */ + u64 cur_tb_start; /* when it started */ + struct kvmhv_tb_accumulator rm_entry; /* real-mode entry code */ + struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */ + struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */ + struct kvmhv_tb_accumulator guest_time; /* guest execution */ + struct kvmhv_tb_accumulator cede_time; /* time napping inside guest */ + + struct dentry *debugfs_dir; + struct dentry *debugfs_timings; +#endif /* CONFIG_KVM_BOOK3S_HV_EXIT_TIMING */ }; #define VCPU_FPR(vcpu, i) (vcpu)->arch.fp.fpr[i][TS_FPROFFSET] diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h index 03cbada..10fc784 100644 --- a/arch/powerpc/include/asm/time.h +++ b/arch/powerpc/include/asm/time.h @@ -211,5 +211,8 @@ extern void 
secondary_cpu_time_init(void); DECLARE_PER_CPU(u64, decrementers_next_tb); +/* Convert timebase ticks to nanoseconds */ +unsigned long long tb_to_ns(unsigned long long tb_ticks); + #endif /* __KERNEL__ */ #endif /* __POWERPC_TIME_H */ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 4717859..3fea721 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -459,6 +459,19 @@ int main(void) DEFINE(VCPU_SPRG2, offsetof(struct kvm_vcpu, arch.shregs.sprg2)); DEFINE(VCPU_SPRG3, offsetof(struct kvm_vcpu, arch.shregs.sprg3)); #endif +#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING + DEFINE(VCPU_TB_RMENTRY, offsetof(struct kvm_vcpu, arch.rm_entry)); + DEFINE(VCPU_TB_RMINTR, offsetof(struct kvm_vcpu, arch.rm_intr)); + DEFINE(VCPU_TB_RMEXIT, offsetof(struct kvm_vcpu, arch.rm_exit)); + DEFINE(VCPU_TB_GUEST, offsetof(struct kvm_vcpu, arch.guest_time)); + DEFINE(VCPU_TB_CEDE, offsetof(struct kvm_vcpu, arch.cede_time)); + DEFINE(VCPU_CUR_ACTIVITY, offsetof(struct kvm_vcpu, arch.cur_activity)); + DEFINE(VCPU_ACTIVITY_START, offset
[PULL 21/21] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8
From: Paul Mackerras This uses msgsnd where possible for signalling other threads within the same core on POWER8 systems, rather than IPIs through the XICS interrupt controller. This includes waking secondary threads to run the guest, the interrupts generated by the virtual XICS, and the interrupts to bring the other threads out of the guest when exiting. Aggregated statistics from debugfs across vcpus for a guest with 32 vcpus, 8 threads/vcore, running on a POWER8, show this before the change: rm_entry: 3387.6ns (228 - 86600, 1008969 samples) rm_exit: 4561.5ns (12 - 3477452, 1009402 samples) rm_intr: 1660.0ns (12 - 553050, 3600051 samples) and this after the change: rm_entry: 3060.1ns (212 - 65138, 953873 samples) rm_exit: 4244.1ns (12 - 9693408, 954331 samples) rm_intr: 1342.3ns (12 - 1104718, 3405326 samples) for a test of booting Fedora 20 big-endian to the login prompt. The time taken for a H_PROD hcall (which is handled in the host kernel) went down from about 35 microseconds to about 16 microseconds with this change. The noinline added to kvmppc_run_core turned out to be necessary for good performance, at least with gcc 4.9.2 as packaged with Fedora 21 and a little-endian POWER8 host. 
Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kernel/asm-offsets.c | 3 ++ arch/powerpc/kvm/book3s_hv.c| 51 ++--- arch/powerpc/kvm/book3s_hv_builtin.c| 16 +-- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 22 -- 4 files changed, 70 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 0d07efb..0034b6b 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -37,6 +37,7 @@ #include #include #include +#include #ifdef CONFIG_PPC64 #include #include @@ -759,5 +760,7 @@ int main(void) offsetof(struct paca_struct, subcore_sibling_mask)); #endif + DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER); + return 0; } diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index ea1600f..48d3c5d 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -51,6 +51,7 @@ #include #include #include +#include #include #include #include @@ -84,9 +85,35 @@ static DECLARE_BITMAP(default_enabled_hcalls, MAX_HCALL_OPCODE/4 + 1); static void kvmppc_end_cede(struct kvm_vcpu *vcpu); static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu); +static bool kvmppc_ipi_thread(int cpu) +{ + /* On POWER8 for IPIs to threads in the same core, use msgsnd */ + if (cpu_has_feature(CPU_FTR_ARCH_207S)) { + preempt_disable(); + if (cpu_first_thread_sibling(cpu) == + cpu_first_thread_sibling(smp_processor_id())) { + unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER); + msg |= cpu_thread_in_core(cpu); + smp_mb(); + __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg)); + preempt_enable(); + return true; + } + preempt_enable(); + } + +#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP) + if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) { + xics_wake_cpu(cpu); + return true; + } +#endif + + return false; +} + static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu) { - int me; int cpu = vcpu->cpu; wait_queue_head_t *wqp; @@ 
-96,20 +123,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu) ++vcpu->stat.halt_wakeup; } - me = get_cpu(); + if (kvmppc_ipi_thread(cpu + vcpu->arch.ptid)) + return; /* CPU points to the first thread of the core */ - if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) { -#ifdef CONFIG_PPC_ICP_NATIVE - int real_cpu = cpu + vcpu->arch.ptid; - if (paca[real_cpu].kvm_hstate.xics_phys) - xics_wake_cpu(real_cpu); - else -#endif - if (cpu_online(cpu)) - smp_send_reschedule(cpu); - } - put_cpu(); + if (cpu >= 0 && cpu < nr_cpu_ids && cpu_online(cpu)) + smp_send_reschedule(cpu); } /* @@ -1781,10 +1800,8 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu) /* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */ smp_wmb(); tpaca->kvm_hstate.kvm_vcpu = vcpu; -#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP) if (cpu != smp_processor_id()) - xics_wake_cpu(cpu); -#endif + kvmppc_ipi_thread(cp
[PULL 10/21] KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT
From: Paul Mackerras This creates a debugfs directory for each HV guest (assuming debugfs is enabled in the kernel config), and within that directory, a file by which the contents of the guest's HPT (hashed page table) can be read. The directory is named vm<pid>, where <pid> is the PID of the process that created the guest. The file is named "htab". This is intended to help in debugging problems in the host's management of guest memory. The contents of the file consist of a series of lines like this: 3f48 4000d032bf003505 000bd7ff1196 0003b5c71196 The first field is the index of the entry in the HPT, the second and third are the HPT entry, so the third entry contains the real page number that is mapped by the entry if the entry's valid bit is set. The fourth field is the guest's view of the second doubleword of the entry, so it contains the guest physical address. (The format of the second through fourth fields is described in the Power ISA and also in arch/powerpc/include/asm/mmu-hash64.h.) Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s_64.h | 2 + arch/powerpc/include/asm/kvm_host.h | 2 + arch/powerpc/kvm/book3s_64_mmu_hv.c | 136 +++ arch/powerpc/kvm/book3s_hv.c | 12 +++ virt/kvm/kvm_main.c | 1 + 5 files changed, 153 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 0789a0f..869c53f 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -436,6 +436,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm) return rcu_dereference_raw_notrace(kvm->memslots); } +extern void kvmppc_mmu_debugfs_init(struct kvm *kvm); + #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ #endif /* __ASM_KVM_BOOK3S_64_H__ */ diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 015773f..f1d0bbc 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@
-238,6 +238,8 @@ struct kvm_arch { atomic_t hpte_mod_interest; cpumask_t need_tlb_flush; int hpt_cma_alloc; + struct dentry *debugfs_dir; + struct dentry *htab_dentry; #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE struct mutex hpt_mutex; diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 6c6825a..d6fe308 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include @@ -1490,6 +1491,141 @@ int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *ghf) return ret; } +struct debugfs_htab_state { + struct kvm *kvm; + struct mutexmutex; + unsigned long hpt_index; + int chars_left; + int buf_index; + charbuf[64]; +}; + +static int debugfs_htab_open(struct inode *inode, struct file *file) +{ + struct kvm *kvm = inode->i_private; + struct debugfs_htab_state *p; + + p = kzalloc(sizeof(*p), GFP_KERNEL); + if (!p) + return -ENOMEM; + + kvm_get_kvm(kvm); + p->kvm = kvm; + mutex_init(&p->mutex); + file->private_data = p; + + return nonseekable_open(inode, file); +} + +static int debugfs_htab_release(struct inode *inode, struct file *file) +{ + struct debugfs_htab_state *p = file->private_data; + + kvm_put_kvm(p->kvm); + kfree(p); + return 0; +} + +static ssize_t debugfs_htab_read(struct file *file, char __user *buf, +size_t len, loff_t *ppos) +{ + struct debugfs_htab_state *p = file->private_data; + ssize_t ret, r; + unsigned long i, n; + unsigned long v, hr, gr; + struct kvm *kvm; + __be64 *hptp; + + ret = mutex_lock_interruptible(&p->mutex); + if (ret) + return ret; + + if (p->chars_left) { + n = p->chars_left; + if (n > len) + n = len; + r = copy_to_user(buf, p->buf + p->buf_index, n); + n -= r; + p->chars_left -= n; + p->buf_index += n; + buf += n; + len -= n; + ret = n; + if (r) { + if (!n) + ret = -EFAULT; + goto out; + } + } + + kvm = p->kvm; + i = p->hpt_index; + hptp = (__be64 
*)(kvm->arch.hpt_virt + (i * HPTE_SIZE)); + for (; len != 0 && i < kvm->arch.hpt_npte; ++i, hptp += 2) { + if (!(be64_t
[PULL 05/21] KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte
From: "Aneesh Kumar K.V" This adds helper routines for locking and unlocking HPTEs, and uses them in the rest of the code. We don't change any locking rules in this patch. Signed-off-by: Aneesh Kumar K.V Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s_64.h | 14 ++ arch/powerpc/kvm/book3s_64_mmu_hv.c | 25 ++--- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 25 + 3 files changed, 33 insertions(+), 31 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 2d81e20..0789a0f 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -85,6 +85,20 @@ static inline long try_lock_hpte(__be64 *hpte, unsigned long bits) return old == 0; } +static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v) +{ + hpte_v &= ~HPTE_V_HVLOCK; + asm volatile(PPC_RELEASE_BARRIER "" : : : "memory"); + hpte[0] = cpu_to_be64(hpte_v); +} + +/* Without barrier */ +static inline void __unlock_hpte(__be64 *hpte, unsigned long hpte_v) +{ + hpte_v &= ~HPTE_V_HVLOCK; + hpte[0] = cpu_to_be64(hpte_v); +} + static inline int __hpte_actual_psize(unsigned int lp, int psize) { int i, shift; diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index dbf1271..6c6825a 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -338,9 +338,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK; gr = kvm->arch.revmap[index].guest_rpte; - /* Unlock the HPTE */ - asm volatile("lwsync" : : : "memory"); - hptep[0] = cpu_to_be64(v); + unlock_hpte(hptep, v); preempt_enable(); gpte->eaddr = eaddr; @@ -469,8 +467,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, hpte[0] = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK; hpte[1] = be64_to_cpu(hptep[1]); hpte[2] = r = rev->guest_rpte; - asm volatile("lwsync" : : 
: "memory"); - hptep[0] = cpu_to_be64(hpte[0]); + unlock_hpte(hptep, hpte[0]); preempt_enable(); if (hpte[0] != vcpu->arch.pgfault_hpte[0] || @@ -621,7 +618,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, hptep[1] = cpu_to_be64(r); eieio(); - hptep[0] = cpu_to_be64(hpte[0]); + __unlock_hpte(hptep, hpte[0]); asm volatile("ptesync" : : : "memory"); preempt_enable(); if (page && hpte_is_writable(r)) @@ -642,7 +639,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, return ret; out_unlock: - hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK); + __unlock_hpte(hptep, be64_to_cpu(hptep[0])); preempt_enable(); goto out_put; } @@ -771,7 +768,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp, } } unlock_rmap(rmapp); - hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK); + __unlock_hpte(hptep, be64_to_cpu(hptep[0])); } return 0; } @@ -857,7 +854,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp, } ret = 1; } - hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK); + __unlock_hpte(hptep, be64_to_cpu(hptep[0])); } while ((i = j) != head); unlock_rmap(rmapp); @@ -974,8 +971,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp) /* Now check and modify the HPTE */ if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) { - /* unlock and continue */ - hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK); + __unlock_hpte(hptep, be64_to_cpu(hptep[0])); continue; } @@ -996,9 +992,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp) npages_dirty = n; eieio(); } - v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK); + v &= ~HPTE_V_ABSENT; v |= HPTE_V_VALID; - hptep[0] = cpu_to_be64(v); + __unlock_hpte(hptep, v); } while ((i = j) != head); unlock_rmap(rmapp); @@ -1218,8 +1214,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp, r &= ~HPTE_GR_MODIFIED; revp->guest_rpte = r; } -
[PULL 08/21] KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode
From: Suresh Warrier Interrupt-based hypercalls return H_TOO_HARD to inform KVM that it needs to switch to the host to complete the rest of the hypercall function in virtual mode. This patch ports the virtual mode ICS/ICP reject and resend functions to be runnable in hypervisor real mode, thus avoiding the need to switch to the host to execute these functions in virtual mode. However, the hypercalls continue to return H_TOO_HARD for vcpu_wakeup and notify events - these events cannot be done in real mode and they will still need a switch to host virtual mode. There are sufficient differences between the real mode code and the virtual mode code for the ICS/ICP resend and reject functions that for now the code has been duplicated instead of sharing common code. In the future, we can look at creating common functions. Signed-off-by: Suresh Warrier Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv_rm_xics.c | 225 --- 1 file changed, 211 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c index 7c22997..73bbe92 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c @@ -23,12 +23,39 @@ #define DEBUG_PASSUP +static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, + u32 new_irq); + static inline void rm_writeb(unsigned long paddr, u8 val) { __asm__ __volatile__("sync; stbcix %0,0,%1" : : "r" (val), "r" (paddr) : "memory"); } +/* -- ICS routines -- */ +static void ics_rm_check_resend(struct kvmppc_xics *xics, + struct kvmppc_ics *ics, struct kvmppc_icp *icp) +{ + int i; + + arch_spin_lock(&ics->lock); + + for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) { + struct ics_irq_state *state = &ics->irq_state[i]; + + if (!state->resend) + continue; + + arch_spin_unlock(&ics->lock); + icp_rm_deliver_irq(xics, icp, state->number); + arch_spin_lock(&ics->lock); + } + + arch_spin_unlock(&ics->lock); +} + +/* -- ICP
routines -- */ + static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu, struct kvm_vcpu *this_vcpu) { @@ -116,6 +143,178 @@ static inline int check_too_hard(struct kvmppc_xics *xics, return (xics->real_mode_dbg || icp->rm_action) ? H_TOO_HARD : H_SUCCESS; } +static void icp_rm_check_resend(struct kvmppc_xics *xics, +struct kvmppc_icp *icp) +{ + u32 icsid; + + /* Order this load with the test for need_resend in the caller */ + smp_rmb(); + for_each_set_bit(icsid, icp->resend_map, xics->max_icsid + 1) { + struct kvmppc_ics *ics = xics->ics[icsid]; + + if (!test_and_clear_bit(icsid, icp->resend_map)) + continue; + if (!ics) + continue; + ics_rm_check_resend(xics, ics, icp); + } +} + +static bool icp_rm_try_to_deliver(struct kvmppc_icp *icp, u32 irq, u8 priority, + u32 *reject) +{ + union kvmppc_icp_state old_state, new_state; + bool success; + + do { + old_state = new_state = READ_ONCE(icp->state); + + *reject = 0; + + /* See if we can deliver */ + success = new_state.cppr > priority && + new_state.mfrr > priority && + new_state.pending_pri > priority; + + /* +* If we can, check for a rejection and perform the +* delivery +*/ + if (success) { + *reject = new_state.xisr; + new_state.xisr = irq; + new_state.pending_pri = priority; + } else { + /* +* If we failed to deliver we set need_resend +* so a subsequent CPPR state change causes us +* to try a new delivery. +*/ + new_state.need_resend = true; + } + + } while (!icp_rm_try_update(icp, old_state, new_state)); + + return success; +} + +static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, + u32 new_irq) +{ + struct ics_irq_state *state; + struct kvmppc_ics *ics; + u32 reject; + u16 src; + + /* +* This is used both for initial delivery of an interrupt and +* for subsequent rejection. +* +* Rejection can be racy vs. resends. We have evaluated the +
Re: [PATCHv4] kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM
On 04/21/2015 02:41 AM, David Gibson wrote: On POWER, storage caching is usually configured via the MMU - attributes such as cache-inhibited are stored in the TLB and the hashed page table. This makes correctly performing cache-inhibited IO accesses awkward when the MMU is turned off (real mode). Some CPU models provide special registers to control the cache attributes of real mode loads and stores but this is not at all consistent. This is a problem in particular for SLOF, the firmware used on KVM guests, which runs entirely in real mode, but which needs to do IO to load the kernel. To simplify this, qemu implements two special hypercalls, H_LOGICAL_CI_LOAD and H_LOGICAL_CI_STORE, which simulate a cache-inhibited load or store to a logical address (aka guest physical address). SLOF uses these for IO. However, because these are implemented within qemu, not the host kernel, these bypass any IO devices emulated within KVM itself. The simplest way to see this problem is to attempt to boot a KVM guest from a virtio-blk device with iothread / dataplane enabled. The iothread code relies on an in-kernel implementation of the virtio queue notification, which is not triggered by the IO hcalls, and so the guest will stall in SLOF unable to load the guest OS. This patch addresses this by providing in-kernel implementations of the 2 hypercalls, which correctly scan the KVM IO bus. Any access to an address not handled by the KVM IO bus will cause a VM exit, hitting the qemu implementation as before. Note that a userspace change is also required, in order to enable these new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL.
Signed-off-by: David Gibson
---
 arch/powerpc/include/asm/kvm_book3s.h |  3 ++
 arch/powerpc/kvm/book3s.c             | 76 +++
 arch/powerpc/kvm/book3s_hv.c          | 12 ++
 arch/powerpc/kvm/book3s_pr_papr.c     | 28 +
 4 files changed, 119 insertions(+)

Changes in v4:
 * Rebase onto 4.0+, correct for changed signature of kvm_io_bus_{read,write}

Alex, I saw from some build system notifications that you seemed to hit some troubles compiling the last version of this patch. This should fix it - hope it's not too late to get into 4.1.

Oh, I already fixed it up in my tree, no worries.

Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 00/12] Remaining improvements for HV KVM
On 09.04.15 10:49, Paolo Bonzini wrote:
> On 09/04/2015 00:57, Alexander Graf wrote:
>>> The last patch in this series needs a definition of PPC_MSGCLR that is
>>> added by the patch "powerpc/powernv: Fixes for hypervisor doorbell
>>> handling", which has now gone upstream into Linus' tree as commit
>>> 755563bc79c7 via the linuxppc-dev mailing list. Alex, how do you want
>>> to handle that? You could pull in the master branch of the kvm tree,
>>> which includes 755563bc79c7, or you could cherry-pick 755563bc79c7 and
>>> let the subsequent merge fix it up.
>>
>> I've just cherry-picked it for now since it still lives in my queue, so
>> it will get thrown out automatically once I rebase on next if it's
>> included in there.
>>
>> Paolo / Marcelo, could you please try to somehow get the commit above
>> into the next branch somehow? I guess the easiest would be to merge
>> linus/master into kvm/next.
>>
>> Thanks, applied all to kvm-ppc-queue.
>
> I plan to send the x86/MIPS/s390/ARM merge very early to Linus, maybe
> even tomorrow. So you can just rebase on top of 4.0-rc6 and send your
> pull request relative to Linus's tree instead of kvm/next.
>
> Does that work for you?

Phew, that really complicates things on my side. I usually do
kvm-ppc-queue -> kvm-ppc-next -> kvm/next, which means that my queue
already contains your next patches. I could of course do a rebase --onto
and remove anything that is in the kvm tree, but then we'd end up
conflicting on documentation changes.

Since you already did send out the first pull request, just let me know
when you pulled linus' tree back into kvm/next (or kvm/master) so that I
can fast-forward merge this in my kvm-ppc-next branch and then rebase my
queue on top, merge it into the next branch and send you a pull request ;)

Alex
Re: [PATCH v2 00/12] Remaining improvements for HV KVM
On 14.04.15 13:56, Paul Mackerras wrote:
> On Thu, Apr 09, 2015 at 12:57:58AM +0200, Alexander Graf wrote:
>> On 03/28/2015 04:21 AM, Paul Mackerras wrote:
>>> This is the rest of my current patch queue for HV KVM on PPC. This
>>> series is based on Alex Graf's kvm-ppc-queue branch. The only change
>>> from the previous version of this series is that patch 2 has been
>>> updated to take account of the timebase offset.
>>>
>>> The last patch in this series needs a definition of PPC_MSGCLR that is
>>> added by the patch "powerpc/powernv: Fixes for hypervisor doorbell
>>> handling", which has now gone upstream into Linus' tree as commit
>>> 755563bc79c7 via the linuxppc-dev mailing list. Alex, how do you want
>>> to handle that? You could pull in the master branch of the kvm tree,
>>> which includes 755563bc79c7, or you could cherry-pick 755563bc79c7 and
>>> let the subsequent merge fix it up.
>>
>> I've just cherry-picked it for now since it still lives in my queue, so it
>> will get thrown out automatically once I rebase on next if it's included in
>> there.
>>
>> Paolo / Marcelo, could you please try to somehow get the commit above into
>> the next branch somehow? I guess the easiest would be to merge linus/master
>> into kvm/next.
>>
>> Thanks, applied all to kvm-ppc-queue.
>
> Did you forget to push it out or something? Your kvm-ppc-queue branch
> is still at 4.0-rc1 as far as I can see.

Oops, not sure how that happened. Does it show up correctly for you now?

Alex
Re: [PATCH v2 00/12] Remaining improvements for HV KVM
On 03/28/2015 04:21 AM, Paul Mackerras wrote:

This is the rest of my current patch queue for HV KVM on PPC. This series is based on Alex Graf's kvm-ppc-queue branch. The only change from the previous version of this series is that patch 2 has been updated to take account of the timebase offset.

The last patch in this series needs a definition of PPC_MSGCLR that is added by the patch "powerpc/powernv: Fixes for hypervisor doorbell handling", which has now gone upstream into Linus' tree as commit 755563bc79c7 via the linuxppc-dev mailing list. Alex, how do you want to handle that? You could pull in the master branch of the kvm tree, which includes 755563bc79c7, or you could cherry-pick 755563bc79c7 and let the subsequent merge fix it up.

I've just cherry-picked it for now since it still lives in my queue, so it will get thrown out automatically once I rebase on next if it's included in there.

Paolo / Marcelo, could you please try to somehow get the commit above into the next branch somehow? I guess the easiest would be to merge linus/master into kvm/next.

Thanks, applied all to kvm-ppc-queue.

Alex
[PULL 3/3] KVM: PPC: Book3S HV: Fix instruction emulation
From: Paul Mackerras

Commit 4a157d61b48c ("KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register") had the side effect that we no longer reset vcpu->arch.last_inst to -1 on guest exit in the cases where the instruction is not fetched from the guest. This means that if instruction emulation turns out to be required in those cases, the host will emulate the wrong instruction, since vcpu->arch.last_inst will contain the last instruction that was emulated. This fixes it by making sure that vcpu->arch.last_inst is reset to -1 in those cases.

Signed-off-by: Paul Mackerras
Signed-off-by: Alexander Graf
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index bb94e6f..6cbf163 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1005,6 +1005,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	/* Save HEIR (HV emulation assist reg) in emul_inst
 	   if this is an HEI (HV emulation interrupt, e40) */
 	li	r3,KVM_INST_FETCH_FAILED
+	stw	r3,VCPU_LAST_INST(r9)
 	cmpwi	r12,BOOK3S_INTERRUPT_H_EMUL_ASSIST
 	bne	11f
 	mfspr	r3,SPRN_HEIR
--
1.8.1.4
[PULL 1/3] KVM: PPC: Book3S HV: Fix spinlock/mutex ordering issue in kvmppc_set_lpcr()
From: Paul Mackerras Currently, kvmppc_set_lpcr() has a spinlock around the whole function, and inside that does mutex_lock(&kvm->lock). It is not permitted to take a mutex while holding a spinlock, because the mutex_lock might call schedule(). In addition, this causes lockdep to warn about a lock ordering issue: == [ INFO: possible circular locking dependency detected ] 3.18.0-kvm-04645-gdfea862-dirty #131 Not tainted --- qemu-system-ppc/8179 is trying to acquire lock: (&kvm->lock){+.+.+.}, at: [] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv] but task is already holding lock: (&(&vcore->lock)->rlock){+.+...}, at: [] .kvmppc_set_lpcr+0x40/0x1c0 [kvm_hv] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(&vcore->lock)->rlock){+.+...}: [] .mutex_lock_nested+0x80/0x570 [] .kvmppc_vcpu_run_hv+0xc4/0xe40 [kvm_hv] [] .kvmppc_vcpu_run+0x2c/0x40 [kvm] [] .kvm_arch_vcpu_ioctl_run+0x54/0x160 [kvm] [] .kvm_vcpu_ioctl+0x4a8/0x7b0 [kvm] [] .do_vfs_ioctl+0x444/0x770 [] .SyS_ioctl+0xc4/0xe0 [] syscall_exit+0x0/0x98 -> #0 (&kvm->lock){+.+.+.}: [] .lock_acquire+0xcc/0x1a0 [] .mutex_lock_nested+0x80/0x570 [] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv] [] .kvmppc_set_one_reg_hv+0x4dc/0x990 [kvm_hv] [] .kvmppc_set_one_reg+0x44/0x330 [kvm] [] .kvm_vcpu_ioctl_set_one_reg+0x5c/0x150 [kvm] [] .kvm_arch_vcpu_ioctl+0x214/0x2c0 [kvm] [] .kvm_vcpu_ioctl+0xe0/0x7b0 [kvm] [] .do_vfs_ioctl+0x444/0x770 [] .SyS_ioctl+0xc4/0xe0 [] syscall_exit+0x0/0x98 other info that might help us debug this: Possible unsafe locking scenario: CPU0CPU1 lock(&(&vcore->lock)->rlock); lock(&kvm->lock); lock(&(&vcore->lock)->rlock); lock(&kvm->lock); *** DEADLOCK *** 2 locks held by qemu-system-ppc/8179: #0: (&vcpu->mutex){+.+.+.}, at: [] .vcpu_load+0x28/0x90 [kvm] #1: (&(&vcore->lock)->rlock){+.+...}, at: [] .kvmppc_set_lpcr+0x40/0x1c0 [kvm_hv] stack backtrace: CPU: 4 PID: 8179 Comm: qemu-system-ppc Not tainted 3.18.0-kvm-04645-gdfea862-dirty #131 Call Trace: [c01a66c0f310] 
[c0b486ac] .dump_stack+0x88/0xb4 (unreliable) [c01a66c0f390] [c00f8bec] .print_circular_bug+0x27c/0x3d0 [c01a66c0f440] [c00fe9e8] .__lock_acquire+0x2028/0x2190 [c01a66c0f5d0] [c00ff28c] .lock_acquire+0xcc/0x1a0 [c01a66c0f6a0] [c0b3c120] .mutex_lock_nested+0x80/0x570 [c01a66c0f7c0] [decc1f54] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv] [c01a66c0f860] [decc510c] .kvmppc_set_one_reg_hv+0x4dc/0x990 [kvm_hv] [c01a66c0f8d0] [deb9f234] .kvmppc_set_one_reg+0x44/0x330 [kvm] [c01a66c0f960] [deb9c9dc] .kvm_vcpu_ioctl_set_one_reg+0x5c/0x150 [kvm] [c01a66c0f9f0] [deb9ced4] .kvm_arch_vcpu_ioctl+0x214/0x2c0 [kvm] [c01a66c0faf0] [deb940b0] .kvm_vcpu_ioctl+0xe0/0x7b0 [kvm] [c01a66c0fcb0] [c026cbb4] .do_vfs_ioctl+0x444/0x770 [c01a66c0fd90] [c026cfa4] .SyS_ioctl+0xc4/0xe0 [c01a66c0fe30] [c0009264] syscall_exit+0x0/0x98 This fixes it by moving the mutex_lock()/mutex_unlock() pair outside the spin-locked region. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index de4018a..b273193 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -942,20 +942,20 @@ static int kvm_arch_vcpu_ioctl_set_sregs_hv(struct kvm_vcpu *vcpu, static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr, bool preserve_top32) { + struct kvm *kvm = vcpu->kvm; struct kvmppc_vcore *vc = vcpu->arch.vcore; u64 mask; + mutex_lock(&kvm->lock); spin_lock(&vc->lock); /* * If ILE (interrupt little-endian) has changed, update the * MSR_LE bit in the intr_msr for each vcpu in this vcore. */ if ((new_lpcr & LPCR_ILE) != (vc->lpcr & LPCR_ILE)) { - struct kvm *kvm = vcpu->kvm; struct kvm_vcpu *vcpu; int i; - mutex_lock(&kvm->lock); kvm_for_each_vcpu(i, vcpu, kvm) { if (vcpu->arch.vcore != vc) continue; @@ -964,7 +964,6 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr, else vcpu->arch.in
[PULL 2/3] KVM: PPC: Book3S HV: Endian fix for accessing VPA yield count
From: Paul Mackerras

The VPA (virtual processor area) is defined by PAPR and is therefore big-endian, so we need a be32_to_cpu when reading it in kvmppc_get_yield_count(). Without this, H_CONFER always fails on a little-endian host, causing SMP guests to waste time spinning on spinlocks.

Signed-off-by: Paul Mackerras
Signed-off-by: Alexander Graf
---
 arch/powerpc/kvm/book3s_hv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index b273193..de74756 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -636,7 +636,7 @@ static int kvmppc_get_yield_count(struct kvm_vcpu *vcpu)
 	spin_lock(&vcpu->arch.vpa_update_lock);
 	lppaca = (struct lppaca *)vcpu->arch.vpa.pinned_addr;
 	if (lppaca)
-		yield_count = lppaca->yield_count;
+		yield_count = be32_to_cpu(lppaca->yield_count);
 	spin_unlock(&vcpu->arch.vpa_update_lock);
 	return yield_count;
 }
--
1.8.1.4
[PULL 0/3] 4.0 patch queue 2015-03-25
Hi Paolo,

This is my current patch queue for 4.0. Please pull.

Alex

The following changes since commit f710a12d73dfa1c3a5d2417f2482b970f03bb850:

  Merge tag 'kvm-arm-fixes-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm (2015-03-16 20:08:56 -0300)

are available in the git repository at:

  git://github.com/agraf/linux-2.6.git tags/signed-for-4.0

for you to fetch changes up to 2bf27601c7b50b6ced72f27304109dc52eb52919:

  KVM: PPC: Book3S HV: Fix instruction emulation (2015-03-20 11:42:33 +0100)

Patch queue for 4.0 - 2015-03-25

A few bug fixes for Book3S HV KVM:

  - Fix spinlock ordering
  - Fix idle guests on LE hosts
  - Fix instruction emulation

Paul Mackerras (3):
      KVM: PPC: Book3S HV: Fix spinlock/mutex ordering issue in kvmppc_set_lpcr()
      KVM: PPC: Book3S HV: Endian fix for accessing VPA yield count
      KVM: PPC: Book3S HV: Fix instruction emulation

 arch/powerpc/kvm/book3s_hv.c            | 8
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 1 +
 2 files changed, 5 insertions(+), 4 deletions(-)
Re: [kvm-ppc:kvm-ppc-queue 7/9] ERROR: ".__spin_yield" [arch/powerpc/kvm/kvm.ko] undefined!
On 23.03.15 04:03, Michael Ellerman wrote:
> On Mon, 2015-03-23 at 14:00 +1100, Paul Mackerras wrote:
>> On Fri, Mar 20, 2015 at 08:07:53PM +0800, kbuild test robot wrote:
>>> tree: git://github.com/agraf/linux-2.6.git kvm-ppc-queue
>>> head: 9b1daf3cfba1801768aa41b1b6ad0b653844241f
>>> commit: aba777f5ce0accb4c6a277e671de0330752954e8 [7/9] KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock
>>> config: powerpc-defconfig (attached as .config)
>>> reproduce:
>>>   wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>>>   chmod +x ~/bin/make.cross
>>>   git checkout aba777f5ce0accb4c6a277e671de0330752954e8
>>>   # save the attached .config to linux build tree
>>>   make.cross ARCH=powerpc
>>>
>>> All error/warnings:
>>>
>>>> ERROR: ".__spin_yield" [arch/powerpc/kvm/kvm.ko] undefined!
>>
>> Yes, this is the patch that depends on the "powerpc: Export
>> __spin_yield" patch that Suresh posted to linuxppc-...@ozlabs.org and
>> I acked.
>>
>> I think the best thing at this stage is probably for Alex to take that
>> patch through his tree, assuming Michael is OK with that.
>
> Fine by me.
>
> Acked-by: Michael Ellerman

Awesome, thanks, applied to kvm-ppc-queue.

Alex
Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
On 23.03.15 08:50, Bharata B Rao wrote: > On Sat, Mar 21, 2015 at 8:28 PM, Alexander Graf wrote: >> >> >> On 20.03.15 16:51, Bharata B Rao wrote: >>> On Fri, Mar 20, 2015 at 12:34:18PM +0100, Alexander Graf wrote: >>>> >>>> >>>> On 20.03.15 12:26, Paul Mackerras wrote: >>>>> On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote: >>>>>> >>>>>> >>>>>> On 20.03.15 10:39, Paul Mackerras wrote: >>>>>>> From: Bharata B Rao >>>>>>> >>>>>>> Since KVM isn't equipped to handle closure of vcpu fd from >>>>>>> userspace(QEMU) >>>>>>> correctly, certain work arounds have to be employed to allow reuse of >>>>>>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such >>>>>>> proposed workaround is to park the vcpu fd in userspace during cpu >>>>>>> unplug >>>>>>> and reuse it later during next hotplug. >>>>>>> >>>>>>> More details can be found here: >>>>>>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html >>>>>>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html >>>>>>> >>>>>>> In order to support this workaround with PowerPC KVM, don't create or >>>>>>> initialize ICP if the vCPU is found to be already associated with an >>>>>>> ICP. >>>>>>> >>>>>>> Signed-off-by: Bharata B Rao >>>>>>> Signed-off-by: Paul Mackerras >>>>>> >>>>>> This probably makes some sense, but please make sure that user space has >>>>>> some way to figure out whether hotplug works at all. >>>>> >>>>> Bharata is working on the qemu side of all this, so I assume he has >>>>> that covered. >>>> >>>> Well, so far the kernel doesn't expose anything he can query, so I >>>> suppose he just blindly assumes that older host kernels will randomly >>>> break and nobody cares. I'd rather prefer to see a CAP exposed that qemu >>>> can check on. >>> >>> I see that you have already taken this into your tree. I have an updated >>> patch to expose a CAP. If the below patch looks ok, then let me know how >>> you would prefer to take this patch in. 
>>> >>> Regards, >>> Bharata. >>> >>> KVM: PPC: BOOK3S: Allow reuse of vCPU object >>> >>> From: Bharata B Rao >>> >>> Since KVM isn't equipped to handle closure of vcpu fd from userspace(QEMU) >>> correctly, certain work arounds have to be employed to allow reuse of >>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such >>> proposed workaround is to park the vcpu fd in userspace during cpu unplug >>> and reuse it later during next hotplug. >>> >>> More details can be found here: >>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html >>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html >>> >>> In order to support this workaround with PowerPC KVM, don't create or >>> initialize ICP if the vCPU is found to be already associated with an ICP. >>> User space (QEMU) can reuse the vCPU after checking for the availability >>> of KVM_CAP_SPAPR_REUSE_VCPU capability. >>> >>> Signed-off-by: Bharata B Rao >>> --- >>> arch/powerpc/kvm/book3s_xics.c |9 +++-- >>> arch/powerpc/kvm/powerpc.c | 12 >>> include/uapi/linux/kvm.h |1 + >>> 3 files changed, 20 insertions(+), 2 deletions(-) >>> >>> diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c >>> index a4a8d9f..ead3a35 100644 >>> --- a/arch/powerpc/kvm/book3s_xics.c >>> +++ b/arch/powerpc/kvm/book3s_xics.c >>> @@ -1313,8 +1313,13 @@ int kvmppc_xics_connect_vcpu(struct kvm_device *dev, >>> struct kvm_vcpu *vcpu, >>> return -EPERM; >>> if (xics->kvm != vcpu->kvm) >>> return -EPERM; >>> - if (vcpu->arch.irq_type) >>> - return -EBUSY; >>> + >>> + /* >>> + * If irq_type is already set, don't reinialize but >>> + * return success allowing this vcpu to
Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
On 20.03.15 16:51, Bharata B Rao wrote: > On Fri, Mar 20, 2015 at 12:34:18PM +0100, Alexander Graf wrote: >> >> >> On 20.03.15 12:26, Paul Mackerras wrote: >>> On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote: >>>> >>>> >>>> On 20.03.15 10:39, Paul Mackerras wrote: >>>>> From: Bharata B Rao >>>>> >>>>> Since KVM isn't equipped to handle closure of vcpu fd from userspace(QEMU) >>>>> correctly, certain work arounds have to be employed to allow reuse of >>>>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such >>>>> proposed workaround is to park the vcpu fd in userspace during cpu unplug >>>>> and reuse it later during next hotplug. >>>>> >>>>> More details can be found here: >>>>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html >>>>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html >>>>> >>>>> In order to support this workaround with PowerPC KVM, don't create or >>>>> initialize ICP if the vCPU is found to be already associated with an ICP. >>>>> >>>>> Signed-off-by: Bharata B Rao >>>>> Signed-off-by: Paul Mackerras >>>> >>>> This probably makes some sense, but please make sure that user space has >>>> some way to figure out whether hotplug works at all. >>> >>> Bharata is working on the qemu side of all this, so I assume he has >>> that covered. >> >> Well, so far the kernel doesn't expose anything he can query, so I >> suppose he just blindly assumes that older host kernels will randomly >> break and nobody cares. I'd rather prefer to see a CAP exposed that qemu >> can check on. > > I see that you have already taken this into your tree. I have an updated > patch to expose a CAP. If the below patch looks ok, then let me know how > you would prefer to take this patch in. > > Regards, > Bharata. 
> > KVM: PPC: BOOK3S: Allow reuse of vCPU object > > From: Bharata B Rao > > Since KVM isn't equipped to handle closure of vcpu fd from userspace(QEMU) > correctly, certain work arounds have to be employed to allow reuse of > vcpu array slot in KVM during cpu hot plug/unplug from guest. One such > proposed workaround is to park the vcpu fd in userspace during cpu unplug > and reuse it later during next hotplug. > > More details can be found here: > KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html > QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html > > In order to support this workaround with PowerPC KVM, don't create or > initialize ICP if the vCPU is found to be already associated with an ICP. > User space (QEMU) can reuse the vCPU after checking for the availability > of KVM_CAP_SPAPR_REUSE_VCPU capability. > > Signed-off-by: Bharata B Rao > --- > arch/powerpc/kvm/book3s_xics.c |9 +++-- > arch/powerpc/kvm/powerpc.c | 12 > include/uapi/linux/kvm.h |1 + > 3 files changed, 20 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c > index a4a8d9f..ead3a35 100644 > --- a/arch/powerpc/kvm/book3s_xics.c > +++ b/arch/powerpc/kvm/book3s_xics.c > @@ -1313,8 +1313,13 @@ int kvmppc_xics_connect_vcpu(struct kvm_device *dev, > struct kvm_vcpu *vcpu, > return -EPERM; > if (xics->kvm != vcpu->kvm) > return -EPERM; > - if (vcpu->arch.irq_type) > - return -EBUSY; > + > + /* > + * If irq_type is already set, don't reinialize but > + * return success allowing this vcpu to be reused. 
> + */ > + if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT) > + return 0; > > r = kvmppc_xics_create_icp(vcpu, xcpu); > if (!r) > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c > index 27c0fac..5b7007c 100644 > --- a/arch/powerpc/kvm/powerpc.c > +++ b/arch/powerpc/kvm/powerpc.c > @@ -564,6 +564,18 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long > ext) > r = 1; > break; > #endif > + case KVM_CAP_SPAPR_REUSE_VCPU: > + /* > + * Kernel currently doesn't support closing of vCPU fd from > + * user space (QEMU) correctly. Hence the option available > + * is to park the vCPU fd in user space whenever a guest > + * CPU is hot removed and reuse the
Re: [PATCH v4 2/4] kvm/ppc/mpic: drop unused IRQ_testbit
On 21.03.15 07:56, Arseny Solokha wrote:
> Drop unused static procedure which doesn't have callers within its
> translation unit. It had been already removed independently in QEMU[1]
> from the OpenPIC implementation borrowed by the kernel.
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2014-06/msg01812.html
>
> v4: Fixed the comment regarding the origination of OpenPIC codebase
> and CC'ed KVM mailing lists, as suggested by Alexander Graf.
>
> v3: In patch 4/4, do not remove fsl_mpic_primary_get_version() from
> arch/powerpc/sysdev/mpic.c because the patch by Jia Hongtao
> ("powerpc/85xx: workaround for chips with MSI hardware errata") makes
> use of it.
>
> v2: Added a brief explanation to each patch description of why removed
> functions are unused, as suggested by Michael Ellerman.
>
> Signed-off-by: Arseny Solokha

Thanks, applied to kvm-ppc-queue (for 4.1).

Alex
Re: [PATCH 00/23] Bug fixes and improvements for HV KVM
On 20.03.15 10:39, Paul Mackerras wrote:
> This is my current patch queue for HV KVM on PPC. This series is
> based on the "queue" branch of the KVM tree, i.e. roughly v4.0-rc3
> plus a set of recent KVM changes which don't intersect with the
> changes in this series. On top of that, in my testing I have some
> patches which are not KVM-related but are needed to boot and run a
> recent upstream kernel successfully:
>
> tick/broadcast-hrtimer: Fix suspicious RCU usage in idle loop
> tick/hotplug: Handover time related duties before cpu offline
> powerpc/powernv: Check image loaded or not before calling flash
> powerpc/powernv: Fixes for hypervisor doorbell handling
> powerpc/powernv: Fix return value from power7_nap() et al.
> powerpc: Export __spin_yield
>
> These patches have been posted by their authors and are on their way
> upstream via various trees. They are not included in this series.
>
> The first three patches are bug fixes that should go into v4.0 if
> possible. The remainder are intended for the 4.1 merge window.
>
> The patch "powerpc: Export __spin_yield" is a prerequisite for patch
> 9/23 of this series ("KVM: PPC: Book3S HV: Convert ICS mutex lock to
> spin lock"). It is on its way upstream through the linuxppc-dev
> mailing list.
>
> The patch "powerpc/powernv: Fixes for hypervisor doorbell handling" is
> needed for correct operation with patch 20/23, "KVM: PPC: Book3S HV:
> Use msgsnd for signalling threads". It is also on its way upstream
> through the linuxppc-dev list. I am expecting both of these
> prerequisite patches to go into 4.0.
>
> Finally, the last patch in this series converts some of the assembly
> code in book3s_hv_rmhandlers.S into C. I intend to continue this
> trend.

Thanks, applied patches 4-11 to kvm-ppc-queue.

Alex
Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
On 20.03.15 12:25, Paul Mackerras wrote:
> On Fri, Mar 20, 2015 at 12:15:15PM +0100, Alexander Graf wrote:
>> On 20.03.15 10:39, Paul Mackerras wrote:
>>> This reads the timebase at various points in the real-mode guest
>>> entry/exit code and uses that to accumulate total, minimum and
>>> maximum time spent in those parts of the code. Currently these
>>> times are accumulated per vcpu in 5 parts of the code:
>>>
>>> * rm_entry - time taken from the start of kvmppc_hv_entry() until
>>>   just before entering the guest.
>>> * rm_intr - time from when we take a hypervisor interrupt in the
>>>   guest until we either re-enter the guest or decide to exit to the
>>>   host. This includes time spent handling hcalls in real mode.
>>> * rm_exit - time from when we decide to exit the guest until the
>>>   return from kvmppc_hv_entry().
>>> * guest - time spent in the guest
>>> * cede - time spent napping in real mode due to an H_CEDE hcall
>>>   while other threads in the same vcore are active.
>>>
>>> These times are exposed in debugfs in a directory per vcpu that
>>> contains a file called "timings". This file contains one line for
>>> each of the 5 timings above, with the name followed by a colon and
>>> 4 numbers, which are the count (number of times the code has been
>>> executed), the total time, the minimum time, and the maximum time,
>>> all in nanoseconds.
>>>
>>> Signed-off-by: Paul Mackerras
>>
>> Have you measured the additional overhead this brings?
>
> I haven't - in fact I did this patch so I could measure the overhead
> or improvement from other changes I did, but it doesn't measure its
> own overhead, of course. I guess I need a workload that does a
> defined number of guest entries and exits and measure how fast it runs
> with and without the patch (maybe something like H_SET_MODE in a
> loop). I'll figure something out and post the results.

Yeah, just measure the number of exits you can handle for a simple hcall.

If there is measurable overhead, it's probably a good idea to move the
statistics gathering into #ifdef paths for DEBUGFS or maybe even a
separate EXIT_TIMING config option as we have it for booke.

Alex
Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
On 20.03.15 12:26, Paul Mackerras wrote:
> On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote:
>> On 20.03.15 10:39, Paul Mackerras wrote:
>>> From: Bharata B Rao
>>>
>>> Since KVM isn't equipped to handle closure of vcpu fd from userspace(QEMU)
>>> correctly, certain work arounds have to be employed to allow reuse of
>>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
>>> proposed workaround is to park the vcpu fd in userspace during cpu unplug
>>> and reuse it later during next hotplug.
>>>
>>> More details can be found here:
>>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
>>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
>>>
>>> In order to support this workaround with PowerPC KVM, don't create or
>>> initialize ICP if the vCPU is found to be already associated with an ICP.
>>>
>>> Signed-off-by: Bharata B Rao
>>> Signed-off-by: Paul Mackerras
>>
>> This probably makes some sense, but please make sure that user space has
>> some way to figure out whether hotplug works at all.
>
> Bharata is working on the qemu side of all this, so I assume he has
> that covered.

Well, so far the kernel doesn't expose anything he can query, so I
suppose he just blindly assumes that older host kernels will randomly
break and nobody cares. I'd rather prefer to see a CAP exposed that qemu
can check on.

>> Also Paul, for patches that you pick up from others, I'd prefer if they
>> send the patches to the ML themselves first and you pick them up from
>> there then. That way we give everyone the same treatment.
>
> Fair enough. In fact Bharata did post the patch but he sent it to
> linuxppc-...@ozlabs.org not the KVM lists.

Please make sure you only take patches into your queue that made it to
at least kvm@vger, preferably kvm-ppc@vger as well. If you see related
patches on other mailing lists, just ask the respective people to resend
with proper ML exposure.

Alex
Re: [PATCH 20/23] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8
On 20.03.15 10:39, Paul Mackerras wrote: > This uses msgsnd where possible for signalling other threads within > the same core on POWER8 systems, rather than IPIs through the XICS > interrupt controller. This includes waking secondary threads to run > the guest, the interrupts generated by the virtual XICS, and the > interrupts to bring the other threads out of the guest when exiting. > > Signed-off-by: Paul Mackerras > --- > arch/powerpc/kernel/asm-offsets.c | 4 +++ > arch/powerpc/kvm/book3s_hv.c| 48 > ++--- > arch/powerpc/kvm/book3s_hv_rm_xics.c| 11 > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 41 > 4 files changed, 83 insertions(+), 21 deletions(-) > > diff --git a/arch/powerpc/kernel/asm-offsets.c > b/arch/powerpc/kernel/asm-offsets.c > index fa7b57d..0ce2aa6 100644 > --- a/arch/powerpc/kernel/asm-offsets.c > +++ b/arch/powerpc/kernel/asm-offsets.c > @@ -37,6 +37,7 @@ > #include > #include > #include > +#include > #ifdef CONFIG_PPC64 > #include > #include > @@ -568,6 +569,7 @@ int main(void) > DEFINE(VCORE_LPCR, offsetof(struct kvmppc_vcore, lpcr)); > DEFINE(VCORE_PCR, offsetof(struct kvmppc_vcore, pcr)); > DEFINE(VCORE_DPDES, offsetof(struct kvmppc_vcore, dpdes)); > + DEFINE(VCORE_PCPU, offsetof(struct kvmppc_vcore, pcpu)); > DEFINE(VCPU_SLB_E, offsetof(struct kvmppc_slb, orige)); > DEFINE(VCPU_SLB_V, offsetof(struct kvmppc_slb, origv)); > DEFINE(VCPU_SLB_SIZE, sizeof(struct kvmppc_slb)); > @@ -757,5 +759,7 @@ int main(void) > offsetof(struct paca_struct, subcore_sibling_mask)); > #endif > > + DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER); > + > return 0; > } > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 03a8bb4..2c34bae 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -51,6 +51,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -84,9 +85,34 @@ static DECLARE_BITMAP(default_enabled_hcalls, > MAX_HCALL_OPCODE/4 + 1); > static void kvmppc_end_cede(struct 
kvm_vcpu *vcpu); > static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu); > > +static bool kvmppc_ipi_thread(int cpu) > +{ > + /* On POWER8 for IPIs to threads in the same core, use msgsnd */ > + if (cpu_has_feature(CPU_FTR_ARCH_207S)) { > + preempt_disable(); > + if ((cpu & ~7) == (smp_processor_id() & ~7)) { > + unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER); > + msg |= cpu & 7; > + smp_mb(); > + __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg)); > + preempt_enable(); > + return true; > + } > + preempt_enable(); > + } > + > +#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP) > + if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) { > + xics_wake_cpu(cpu); > + return true; > + } > +#endif > + > + return false; > +} > + > static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu) > { > - int me; > int cpu = vcpu->cpu; > wait_queue_head_t *wqp; > > @@ -96,20 +122,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu > *vcpu) > ++vcpu->stat.halt_wakeup; > } > > - me = get_cpu(); > + if (kvmppc_ipi_thread(cpu + vcpu->arch.ptid)) > + return; > > /* CPU points to the first thread of the core */ > - if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) { > -#ifdef CONFIG_PPC_ICP_NATIVE > - int real_cpu = cpu + vcpu->arch.ptid; > - if (paca[real_cpu].kvm_hstate.xics_phys) > - xics_wake_cpu(real_cpu); > - else > -#endif > - if (cpu_online(cpu)) > - smp_send_reschedule(cpu); > - } > - put_cpu(); > + if (cpu >= 0 && cpu < nr_cpu_ids && cpu_online(cpu)) > + smp_send_reschedule(cpu); > } > > /* > @@ -1754,10 +1772,8 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu) > /* Order stores to hstate.kvm_vcore etc. 
before store to kvm_vcpu */ > smp_wmb(); > tpaca->kvm_hstate.kvm_vcpu = vcpu; > -#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP) > if (cpu != smp_processor_id()) > - xics_wake_cpu(cpu); > -#endif > + kvmppc_ipi_thread(cpu); > } > > static void kvmppc_wait_for_nap(void) > diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c > b/arch/powerpc/kvm/book3s_hv_rm_xics.c > index 6dded8c..457a8b1 100644 > --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c > +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c > @@ -18,6 +18,7 @@ > #include > #include > #include > +#include > > #include "book3s_xics.h" > > @@ -83,6 +84,16 @@ static void icp_rm_set_vcpu_irq(struct kvm_vc
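The interesting part of kvmppc_ipi_thread() above is small enough to restate on its own: a same-core test (POWER8 runs eight hardware threads per core, hence the `& ~7`), and composition of the msgsnd doorbell payload. A hedged C sketch, with the doorbell constants assumed from arch/powerpc/include/asm/dbell.h rather than quoted from it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed constants, mirroring arch/powerpc/include/asm/dbell.h:
 * the "server" doorbell type is 5, placed at bits 32-36 (IBM bit
 * numbering) of the msgsnd payload. Treat these as illustrative. */
#define DBELL_TYPE_SERVER 5ULL
#define DBELL_TYPE(t) (((t) & 0xfULL) << (63 - 36))

/* POWER8 runs 8 hardware threads per core, so everything above the
 * low 3 bits of a cpu number identifies the core. */
static bool same_core(int cpu_a, int cpu_b)
{
    return (cpu_a & ~7) == (cpu_b & ~7);
}

/* Compose the msgsnd payload for a sibling thread: the message type
 * plus the target's thread number within the core. */
static uint64_t dbell_msg(int target_cpu)
{
    return DBELL_TYPE(DBELL_TYPE_SERVER) | (uint64_t)(target_cpu & 7);
}
```

Note the sketch omits the preempt_disable()/preempt_enable() pair from the real code, which exists because smp_processor_id() is only stable while preemption is off.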
Re: [PATCH 12/23] KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT
On 20.03.15 10:39, Paul Mackerras wrote: > This creates a debugfs directory for each HV guest (assuming debugfs > is enabled in the kernel config), and within that directory, a file > by which the contents of the guest's HPT (hashed page table) can be > read. The directory is named vm<pid>, where <pid> is the PID of the > process that created the guest. The file is named "htab". This is > intended to help in debugging problems in the host's management > of guest memory. > > The contents of the file consist of a series of lines like this: > > 3f48 4000d032bf003505 000bd7ff1196 0003b5c71196 > > The first field is the index of the entry in the HPT, the second and > third are the HPT entry, so the third entry contains the real page > number that is mapped by the entry if the entry's valid bit is set. > The fourth field is the guest's view of the second doubleword of the > entry, so it contains the guest physical address. (The format of the > second through fourth fields is described in the Power ISA and also > in arch/powerpc/include/asm/mmu-hash64.h.) 
> > Signed-off-by: Paul Mackerras > --- > arch/powerpc/include/asm/kvm_book3s_64.h | 2 + > arch/powerpc/include/asm/kvm_host.h | 2 + > arch/powerpc/kvm/book3s_64_mmu_hv.c | 136 > +++ > arch/powerpc/kvm/book3s_hv.c | 12 +++ > virt/kvm/kvm_main.c | 1 + > 5 files changed, 153 insertions(+) > > diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h > b/arch/powerpc/include/asm/kvm_book3s_64.h > index 0789a0f..869c53f 100644 > --- a/arch/powerpc/include/asm/kvm_book3s_64.h > +++ b/arch/powerpc/include/asm/kvm_book3s_64.h > @@ -436,6 +436,8 @@ static inline struct kvm_memslots > *kvm_memslots_raw(struct kvm *kvm) > return rcu_dereference_raw_notrace(kvm->memslots); > } > > +extern void kvmppc_mmu_debugfs_init(struct kvm *kvm); > + > #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ > > #endif /* __ASM_KVM_BOOK3S_64_H__ */ > diff --git a/arch/powerpc/include/asm/kvm_host.h > b/arch/powerpc/include/asm/kvm_host.h > index 015773f..f1d0bbc 100644 > --- a/arch/powerpc/include/asm/kvm_host.h > +++ b/arch/powerpc/include/asm/kvm_host.h > @@ -238,6 +238,8 @@ struct kvm_arch { > atomic_t hpte_mod_interest; > cpumask_t need_tlb_flush; > int hpt_cma_alloc; > + struct dentry *debugfs_dir; > + struct dentry *htab_dentry; > #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ > #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE > struct mutex hpt_mutex; > diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c > b/arch/powerpc/kvm/book3s_64_mmu_hv.c > index 6c6825a..d6fe308 100644 > --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c > @@ -27,6 +27,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -1490,6 +1491,141 @@ int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct > kvm_get_htab_fd *ghf) > return ret; > } > > +struct debugfs_htab_state { > + struct kvm *kvm; > + struct mutexmutex; > + unsigned long hpt_index; > + int chars_left; > + int buf_index; > + charbuf[64]; > +}; > + > +static int debugfs_htab_open(struct inode *inode, struct file *file) > +{ > + 
struct kvm *kvm = inode->i_private; > + struct debugfs_htab_state *p; > + > + p = kzalloc(sizeof(*p), GFP_KERNEL); > + if (!p) > + return -ENOMEM; > + > + kvm_get_kvm(kvm); > + p->kvm = kvm; > + mutex_init(&p->mutex); > + file->private_data = p; > + > + return nonseekable_open(inode, file); > +} > + > +static int debugfs_htab_release(struct inode *inode, struct file *file) > +{ > + struct debugfs_htab_state *p = file->private_data; > + > + kvm_put_kvm(p->kvm); > + kfree(p); > + return 0; > +} > + > +static ssize_t debugfs_htab_read(struct file *file, char __user *buf, > + size_t len, loff_t *ppos) > +{ > + struct debugfs_htab_state *p = file->private_data; > + ssize_t ret, r; > + unsigned long i, n; > + unsigned long v, hr, gr; > + struct kvm *kvm; > + __be64 *hptp; > + > + ret = mutex_lock_interruptible(&p->mutex); > + if (ret) > + return ret; > + > + if (p->chars_left) { > + n = p->chars_left; > + if (n > len) > + n = len; > + r = copy_to_user(buf, p->buf + p->buf_index, n); > + n -= r; > + p->chars_left -= n; > + p->buf_index += n; > + buf += n; > + len -= n; > + ret = n; > + if (r) { > + if (!n) > + ret = -EFAULT; > + goto out; > + } > + } > + > + kvm = p->kvm; > + i = p->hpt_index; > + hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE)); > + for (; len != 0 && i < kvm->arch.
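For what it's worth, a userspace consumer could pick the four documented fields of an "htab" line apart with sscanf(). The parser below is a hypothetical illustration of the format described in the commit message, not part of the patch:

```c
#include <assert.h>
#include <stdio.h>

/* One parsed line of the guest's "htab" debugfs file, following the
 * four-field format described in the commit message: HPT index, the
 * two doublewords of the HPT entry, and the guest's view of the
 * second doubleword. */
struct htab_line {
    unsigned long index;
    unsigned long long hpte_v;   /* first doubleword (valid bit etc.) */
    unsigned long long hpte_r;   /* second doubleword, holds the real page number */
    unsigned long long guest_r;  /* guest view: guest physical address */
};

/* Parse a single line; returns 0 on success, -1 on malformed input. */
static int parse_htab_line(const char *line, struct htab_line *out)
{
    if (sscanf(line, "%lx %llx %llx %llx", &out->index,
               &out->hpte_v, &out->hpte_r, &out->guest_r) != 4)
        return -1;
    return 0;
}
```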
Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
On 20.03.15 10:39, Paul Mackerras wrote: > This reads the timebase at various points in the real-mode guest > entry/exit code and uses that to accumulate total, minimum and > maximum time spent in those parts of the code. Currently these > times are accumulated per vcpu in 5 parts of the code: > > * rm_entry - time taken from the start of kvmppc_hv_entry() until > just before entering the guest. > * rm_intr - time from when we take a hypervisor interrupt in the > guest until we either re-enter the guest or decide to exit to the > host. This includes time spent handling hcalls in real mode. > * rm_exit - time from when we decide to exit the guest until the > return from kvmppc_hv_entry(). > * guest - time spent in the guest > * cede - time spent napping in real mode due to an H_CEDE hcall > while other threads in the same vcore are active. > > These times are exposed in debugfs in a directory per vcpu that > contains a file called "timings". This file contains one line for > each of the 5 timings above, with the name followed by a colon and > 4 numbers, which are the count (number of times the code has been > executed), the total time, the minimum time, and the maximum time, > all in nanoseconds. > > Signed-off-by: Paul Mackerras Have you measured the additional overhead this brings? 
> --- > arch/powerpc/include/asm/kvm_host.h | 19 + > arch/powerpc/include/asm/time.h | 3 + > arch/powerpc/kernel/asm-offsets.c | 11 +++ > arch/powerpc/kernel/time.c | 6 ++ > arch/powerpc/kvm/book3s_hv.c| 135 > > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 105 - > 6 files changed, 276 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_host.h > b/arch/powerpc/include/asm/kvm_host.h > index f1d0bbc..286c0ce 100644 > --- a/arch/powerpc/include/asm/kvm_host.h > +++ b/arch/powerpc/include/asm/kvm_host.h > @@ -369,6 +369,14 @@ struct kvmppc_slb { > u8 base_page_size; /* MMU_PAGE_xxx */ > }; > > +/* Struct used to accumulate timing information in HV real mode code */ > +struct kvmhv_tb_accumulator { > + u64 seqcount; /* used to synchronize access, also count * 2 */ > + u64 tb_total; /* total time in timebase ticks */ > + u64 tb_min; /* min time */ > + u64 tb_max; /* max time */ > +}; > + > # ifdef CONFIG_PPC_FSL_BOOK3E > #define KVMPPC_BOOKE_IAC_NUM 2 > #define KVMPPC_BOOKE_DAC_NUM 2 > @@ -656,6 +664,17 @@ struct kvm_vcpu_arch { > u64 busy_preempt; > > u32 emul_inst; > + > + struct kvmhv_tb_accumulator *cur_activity; /* What we're timing */ > + u64 cur_tb_start; /* when it started */ > + struct kvmhv_tb_accumulator rm_entry; /* real-mode entry code */ > + struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */ > + struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */ > + struct kvmhv_tb_accumulator guest_time; /* guest execution */ > + struct kvmhv_tb_accumulator cede_time; /* time napping inside guest */ > + > + struct dentry *debugfs_dir; > + struct dentry *debugfs_timings; > #endif > }; > > diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h > index 03cbada..10fc784 100644 > --- a/arch/powerpc/include/asm/time.h > +++ b/arch/powerpc/include/asm/time.h > @@ -211,5 +211,8 @@ extern void secondary_cpu_time_init(void); > > DECLARE_PER_CPU(u64, decrementers_next_tb); > > +/* Convert timebase ticks to 
nanoseconds */ > +unsigned long long tb_to_ns(unsigned long long tb_ticks); > + > #endif /* __KERNEL__ */ > #endif /* __POWERPC_TIME_H */ > diff --git a/arch/powerpc/kernel/asm-offsets.c > b/arch/powerpc/kernel/asm-offsets.c > index 4717859..ec9f59c 100644 > --- a/arch/powerpc/kernel/asm-offsets.c > +++ b/arch/powerpc/kernel/asm-offsets.c > @@ -458,6 +458,17 @@ int main(void) > DEFINE(VCPU_SPRG1, offsetof(struct kvm_vcpu, arch.shregs.sprg1)); > DEFINE(VCPU_SPRG2, offsetof(struct kvm_vcpu, arch.shregs.sprg2)); > DEFINE(VCPU_SPRG3, offsetof(struct kvm_vcpu, arch.shregs.sprg3)); > + DEFINE(VCPU_TB_RMENTRY, offsetof(struct kvm_vcpu, arch.rm_entry)); > + DEFINE(VCPU_TB_RMINTR, offsetof(struct kvm_vcpu, arch.rm_intr)); > + DEFINE(VCPU_TB_RMEXIT, offsetof(struct kvm_vcpu, arch.rm_exit)); > + DEFINE(VCPU_TB_GUEST, offsetof(struct kvm_vcpu, arch.guest_time)); > + DEFINE(VCPU_TB_CEDE, offsetof(struct kvm_vcpu, arch.cede_time)); > + DEFINE(VCPU_CUR_ACTIVITY, offsetof(struct kvm_vcpu, arch.cur_activity)); > + DEFINE(VCPU_ACTIVITY_START, offsetof(struct kvm_vcpu, > arch.cur_tb_start)); > + DEFINE(TAS_SEQCOUNT, offsetof(struct kvmhv_tb_accumulator, seqcount)); > + DEFINE(TAS_TOTAL, offsetof(struct kvmhv_tb_accumulator, tb_total)); > + DEFINE(TAS_MIN, offsetof(struct kvmhv_tb_accumulator, tb_min)); > + DEFINE(TAS_MAX
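Going by the comment on kvmhv_tb_accumulator's seqcount field above ("used to synchronize access, also count * 2"), the accumulation step presumably amounts to the following C sketch. The real code does this in real-mode assembly in book3s_hv_rmhandlers.S with explicit barriers, so this is only a model of the bookkeeping, not the actual implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the patch's kvmhv_tb_accumulator; seqcount doubles as
 * twice the sample count, and is odd while an update is in flight. */
struct tb_accumulator {
    uint64_t seqcount;
    uint64_t tb_total;
    uint64_t tb_min;
    uint64_t tb_max;
};

/* Fold one interval (in timebase ticks) into the accumulator. */
static void tb_accumulate(struct tb_accumulator *acc, uint64_t ticks)
{
    uint64_t prev_count = acc->seqcount / 2;

    acc->seqcount++;      /* now odd: readers must retry */
    acc->tb_total += ticks;
    if (prev_count == 0 || ticks < acc->tb_min)
        acc->tb_min = ticks;
    if (ticks > acc->tb_max)
        acc->tb_max = ticks;
    acc->seqcount++;      /* even again: count = seqcount / 2 */
}
```

A debugfs reader can then sample the struct, recheck seqcount, and retry if it changed or was odd — the usual seqlock pattern.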
Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
On 20.03.15 10:39, Paul Mackerras wrote: > From: Bharata B Rao > > Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU) > correctly, certain workarounds have to be employed to allow reuse of > vcpu array slot in KVM during cpu hotplug/unplug from guest. One such > proposed workaround is to park the vcpu fd in userspace during cpu unplug > and reuse it later during next hotplug. > > More details can be found here: > KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html > QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html > > In order to support this workaround with PowerPC KVM, don't create or > initialize ICP if the vCPU is found to be already associated with an ICP. > > Signed-off-by: Bharata B Rao > Signed-off-by: Paul Mackerras This probably makes some sense, but please make sure that user space has some way to figure out whether hotplug works at all. Also Paul, for patches that you pick up from others, I'd prefer if they send the patches to the ML themselves first and you pick them up from there. That way we give everyone the same treatment. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/23] Bug fixes and improvements for HV KVM
On 20.03.15 10:39, Paul Mackerras wrote: > This is my current patch queue for HV KVM on PPC. This series is > based on the "queue" branch of the KVM tree, i.e. roughly v4.0-rc3 > plus a set of recent KVM changes which don't intersect with the > changes in this series. On top of that, in my testing I have some > patches which are not KVM-related but are needed to boot and run a > recent upstream kernel successfully: > > tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop > tick/hotplug: Handover time related duties before cpu offline > powerpc/powernv: Check image loaded or not before calling flash > powerpc/powernv: Fixes for hypervisor doorbell handling > powerpc/powernv: Fix return value from power7_nap() et al. > powerpc: Export __spin_yield > > These patches have been posted by their authors and are on their way > upstream via various trees. They are not included in this series. > > The first three patches are bug fixes that should go into v4.0 if > possible. Thanks, applied the first 3 to my for-4.0 branch which is going through autotest now. If everything runs fine, I'll send it to Paolo for upstream merge. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv3] kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM
On 16.03.15 21:41, David Gibson wrote: > On Thu, Feb 05, 2015 at 01:57:11AM +0100, Alexander Graf wrote: >> >> >> On 05.02.15 01:53, David Gibson wrote: >>> On POWER, storage caching is usually configured via the MMU - attributes >>> such as cache-inhibited are stored in the TLB and the hashed page table. >>> >>> This makes correctly performing cache inhibited IO accesses awkward when >>> the MMU is turned off (real mode). Some CPU models provide special >>> registers to control the cache attributes of real mode load and stores but >>> this is not at all consistent. This is a problem in particular for SLOF, >>> the firmware used on KVM guests, which runs entirely in real mode, but >>> which needs to do IO to load the kernel. >>> >>> To simplify this qemu implements two special hypercalls, H_LOGICAL_CI_LOAD >>> and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to >>> a logical address (aka guest physical address). SLOF uses these for IO. >>> >>> However, because these are implemented within qemu, not the host kernel, >>> these bypass any IO devices emulated within KVM itself. The simplest way >>> to see this problem is to attempt to boot a KVM guest from a virtio-blk >>> device with iothread / dataplane enabled. The iothread code relies on an >>> in kernel implementation of the virtio queue notification, which is not >>> triggered by the IO hcalls, and so the guest will stall in SLOF unable to >>> load the guest OS. >>> >>> This patch addresses this by providing in-kernel implementations of the >>> 2 hypercalls, which correctly scan the KVM IO bus. Any access to an >>> address not handled by the KVM IO bus will cause a VM exit, hitting the >>> qemu implementation as before. >>> >>> Note that a userspace change is also required, in order to enable these >>> new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL. >>> >>> Signed-off-by: David Gibson >> >> Thanks, applied to kvm-ppc-queue. > > Any news on when this might go up to mainline? 
I'm aiming for 4.1. Alex
Re: [PATCH] Revert "target-ppc: Create versionless CPU class per family if KVM"
On 03.03.15 01:42, Alexey Kardashevskiy wrote: > On 03/03/2015 12:51 AM, Alexander Graf wrote: >> >> >> On 02.03.15 14:42, Andreas Färber wrote: >>> Am 02.03.2015 um 14:37 schrieb Alexander Graf: >>>> On 01.03.15 01:31, Andreas Färber wrote: >>>>> This reverts commit 5b79b1cadd3e565b6d1a5ba59764bd47af58b271 to avoid >>>>> double-registration of types: >>>>> >>>>>Registering `POWER5+-powerpc64-cpu' which already exists >>>>> >>>>> Taking the textual description of a CPU type as part of a new type >>>>> name >>>>> is plain wrong, and so is unconditionally registering a new type here. >>>>> >>>>> Cc: Alexey Kardashevskiy >>>>> Cc: qemu-sta...@nongnu.org >>>>> Signed-off-by: Andreas Färber >>>> >>>> Doesn't this break p8 support? >>> >>> Maybe, but p5 support was in longer and this is definitely a regression >>> and really really wrong. If you know a way to fix it without handing it >>> back to the IBM guys for more thought, feel free to give it a shot. >> >> I honestly don't fully remember what this was about. Wasn't this our >> special KVM class that we use to create a compatible cpu type on the fly? >> >> Alexey, please take a look at it. > > > I sent a note yesterday :-/ Here it is again: > > With this revert, running qemu with HV KVM and -cpu POWER7 fails on real > POWER7 machine as my machine has pvr 003f 0201 and POWER7 is an alias of > POWER7_v2.3 (pvr 003f 0203); and this is what I tried to fix at the > first place. QEMU looks at classes first, and if not found - at aliases, > so this worked. > > I would rename "POWER5+" to "POWER5+_0.0" and make "POWER5+" an alias > for POWER5+_v2.1 (or POWER5+_0.0). Care to send a patch? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings
On 02/19/2015 11:54 AM, Ard Biesheuvel wrote:
> This is a 0th order approximation of how we could potentially force
> the guest to avoid uncached mappings, at least from the moment the
> MMU is on. (Before that, all of memory is implicitly classified as
> Device-nGnRnE)
>
> The idea (patch #2) is to trap writes to MAIR_EL1, and replace
> uncached mappings with cached ones. This way, there is no need to
> mangle any guest page tables. The downside is that, to do this
> correctly, we need to always trap writes to the VM sysreg group,
> which includes registers that the guest may write to very often.
>
> To reduce the associated performance hit, patch #1 introduces a fast
> path for EL2 to perform trivial sysreg writes on behalf of the guest,
> without the need for a full world switch to the host and back.
>
> The main purpose of these patches is to quantify the performance hit,
> and verify whether the MAIR_EL1 handling works correctly.

I gave this a quick spin on a VM running with QEMU:

* VGA output is still distorted; I get random junk black lines in the
  output in between.
* When I add -device nec-usb-xhci -device usb-kbd, the VM doesn't even
  boot up.

With TCG, both bits work fine.

Alex

> Ard Biesheuvel (3):
>   arm64: KVM: handle some sysreg writes in EL2
>   arm64: KVM: mangle MAIR register to prevent uncached guest mappings
>   arm64: KVM: keep trapping of VM sysreg writes enabled
>
>  arch/arm/kvm/mmu.c | 2 +-
>  arch/arm64/include/asm/kvm_arm.h | 2 +-
>  arch/arm64/kvm/hyp.S | 101 +++
>  arch/arm64/kvm/sys_regs.c | 63
>  4 files changed, 156 insertions(+), 12 deletions(-)
Re: [PATCH] Revert "target-ppc: Create versionless CPU class per family if KVM"
On 02.03.15 14:42, Andreas Färber wrote: > Am 02.03.2015 um 14:37 schrieb Alexander Graf: >> On 01.03.15 01:31, Andreas Färber wrote: >>> This reverts commit 5b79b1cadd3e565b6d1a5ba59764bd47af58b271 to avoid >>> double-registration of types: >>> >>> Registering `POWER5+-powerpc64-cpu' which already exists >>> >>> Taking the textual description of a CPU type as part of a new type name >>> is plain wrong, and so is unconditionally registering a new type here. >>> >>> Cc: Alexey Kardashevskiy >>> Cc: qemu-sta...@nongnu.org >>> Signed-off-by: Andreas Färber >> >> Doesn't this break p8 support? > > Maybe, but p5 support was in longer and this is definitely a regression > and really really wrong. If you know a way to fix it without handing it > back to the IBM guys for more thought, feel free to give it a shot. I honestly don't fully remember what this was about. Wasn't this our special KVM class that we use to create a compatible cpu type on the fly? Alexey, please take a look at it. Alex > > Andreas > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [RFC PATCH v2 04/15] cpu-model/s390: Introduce S390 CPU models
On 20.02.15 20:43, Michael Mueller wrote: > On Fri, 20 Feb 2015 18:50:20 +0100 > Alexander Graf wrote: > >> >> >> >>> Am 20.02.2015 um 18:37 schrieb Michael Mueller : >>> >>> On Fri, 20 Feb 2015 17:57:52 +0100 >>> Alexander Graf wrote: >>> >>>> Because all CPUs we have in our list only expose 128 bits? >>> >>> Here a STFLE result on a EC12 GA2, already more than 128 bits... Is that >>> model on the list? >> >> If that model has 3 elements, yes, the array should span 3. >> >> I hope it's in the list. Every model we care about should be, no? >> > > On my list? Yes! > >>> >>> [mimu@p57lp59 s390xfac]$ ./s390xfac -b >>> fac[0] = 0xfbfbfcfff840 >>> fac[1] = 0xffde >>> fac[2] = 0x1800 >>>> >>>>> I want to have this independent from a future machine of the z/Arch. The >>>>> kernel stores the >>>>> full facility set, KVM does and there is no good reason for QEMU not to >>>>> do. If other >>>>> accelerators decide to just implement 64 or 128 bits of facilities that's >>>>> ok... >>>> >>>> So you want to support CPUs that are not part of the list? >>> >>> The architecture at least defines more than 2 or 3. Do you want me to limit >>> it to an arbitrary >>> size? Only in QEMU or also in the KVM interface? >> >> Only internally in QEMU. The kvm interface should definitely be as big as >> the spec allows! >> > > Right, now we're on the same page again. That can be taken into consideration. > ... Although it's > just an optimization. :-) Yeah. You could also consider using the QEMU built-in bitmap type and functions and just convert from there. That would give you native support for bit values > 64. Alex
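The bitmap conversion suggested above could look roughly like this sketch. set_bit_in_map() is a hypothetical stand-in for QEMU's set_bit(); the point is only the flip from the STFLE MSB-first bit numbering (facility 0 = most significant bit of doubleword 0) to the LSB-first indices the generic bitmap helpers expect:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for QEMU's set_bit(): LSB-first bit numbering. */
static void set_bit_in_map(unsigned long nr, uint64_t *map)
{
    map[nr / 64] |= 1ULL << (nr % 64);
}

/* Convert an STFLE result into an LSB-first bitmap: facility number
 * dw * 64 + b lives in bit (63 - b) of doubleword dw. */
static void stfle_to_bitmap(const uint64_t *fac, size_t n_doublewords,
                            uint64_t *map)
{
    for (size_t dw = 0; dw < n_doublewords; dw++)
        for (unsigned b = 0; b < 64; b++)
            if (fac[dw] & (1ULL << (63 - b)))
                set_bit_in_map(dw * 64 + b, map);
}
```

After this conversion, facility numbers above 64 or 128 need no special casing — they are just ordinary bit indices.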
Re: [Qemu-devel] [RFC PATCH v2 10/15] cpu-model/s390: Add cpu class initialization routines
> Am 20.02.2015 um 19:59 schrieb Michael Mueller : > > On Fri, 20 Feb 2015 10:11:55 -0800 > Richard Henderson wrote: > >>> +static inline uint64_t big_endian_bit(unsigned long nr) >>> +{ >>> +return 1ul << (BITS_PER_LONG - (nr % BITS_PER_LONG)); >>> +}; >> >> This is buggy. NR=0 should map to 63, not 64. > > I'm sure I was asked to replace my constant 64 and 63 with that defines and > at the end I messed > it up... :-( > >> >>> +return !!(*ptr & big_endian_bit(nr)); >> >> Personally I dislike !! as an idiom. Given that big_endian_bit isn't used >> anywhere else, can we integrate it and change this to >> >> static inline int test_facility(unsigned long nr, uint64_t *fac_list) >> { >> unsigned long word = nr / BITS_PER_LONG; >> unsigned long be_bit = 63 - (nr % BITS_PER_LONG); >> return (fac_list[word] >> be_bit) & 1; >> } > > Yes, I just use it in this context. I will integrate your version. > > BTW I changed the whole facility defining code to be generated by an external > helper at compile > time. That is more simple and safe to change. I will send it with v3. See > attachment for an > example of the generated header file. Please make sure to use ULL with constants and uint64_t on variables. Long is almost always wrong in QEMU. Alex > > Thanks, > Michael > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
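For completeness, Richard's suggested replacement as a self-contained unit (BITS_PER_LONG assumed to be 64 here). The bug in the original big_endian_bit() is worth spelling out: for nr == 0 it computed a shift of BITS_PER_LONG, i.e. 64, which is undefined behaviour in C, whereas facility 0 must map to shift 63:

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_LONG 64

/* Facility bits are numbered big-endian: facility 0 lives in the
 * most significant bit (shift 63) of the first doubleword. */
static inline int test_facility(unsigned long nr, const uint64_t *fac_list)
{
    unsigned long word = nr / BITS_PER_LONG;
    unsigned long be_bit = 63 - (nr % BITS_PER_LONG);
    return (fac_list[word] >> be_bit) & 1;
}
```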
Re: [Qemu-devel] [RFC PATCH v2 04/15] cpu-model/s390: Introduce S390 CPU models
> Am 20.02.2015 um 18:37 schrieb Michael Mueller : > > On Fri, 20 Feb 2015 17:57:52 +0100 > Alexander Graf wrote: > >> Because all CPUs we have in our list only expose 128 bits? > > Here a STFLE result on a EC12 GA2, already more than 128 bits... Is that > model on the list? If that model has 3 elements, yes, the array should span 3. I hope it's in the list. Every model we care about should be, no? > > [mimu@p57lp59 s390xfac]$ ./s390xfac -b > fac[0] = 0xfbfbfcfff840 > fac[1] = 0xffde > fac[2] = 0x1800 >> >>> I want to have this independent from a future machine of the z/Arch. The >>> kernel stores the >>> full facility set, KVM does and there is no good reason for QEMU not to do. >>> If other >>> accelerators decide to just implement 64 or 128 bits of facilities that's >>> ok... >> >> So you want to support CPUs that are not part of the list? > > The architecture at least defines more than 2 or 3. Do you want me to limit > it to an arbitrary > size? Only in QEMU or also in the KVM interface? Only internally in QEMU. The kvm interface should definitely be as big as the spec allows! Alex > > Thanks > Michael >
Re: [Qemu-devel] [RFC PATCH v2 13/15] cpu-model/s390: Add processor property routines
On 20.02.15 16:32, Michael Mueller wrote: > On Fri, 20 Feb 2015 15:03:30 +0100 > Alexander Graf wrote: > >>> >>> - s390_get_proceccor_props() >>> - s390_set_proceccor_props() >>> >>> They can be used to request or retrieve processor related information from >>> an accelerator. >>> That information comprises the cpu identifier, the ICB value and the >>> facility lists. >>> >>> Signed-off-by: Michael Mueller >> >> Hrm, I still seem to miss the point of this interface. What do you need >> it for? > > These functions make the internal s390 cpu model API independent from a > specific accelerator: > > int s390_set_processor_props(S390ProcessorProps *prop) > { > if (kvm_enabled()) { > return kvm_s390_set_processor_props(prop); > } > return -ENOSYS; > } > > It's called by: > > s390_select_cpu_model(const char *model) > > which is itself called by: > > S390CPU *cpu_s390x_init(const char *cpu_model) > { > S390CPU *cpu; > > cpu = S390_CPU(object_new(s390_select_cpu_model(cpu_model))); > > object_property_set_bool(OBJECT(cpu), true, "realized", NULL); > > return cpu; > } > > So above s390_set/get_processor_props() the code is accelerator independent. Any particular reason you can't do it like PPC? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [RFC PATCH v2 09/15] cpu-model/s390: Add KVM VM attribute interface routines
On 20.02.15 16:18, Michael Mueller wrote: > On Fri, 20 Feb 2015 14:59:20 +0100 > Alexander Graf wrote: > >>> +typedef struct S390ProcessorProps { >>> +uint64_t cpuid; >>> +uint16_t ibc; >>> +uint8_t pad[6]; >>> +uint64_t fac_list[S390_ARCH_FAC_LIST_SIZE_UINT64]; >>> +} S390ProcessorProps; >>> + >>> +typedef struct S390MachineProps { >>> +uint64_t cpuid; >>> +uint32_t ibc_range; >>> +uint8_t pad[4]; >>> +uint64_t fac_list_mask[S390_ARCH_FAC_LIST_SIZE_UINT64]; >>> +uint64_t fac_list[S390_ARCH_FAC_LIST_SIZE_UINT64]; >>> +} S390MachineProps; >> >> What are those structs there for? To convert between a kvm facing >> interface to an internal interface? > > Yes, that's their current use, but if the interface structs: > > +struct kvm_s390_vm_cpu_processor { > + __u64 cpuid; > + __u16 ibc; > + __u8 pad[6]; > + __u64 fac_list[256]; > +}; > + > +/* kvm S390 machine related attributes are r/o */ > +#define KVM_S390_VM_CPU_MACHINE1 > +struct kvm_s390_vm_cpu_machine { > + __u64 cpuid; > + __u32 ibc_range; > + __u8 pad[4]; > + __u64 fac_mask[256]; > + __u64 fac_list[256]; > +}; > > are visible here, I'll reuse them... But stop, that will not work in the > --disable-kvm case... I need them! I meant it the other way around - do KVM specific patching of the cpu types from kvm.c. But please give a nutshell explanation on what exactly you're patching at all here. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [RFC PATCH v2 04/15] cpu-model/s390: Introduce S390 CPU models
On 20.02.15 16:49, Michael Mueller wrote: > On Fri, 20 Feb 2015 16:22:20 +0100 > Alexander Graf wrote: > >>>> >>>> Just make this uint64_t fac_list[2]. That way we don't have to track any >>>> messy allocations. >>> >>> It will be something like "uint64_t >>> fac_list[S390_CPU_FAC_LIST_SIZE_UINT64]" and in total 2KB >>> not just 16 bytes but I will change it. >> >> Why? Do we actually need that many? This is a qemu internal struct. > > How do you know that 2 is a good size? Because all CPUs we have in our list only expose 128 bits? > I want to have this independent from a future machine of the z/Arch. The > kernel stores the full > facility set, KVM does and there is no good reason for QEMU not to do. If > other accelerators > decide to just implement 64 or 128 bits of facilities that's ok... So you want to support CPUs that are not part of the list? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [RFC PATCH v2 04/15] cpu-model/s390: Introduce S390 CPU models
> Am 20.02.2015 um 16:00 schrieb Michael Mueller : > > On Fri, 20 Feb 2015 14:54:23 +0100 > Alexander Graf wrote: > >>> >>> +/* machine related properties */ >>> +typedef struct S390CPUMachineProps { >>> +uint16_t class; /* machine class */ >>> +uint16_t ga; /* availability number of machine */ >>> +uint16_t order; /* order of availability */ >>> +} S390CPUMachineProps; >>> + >>> +/* processor related properties */ >>> +typedef struct S390CPUProcessorProps { >>> +uint16_t gen;/* S390 CMOS generation */ >>> +uint16_t ver;/* version of processor */ >>> +uint32_t id; /* processor identification*/ >>> +uint16_t type; /* machine type */ >>> +uint16_t ibc;/* IBC value */ >>> +uint64_t *fac_list; /* list of facilities */ >> >> Just make this uint64_t fac_list[2]. That way we don't have to track any >> messy allocations. > > It will be something like "uint64_t fac_list[S390_CPU_FAC_LIST_SIZE_UINT64]" > and in total 2KB not > just 16 bytes but I will change it. Why? Do we actually need that many? This is a qemu internal struct. Alex-- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 13/15] cpu-model/s390: Add processor property routines
On 17.02.15 15:24, Michael Mueller wrote: > This patch implements the functions: > > - s390_get_proceccor_props() > - s390_set_proceccor_props() > > They can be used to request or retrieve processor related information from an > accelerator. > That information comprises the cpu identifier, the ICB value and the facility > lists. > > Signed-off-by: Michael Mueller Hrm, I still seem to miss the point of this interface. What do you need it for? Alex
Re: [RFC PATCH v2 09/15] cpu-model/s390: Add KVM VM attribute interface routines
On 17.02.15 15:24, Michael Mueller wrote: > The patch implements routines to set and retrieve processor configuration > data and to retrieve machine configuration data. The machine related data > is used together with the cpu model facility lists to determine the list of > supported cpu models of this host. The above mentioned routines have QEMU > trace point instrumentation. > > Signed-off-by: Michael Mueller > --- > target-s390x/cpu-models.h | 39 ++ > target-s390x/kvm.c| 102 > ++ > trace-events | 3 ++ > 3 files changed, 144 insertions(+) > > diff --git a/target-s390x/cpu-models.h b/target-s390x/cpu-models.h > index 623a7b2..76b3456 100644 > --- a/target-s390x/cpu-models.h > +++ b/target-s390x/cpu-models.h > @@ -45,6 +45,45 @@ typedef struct S390CPUAlias { > char *model; > } S390CPUAlias; > > +typedef struct S390ProcessorProps { > +uint64_t cpuid; > +uint16_t ibc; > +uint8_t pad[6]; > +uint64_t fac_list[S390_ARCH_FAC_LIST_SIZE_UINT64]; > +} S390ProcessorProps; > + > +typedef struct S390MachineProps { > +uint64_t cpuid; > +uint32_t ibc_range; > +uint8_t pad[4]; > +uint64_t fac_list_mask[S390_ARCH_FAC_LIST_SIZE_UINT64]; > +uint64_t fac_list[S390_ARCH_FAC_LIST_SIZE_UINT64]; > +} S390MachineProps; What are those structs there for? To convert between a KVM-facing interface and an internal one? I don't think they're necessary. The internal layout is visible from the KVM code. Just either spawn the class straight from the kvm file, or, if you consider that ugly, pass the values of that struct that you need as function parameters to a function in cpu-models.c. Alex
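A minimal sketch of the alternative Alex suggests: pass the values straight into cpu-models.c instead of mirroring the KVM ABI layout in a second struct. All names here (`s390_cpu_model_set_host`, `S390CPUModelHost`, the size constant) are hypothetical:

```c
#include <stdint.h>
#include <string.h>

#define S390_ARCH_FAC_LIST_SIZE_UINT64 256  /* assumption: 2KB facility list */

/* QEMU-internal model state, owned entirely by cpu-models.c */
typedef struct S390CPUModelHost {
    uint64_t cpuid;
    uint16_t ibc;
    uint64_t fac_list[S390_ARCH_FAC_LIST_SIZE_UINT64];
} S390CPUModelHost;

/* Hypothetical cpu-models.c entry point: the kvm.c caller unpacks the
 * ABI struct and hands over plain values, so the mirror struct goes away. */
void s390_cpu_model_set_host(S390CPUModelHost *m, uint64_t cpuid,
                             uint16_t ibc, const uint64_t *fac_list)
{
    m->cpuid = cpuid;
    m->ibc = ibc;
    memcpy(m->fac_list, fac_list, sizeof(m->fac_list));
}
```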
Re: [RFC PATCH v2 04/15] cpu-model/s390: Introduce S390 CPU models
On 17.02.15 15:24, Michael Mueller wrote: > This patch implements the static part of the s390 cpu class definitions. > It defines s390 cpu models by means of virtual cpu ids (enum) which contain > information on the cpu generation, the machine class, the GA number and > the machine type. The cpu id is used to instantiate a cpu class per cpu > model. > > In addition the patch introduces the QMP enumeration AccelId. It is used > to index certain cpu model poperties per accelerator. > > Furthermore it extends the existing S390CPUClass by model related properties. > > Signed-off-by: Michael Mueller > Reviewed-by: Thomas Huth > --- > qapi-schema.json | 11 +++ > target-s390x/Makefile.objs | 1 + > target-s390x/cpu-models.c | 79 > ++ > target-s390x/cpu-models.h | 71 + > target-s390x/cpu-qom.h | 22 + > target-s390x/cpu.c | 2 ++ > 6 files changed, 186 insertions(+) > create mode 100644 target-s390x/cpu-models.c > create mode 100644 target-s390x/cpu-models.h > > diff --git a/qapi-schema.json b/qapi-schema.json > index e16f8eb..4d237c8 100644 > --- a/qapi-schema.json > +++ b/qapi-schema.json > @@ -2473,6 +2473,17 @@ > ## > { 'command': 'query-machines', 'returns': ['MachineInfo'] } > > + > +## > +# @AccelId > +# > +# Defines accelerator ids > +# > +# Since: 2.3.0 > +## > +{ 'enum': 'AccelId', > + 'data': ['qtest', 'tcg', 'kvm', 'xen' ] } > + > ## > # @CpuDefinitionInfo: > # > diff --git a/target-s390x/Makefile.objs b/target-s390x/Makefile.objs > index 2c57494..9f55140 100644 > --- a/target-s390x/Makefile.objs > +++ b/target-s390x/Makefile.objs > @@ -1,5 +1,6 @@ > obj-y += translate.o helper.o cpu.o interrupt.o > obj-y += int_helper.o fpu_helper.o cc_helper.o mem_helper.o misc_helper.o > obj-y += gdbstub.o > +obj-y += cpu-models.o > obj-$(CONFIG_SOFTMMU) += machine.o ioinst.o arch_dump.o > obj-$(CONFIG_KVM) += kvm.o > diff --git a/target-s390x/cpu-models.c b/target-s390x/cpu-models.c > new file mode 100644 > index 000..4841553 > --- /dev/null > +++ b/target-s390x/cpu-models.c > 
@@ -0,0 +1,79 @@ > +/* > + * CPU models for s390 > + * > + * Copyright 2014,2015 IBM Corp. > + * > + * Author(s): Michael Mueller > + * > + * This work is licensed under the terms of the GNU GPL, version 2 or (at > + * your option) any later version. See the COPYING file in the top-level > + * directory. > + */ > + > +#include "qemu-common.h" > +#include "cpu-models.h" > + > +#define S390_PROC_DEF(_name, _cpu_id, _desc)\ > +static void \ > +glue(_cpu_id, _cpu_class_init) \ > +(ObjectClass *oc, void *data) \ > +{ \ > +DeviceClass *dc = DEVICE_CLASS(oc); \ > +S390CPUClass *cc = S390_CPU_CLASS(oc); \ > +\ > +cc->is_active[ACCEL_ID_KVM] = true; \ > +cc->mach= g_malloc0(sizeof(S390CPUMachineProps)); \ > +cc->mach->ga= cpu_ga(_cpu_id); \ > +cc->mach->class = cpu_class(_cpu_id); \ > +cc->mach->order = cpu_order(_cpu_id); \ > +cc->proc= g_malloc0(sizeof(S390CPUProcessorProps)); \ > +cc->proc->gen = cpu_generation(_cpu_id); \ > +cc->proc->ver = S390_DEF_VERSION; \ > +cc->proc->id= S390_DEF_ID; \ > +cc->proc->type = cpu_type(_cpu_id);\ > +cc->proc->ibc = S390_DEF_IBC; \ > +dc->desc= _desc;\ > +} \ > +static const TypeInfo \ > +glue(_cpu_id, _cpu_type_info) = { \ > +.name = _name "-" TYPE_S390_CPU, \ > +.parent = TYPE_S390_CPU,\ > +.class_init = glue(_cpu_id, _cpu_class_init), \ > +}; \ > +static void \ > +glue(_cpu_id, _cpu_register_types)(void)\ > +{ \ > +type_register_static( \ > +&glue(_cpu_id, _cpu_type_info));
Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings
On 19.02.15 15:56, Ard Biesheuvel wrote: > On 19 February 2015 at 14:50, Alexander Graf wrote: >> >> >> On 19.02.15 11:54, Ard Biesheuvel wrote: >>> This is a 0th order approximation of how we could potentially force the >>> guest >>> to avoid uncached mappings, at least from the moment the MMU is on. (Before >>> that, all of memory is implicitly classified as Device-nGnRnE) >>> >>> The idea (patch #2) is to trap writes to MAIR_EL1, and replace uncached >>> mappings >>> with cached ones. This way, there is no need to mangle any guest page >>> tables. >> >> Would you mind to give a brief explanation on what this does? What >> happens to actually assigned devices that need to be mapped as uncached? >> What happens to DMA from such devices when the guest assumes that it's >> accessing RAM uncached and then triggers DMA? >> > > On ARM, stage 2 mappings that are more strict will supersede stage 1 > mappings, so the idea is to use cached mappings exclusively for stage > 1 so that the host is fully in control of the actual memory attributes > by setting the attributes at stage 2. This also makes sense because > the host will ultimately know better whether some range that the guest > thinks is a device is actually a device or just emulated (no stage 2 > mapping), backed by host memory (such as the NOR flash read case) or > backed by a passthrough device. Ok, so that means if the guest maps RAM as uncached, it will actually end up as cached memory. Now if the guest triggers a DMA request to a passed-through device to that RAM, it will conflict with the cache. I don't know whether it's a big deal, but it's the scenario that came up before when I discussed this approach with people. Alex
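The combining rule Ard describes (the stricter of the two translation stages wins) can be modeled with a toy ordering. Real hardware combines MAIR/stage-2 attribute encodings, not an enum, so this is only a sketch of the behavior:

```c
/* Toy model of ARM stage-1/stage-2 memory attribute combining: the
 * effective attribute is always the "weaker" (more restrictive) of
 * the two, ordered Device < Normal-NC < Normal-WB. */
enum mem_attr { MEM_DEVICE = 0, MEM_NORMAL_NC = 1, MEM_NORMAL_WB = 2 };

enum mem_attr combine_stages(enum mem_attr s1, enum mem_attr s2)
{
    return s1 < s2 ? s1 : s2;
}
```

This is why forcing stage 1 to Normal-WB hands full control to the host: with s1 pinned to the weakest restriction, the stage-2 attribute alone determines the result.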
Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings
On 19.02.15 11:54, Ard Biesheuvel wrote: > This is a 0th order approximation of how we could potentially force the guest > to avoid uncached mappings, at least from the moment the MMU is on. (Before > that, all of memory is implicitly classified as Device-nGnRnE) > > The idea (patch #2) is to trap writes to MAIR_EL1, and replace uncached > mappings > with cached ones. This way, there is no need to mangle any guest page tables. Would you mind to give a brief explanation on what this does? What happens to actually assigned devices that need to be mapped as uncached? What happens to DMA from such devices when the guest assumes that it's accessing RAM uncached and then triggers DMA? Alex > > The downside is that, to do this correctly, we need to always trap writes to > the VM sysreg group, which includes registers that the guest may write to very > often. To reduce the associated performance hit, patch #1 introduces a fast > path > for EL2 to perform trivial sysreg writes on behalf of the guest, without the > need for a full world switch to the host and back. > > The main purpose of these patches is to quantify the performance hit, and > verify whether the MAIR_EL1 handling works correctly. > > Ard Biesheuvel (3): > arm64: KVM: handle some sysreg writes in EL2 > arm64: KVM: mangle MAIR register to prevent uncached guest mappings > arm64: KVM: keep trapping of VM sysreg writes enabled > > arch/arm/kvm/mmu.c | 2 +- > arch/arm64/include/asm/kvm_arm.h | 2 +- > arch/arm64/kvm/hyp.S | 101 > +++ > arch/arm64/kvm/sys_regs.c| 63 > 4 files changed, 156 insertions(+), 12 deletions(-) >
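As a rough illustration of the idea in patch #2: the attribute encodings below are the architectural MAIR_EL1 values, but the rewrite rule itself is a sketch of the concept, not the actual patch:

```c
#include <stdint.h>

/* Illustration of MAIR_EL1 mangling: MAIR_EL1 holds eight 8-bit attribute
 * fields. 0x0X encodings (high nibble 0) are Device memory, 0x44 is Normal
 * Non-cacheable, 0xff is Normal Write-Back. Rewriting device/uncached
 * fields to 0xff forces every stage-1 mapping that references them to be
 * cacheable, leaving the effective attribute to stage 2. */
uint64_t mangle_mair(uint64_t mair)
{
    for (int i = 0; i < 8; i++) {
        uint8_t attr = (mair >> (i * 8)) & 0xff;
        /* Device memory (high nibble 0) or Normal Non-cacheable (0x44) */
        if ((attr & 0xf0) == 0 || attr == 0x44)
            mair |= 0xffULL << (i * 8); /* set field to Normal WB */
    }
    return mair;
}
```

Note that a guest that never writes MAIR_EL1 still starts with all-zero (Device-nGnRnE) fields, which is why the trap has to stay enabled and why this sketch also rewrites the 0x00 encoding.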
Re: H_CLEAR_REF and H_CLEAR_MOD
> Am 18.02.2015 um 07:12 schrieb Nathan Whitehorn : > > It seems like KVM doesn't implement the H_CLEAR_REF and H_CLEAR_MOD > hypervisor calls, which are absolutely critical for memory management in the > FreeBSD kernel (and are marked "mandatory" in the PAPR manual). It seems some > patches have been contributed already in > https://lists.ozlabs.org/pipermail/linuxppc-dev/2011-December/095013.html, so > it would be fantastic if these could end up upstream. Paul, I guess we never included this because there was no user. If FreeBSD does use it though, I think it makes a lot of sense to resend it for inclusion. > > I'm going to try to get some kind of workaround in the meantime so we can at > least run on existing kernels. Please don't add hacks in FreeBSD only because kvm is missing a feature. Let's just get this done properly :). Alex
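For context, the PAPR semantics of the two calls are simple: clear the reference (R) or change (C) bit of a hash PTE and return its previous value. A toy model, using the bit positions from Linux's HPTE_R_R/HPTE_R_C definitions; a real KVM handler additionally has to deal with HPT locking and TLB invalidation:

```c
#include <stdint.h>

#define HPTE_R_R 0x0000000000000100ULL /* reference bit, 2nd HPTE dword */
#define HPTE_R_C 0x0000000000000080ULL /* change bit, 2nd HPTE dword */

/* Toy H_CLEAR_REF: clear R and return the old R bit, per PAPR. */
uint64_t h_clear_ref(uint64_t *hpte_r)
{
    uint64_t old = *hpte_r & HPTE_R_R;
    *hpte_r &= ~HPTE_R_R;
    return old;
}

/* Toy H_CLEAR_MOD: clear C and return the old C bit, per PAPR. */
uint64_t h_clear_mod(uint64_t *hpte_r)
{
    uint64_t old = *hpte_r & HPTE_R_C;
    *hpte_r &= ~HPTE_R_C;
    return old;
}
```

The return value is what makes these calls useful for page-aging and dirty tracking: the guest OS learns whether the page was referenced/modified and resets the bit in one hypercall.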
VFIO iommu page size masking
Hi Alex, While trying to get VFIO-PCI working on AArch64 (with 64k page size), I stumbled over the following piece of code: > static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu) > { > struct vfio_domain *domain; > unsigned long bitmap = PAGE_MASK; > > mutex_lock(&iommu->lock); > list_for_each_entry(domain, &iommu->domain_list, next) > bitmap &= domain->domain->ops->pgsize_bitmap; > mutex_unlock(&iommu->lock); > > return bitmap; > } The SMMU page mask is [3.054302] arm-smmu e0a0.smmu: Supported page sizes: 0x40201000 but after this function, we end up supporting only 2MB pages and above. The reason for that is simple: You restrict the bitmap to PAGE_MASK and above. Now the big question is why you're doing that. I don't see why it would be a problem if the IOMMU maps a page in smaller chunks. So I tried to patch the code above with s/PAGE_MASK/1UL/ and everything seems to run fine. But maybe we're now lacking some sanity checks? Alex
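The arithmetic is easy to check: with 64k pages, PAGE_MASK clears every page-size bit below 64k, so intersecting it with the SMMU's bitmap leaves only the 2MB and 1GB bits, while a permissive all-ones seed keeps the 4k bit. A toy model of the loop above (the seed parameter stands for whatever `bitmap` is initialized to):

```c
/* Toy model of vfio_pgsize_bitmap(): the seed is what the real function
 * initializes `bitmap` to before AND-ing in each domain's pgsize_bitmap. */
unsigned long pgsize_bitmap(unsigned long seed, unsigned long domain_sizes)
{
    return seed & domain_sizes;
}
```

With the SMMU value from the log (0x40201000 = 4K | 2M | 1G), seeding with the 64k PAGE_MASK (~0xFFFFUL) yields 0x40200000, i.e. 2MB and up, exactly the symptom described.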