Re: [GIT PULL 3/3] KVM: s390: use simple switch statement as multiplexer

2015-10-29 Thread Alexander Graf

> On 29.10.2015 at 16:08, Christian Borntraeger wrote:
> 
> We currently do some magic shifting (by exploiting that exit codes
> are always a multiple of 4) and a table lookup to jump into the
> exit handlers. This causes some calculations and checks, just to
> do a potentially expensive function call.
> 
> Changing that to a switch statement gives the compiler the chance
> to inline and dynamically decide between jump tables or inline
> compare and branches. In addition it makes the code more readable.
> 
> bloat-o-meter gives me a small reduction in code size:
> 
> add/remove: 0/7 grow/shrink: 1/1 up/down: 986/-1334 (-348)
> function                          old     new   delta
> kvm_handle_sie_intercept           72    1058    +986
> handle_prog                       704     696      -8
> handle_noop                        54       -     -54
> handle_partial_execution           60       -     -60
> intercept_funcs                   120       -    -120
> handle_instruction                198       -    -198
> handle_validity                   210       -    -210
> handle_stop                       316       -    -316
> handle_external_interrupt         368       -    -368
> 
> Right now my gcc does conditional branches instead of jump tables.
> The inlining seems to give us enough cycles as some micro-benchmarking
> shows minimal improvements, but still in noise.

Awesome. I ended up with the same conclusions on switch vs table lookups in the 
ppc code back in the day.

> 
> Signed-off-by: Christian Borntraeger 
> Reviewed-by: Cornelia Huck 
> ---
> arch/s390/kvm/intercept.c | 42 +-
> 1 file changed, 21 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
> index 7365e8a..b4a5aa1 100644
> --- a/arch/s390/kvm/intercept.c
> +++ b/arch/s390/kvm/intercept.c
> @@ -336,28 +336,28 @@ static int handle_partial_execution(struct kvm_vcpu 
> *vcpu)
>return -EOPNOTSUPP;
> }
> 
> -static const intercept_handler_t intercept_funcs[] = {
> -[0x00 >> 2] = handle_noop,
> -[0x04 >> 2] = handle_instruction,
> -[0x08 >> 2] = handle_prog,
> -[0x10 >> 2] = handle_noop,
> -[0x14 >> 2] = handle_external_interrupt,
> -[0x18 >> 2] = handle_noop,
> -[0x1C >> 2] = kvm_s390_handle_wait,
> -[0x20 >> 2] = handle_validity,
> -[0x28 >> 2] = handle_stop,
> -[0x38 >> 2] = handle_partial_execution,
> -};
> -
> int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
> {
> -intercept_handler_t func;
> -u8 code = vcpu->arch.sie_block->icptcode;
> -
> -if (code & 3 || (code >> 2) >= ARRAY_SIZE(intercept_funcs))
> +switch (vcpu->arch.sie_block->icptcode) {
> +case 0x00:
> +case 0x10:
> +case 0x18:

... if you could convert these magic numbers to something more telling however, 
I think readability would improve even more! That can easily be a follow up 
patch though.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD

2015-09-12 Thread Alexander Graf


> On 12.09.2015 at 18:47, Nathan Whitehorn wrote:
> 
>> On 09/06/15 16:52, Paul Mackerras wrote:
>>> On Sun, Sep 06, 2015 at 12:47:12PM -0700, Nathan Whitehorn wrote:
>>> Anything I can do to help move these along? It's a big performance
>>> improvement for FreeBSD guests.
>> These patches are in Paolo's kvm-ppc-next branch and should go into
>> Linus' tree in the next couple of days.
>> 
>> Paul.
> 
> One additional question. What is your preferred way to enable these? Since 
> these are part of the mandatory part of the PAPR spec, I think there's an 
> argument to add them to the default_hcall_list? Otherwise, they should be 
> enabled by default in QEMU (I can take care of sending that patch if you 
> prefer this route).

The default hcall list just describes which hcalls were implicitly enabled at 
the point in time we made them enableable by user space. IMHO no new hcalls 
should get added there.

So yes, please send a patch to qemu :).


Alex



Re: [Qemu-ppc] KVM memory slots limit on powerpc

2015-09-04 Thread Alexander Graf


On 04.09.15 11:59, Christian Borntraeger wrote:
> On 04.09.2015 at 11:35, Thomas Huth wrote:
>>
>>  Hi all,
>>
>> now that we get memory hotplugging for the spapr machine on qemu-ppc,
>> too, it seems like we easily can hit the amount of KVM-internal memory
>> slots now ("#define KVM_USER_MEM_SLOTS 32" in
>> arch/powerpc/include/asm/kvm_host.h). For example, start
>> qemu-system-ppc64 with a couple of "-device secondary-vga" and "-m
>> 4G,slots=32,maxmem=40G" and then try to hot-plug all 32 DIMMs ... and
>> you'll see that it aborts way earlier already.
>>
>> The x86 code already increased the number of KVM_USER_MEM_SLOTS to 509
>> (+3 internal slots = 512) ... maybe we should now increase the
>> amount of slots on powerpc, too? Since we don't use internal slots on
>> POWER, would 512 be a good value? Or would less be sufficient, too?
> 
> When you are at it, the s390 value should also be increased I guess.

That constant defines the array size for the memslot array in struct kvm,
which in turn gets allocated by kzalloc, so it's pinned kernel
memory that is physically contiguous. Doing big allocations can turn
into problems during runtime.

So maybe there is another way? Can we extend the memslot array size
dynamically somehow? Allocate it separately? How much memory does the
memslot array use up with 512 entries?


Alex


Re: [PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-02 Thread Alexander Graf


> On 02.09.2015 at 09:26, Thomas Huth wrote:
> 
>> On 02/09/15 00:55, Benjamin Herrenschmidt wrote:
>>> On Wed, 2015-09-02 at 08:45 +1000, Paul Mackerras wrote:
>>> On Wed, Sep 02, 2015 at 08:25:05AM +1000, Benjamin Herrenschmidt
>>> wrote:
>>>> On Tue, 2015-09-01 at 23:41 +0200, Thomas Huth wrote:
>>>>> The size of the Problem State Priority Boost Register is only
>>>>> 32 bits, so let's change the type of the corresponding variable
>>>>> accordingly to avoid future trouble.
>>>>
>>>> It's not future trouble, it's broken today for LE and this should
>>>> fix it BUT
>>> 
>>> No, it's broken today for BE hosts, which will always see 0 for the
>>> PSPB register value.  LE hosts are fine.
> 
> Right ... I just meant that nobody has really experienced trouble with this
> yet, but the bug is of course already present.

Sounds like a great candidate for kvm-unit-tests then, no? ;)


Alex



Re: [PATCH] vfio: Enable VFIO device for powerpc

2015-08-26 Thread Alexander Graf


On 13.08.15 03:15, David Gibson wrote:
> ec53500f "kvm: Add VFIO device" added a special KVM pseudo-device which is
> used to handle any necessary interactions between KVM and VFIO.
> 
> Currently that device is built on x86 and ARM, but not powerpc, although
> powerpc does support both KVM and VFIO.  This makes things awkward in
> userspace.
> 
> Currently qemu prints an alarming error message if you attempt to use VFIO
> and it can't initialize the KVM VFIO device.  We don't want to remove the
> warning, because lack of the KVM VFIO device could mean coherency problems
> on x86.  On powerpc, however, the error is harmless but looks disturbing,
> and a test based on host architecture in qemu would be ugly, and break if
> we do need the KVM VFIO device for something important in future.
> 
> There's nothing preventing the KVM VFIO device from being built for
> powerpc, so this patch turns it on.  It won't actually do anything, since
> we don't define any of the arch_*() hooks, but it will make qemu happy and
> we can extend it in future if we need to.
> 
> Signed-off-by: David Gibson 
> Reviewed-by: Eric Auger 

Paul is going to take care of the kvm-ppc tree for 4.3. Also, ppc kvm
patches should get CC on the kvm-ppc@vger mailing list ;).

Paul, could you please pick this one up?


Thanks!

Alex


Re: [PULL 00/12] ppc patch queue 2015-08-22

2015-08-23 Thread Alexander Graf


On 22.08.15 15:32, Paolo Bonzini wrote:
> 
> 
> On 22/08/2015 02:21, Alexander Graf wrote:
>> Hi Paolo,
>>
>> This is my current patch queue for ppc.  Please pull.
> 
> Done, but this queue has not been in linux-next.  Please push to
> kvm-ppc-next on your github Linux tree as well; please keep an eye on

Ah, sorry. I pushed to kvm-ppc-next in parallel to sending the request.

> Stephen Rothwell's messages in the next few days, and I'll send the pull
> request sometime next week via webmail if everything goes fine.

Nothing exciting came in so far, so I hope we're good :).


Alex


[PULL 08/12] KVM: PPC: Book3S HV: Fix bug in dirty page tracking

2015-08-22 Thread Alexander Graf
From: Paul Mackerras 

This fixes a bug in the tracking of pages that get modified by the
guest.  If the guest creates a large-page HPTE, writes to memory
somewhere within the large page, and then removes the HPTE, we only
record the modified state for the first normal page within the large
page, when in fact the guest might have modified some other normal
page within the large page.

To fix this we use some unused bits in the rmap entry to record the
order (log base 2) of the size of the page that was modified, when
removing an HPTE.  Then in kvm_test_clear_dirty_npages() we use that
order to return the correct number of modified pages.

The same thing could in principle happen when removing a HPTE at the
host's request, i.e. when paging out a page, except that we never
page out large pages, and the guest can only create large-page HPTEs
if the guest RAM is backed by large pages.  However, we also fix
this case for the sake of future-proofing.

The reference bit is also subject to the same loss of information.  We
don't make the same fix here for the reference bit because there isn't
an interface for userspace to find out which pages the guest has
referenced, whereas there is one for userspace to find out which pages
the guest has modified.  Because of this loss of information, the
kvm_age_hva_hv() and kvm_test_age_hva_hv() functions might incorrectly
say that a page has not been referenced when it has, but that doesn't
matter greatly because we never page or swap out large pages.

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s.h |  1 +
 arch/powerpc/include/asm/kvm_host.h   |  2 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |  8 +++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   | 17 +
 4 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index b91e74a..e6b2534 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -158,6 +158,7 @@ extern pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t 
gpa, bool writing,
bool *writable);
 extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
unsigned long *rmap, long pte_index, int realmode);
+extern void kvmppc_update_rmap_change(unsigned long *rmap, unsigned long 
psize);
 extern void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
unsigned long pte_index);
 void kvmppc_clear_ref_hpte(struct kvm *kvm, __be64 *hptep,
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 80eb29a..e187b6a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -205,8 +205,10 @@ struct revmap_entry {
  */
 #define KVMPPC_RMAP_LOCK_BIT   63
 #define KVMPPC_RMAP_RC_SHIFT   32
+#define KVMPPC_RMAP_CHG_SHIFT  48
 #define KVMPPC_RMAP_REFERENCED (HPTE_R_R << KVMPPC_RMAP_RC_SHIFT)
 #define KVMPPC_RMAP_CHANGED(HPTE_R_C << KVMPPC_RMAP_RC_SHIFT)
+#define KVMPPC_RMAP_CHG_ORDER  (0x3ful << KVMPPC_RMAP_CHG_SHIFT)
 #define KVMPPC_RMAP_PRESENT0x1ul
 #define KVMPPC_RMAP_INDEX  0xul
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index dab68b7..1f9c0a1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -761,6 +761,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long 
*rmapp,
/* Harvest R and C */
rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
*rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
+   if (rcbits & HPTE_R_C)
+   kvmppc_update_rmap_change(rmapp, psize);
if (rcbits & ~rev[i].guest_rpte) {
rev[i].guest_rpte = ptel | rcbits;
note_hpte_modification(kvm, &rev[i]);
@@ -927,8 +929,12 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, 
unsigned long *rmapp)
  retry:
lock_rmap(rmapp);
if (*rmapp & KVMPPC_RMAP_CHANGED) {
-   *rmapp &= ~KVMPPC_RMAP_CHANGED;
+   long change_order = (*rmapp & KVMPPC_RMAP_CHG_ORDER)
+   >> KVMPPC_RMAP_CHG_SHIFT;
+   *rmapp &= ~(KVMPPC_RMAP_CHANGED | KVMPPC_RMAP_CHG_ORDER);
npages_dirty = 1;
+   if (change_order > PAGE_SHIFT)
+   npages_dirty = 1ul << (change_order - PAGE_SHIFT);
}
if (!(*rmapp & KVMPPC_RMAP_PRESENT)) {
unlock_rmap(rmapp);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c6d601c..c7a3ab2 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/k

[PULL 03/12] KVM: PPC: Fix warnings from sparse

2015-08-22 Thread Alexander Graf
From: Thomas Huth 

When compiling the KVM code for POWER with "make C=1", sparse
complains about functions missing proper prototypes and a 64-bit
constant missing the ULL suffix. Let's fix this by making the
functions static or by including the proper header with the
prototypes, and by appending a ULL suffix to the constant
PPC_MPPE_ADDRESS_MASK.

Signed-off-by: Thomas Huth 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/ppc-opcode.h| 2 +-
 arch/powerpc/kvm/book3s.c| 3 ++-
 arch/powerpc/kvm/book3s_32_mmu_host.c| 1 +
 arch/powerpc/kvm/book3s_64_mmu_host.c| 1 +
 arch/powerpc/kvm/book3s_emulate.c| 1 +
 arch/powerpc/kvm/book3s_hv.c | 8 
 arch/powerpc/kvm/book3s_paired_singles.c | 2 +-
 arch/powerpc/kvm/powerpc.c   | 2 +-
 8 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 8452335..790f5d1 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -287,7 +287,7 @@
 
 /* POWER8 Micro Partition Prefetch (MPP) parameters */
 /* Address mask is common for LOGMPP instruction and MPPR SPR */
-#define PPC_MPPE_ADDRESS_MASK 0xc000
+#define PPC_MPPE_ADDRESS_MASK 0xc000ULL
 
 /* Bits 60 and 61 of MPP SPR should be set to one of the following */
 /* Aborting the fetch is indeed setting 00 in the table size bits */
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 05ea8fc..53285d5 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -240,7 +240,8 @@ void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, 
ulong flags)
kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_INST_STORAGE);
 }
 
-int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
+static int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu,
+unsigned int priority)
 {
int deliver = 1;
int vec = 0;
diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c 
b/arch/powerpc/kvm/book3s_32_mmu_host.c
index 2035d16..d5c9bfe 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include "book3s.h"
 
 /* #define DEBUG_MMU */
 /* #define DEBUG_SR */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c 
b/arch/powerpc/kvm/book3s_64_mmu_host.c
index b982d92..79ad35a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include "trace_pr.h"
+#include "book3s.h"
 
 #define PTE_SIZE 12
 
diff --git a/arch/powerpc/kvm/book3s_emulate.c 
b/arch/powerpc/kvm/book3s_emulate.c
index 5a2bc4b..2afdb9c 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include "book3s.h"
 
 #define OP_19_XOP_RFID 18
 #define OP_19_XOP_RFI  50
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 68d067a..6e588ac 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -214,12 +214,12 @@ static void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 
msr)
kvmppc_end_cede(vcpu);
 }
 
-void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
+static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
 {
vcpu->arch.pvr = pvr;
 }
 
-int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
+static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
 {
unsigned long pcr = 0;
struct kvmppc_vcore *vc = vcpu->arch.vcore;
@@ -259,7 +259,7 @@ int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 
arch_compat)
return 0;
 }
 
-void kvmppc_dump_regs(struct kvm_vcpu *vcpu)
+static void kvmppc_dump_regs(struct kvm_vcpu *vcpu)
 {
int r;
 
@@ -292,7 +292,7 @@ void kvmppc_dump_regs(struct kvm_vcpu *vcpu)
   vcpu->arch.last_inst);
 }
 
-struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id)
+static struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id)
 {
int r;
struct kvm_vcpu *v, *ret = NULL;
diff --git a/arch/powerpc/kvm/book3s_paired_singles.c 
b/arch/powerpc/kvm/book3s_paired_singles.c
index bd6ab16..a759d9a 100644
--- a/arch/powerpc/kvm/book3s_paired_singles.c
+++ b/arch/powerpc/kvm/book3s_paired_singles.c
@@ -352,7 +352,7 @@ static inline u32 inst_get_field(u32 inst, int msb, int lsb)
return kvmppc_get_field(inst, msb + 32, lsb + 32);
 }
 
-bool kvmppc_inst_is_paired_single(struct kvm_vcpu *vcpu, u32 inst)
+static bool kvmppc_inst_is_paired_single(struct kvm_vcpu *vcpu, u32 inst)
 {
if (!(vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE))
return false;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index e5dd

[PULL 01/12] KVM: PPC: fix suspicious use of conditional operator

2015-08-22 Thread Alexander Graf
From: Tudor Laurentiu 

This was signaled by a static code analysis tool.

Signed-off-by: Laurentiu Tudor 
Reviewed-by: Scott Wood 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/e500_mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
index 50860e9..29911a0 100644
--- a/arch/powerpc/kvm/e500_mmu.c
+++ b/arch/powerpc/kvm/e500_mmu.c
@@ -377,7 +377,7 @@ int kvmppc_e500_emul_tlbsx(struct kvm_vcpu *vcpu, gva_t ea)
| MAS0_NV(vcpu_e500->gtlb_nv[tlbsel]);
vcpu->arch.shared->mas1 =
  (vcpu->arch.shared->mas6 & MAS6_SPID0)
-   | (vcpu->arch.shared->mas6 & (MAS6_SAS ? MAS1_TS : 0))
+   | ((vcpu->arch.shared->mas6 & MAS6_SAS) ? MAS1_TS : 0)
| (vcpu->arch.shared->mas4 & MAS4_TSIZED(~0));
vcpu->arch.shared->mas2 &= MAS2_EPN;
vcpu->arch.shared->mas2 |= vcpu->arch.shared->mas4 &
-- 
1.8.1.4



[PULL 02/12] KVM: PPC: Remove PPC970 from KVM_BOOK3S_64_HV text in Kconfig

2015-08-22 Thread Alexander Graf
From: Thomas Huth 

Since the PPC970 support has been removed from the kvm-hv kernel
module recently, we should also reflect this change in the help
text of the corresponding Kconfig option.

Signed-off-by: Thomas Huth 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/Kconfig | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 3caec2c..c2024ac 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -74,14 +74,14 @@ config KVM_BOOK3S_64
  If unsure, say N.
 
 config KVM_BOOK3S_64_HV
-   tristate "KVM support for POWER7 and PPC970 using hypervisor mode in 
host"
+   tristate "KVM for POWER7 and later using hypervisor mode in host"
depends on KVM_BOOK3S_64 && PPC_POWERNV
select KVM_BOOK3S_HV_POSSIBLE
select MMU_NOTIFIER
select CMA
---help---
  Support running unmodified book3s_64 guest kernels in
- virtual machines on POWER7 and PPC970 processors that have
+ virtual machines on POWER7 and newer processors that have
  hypervisor mode available to the host.
 
  If you say Y here, KVM will use the hardware virtualization
@@ -89,8 +89,8 @@ config KVM_BOOK3S_64_HV
  guest operating systems will run at full hardware speed
  using supervisor and user modes.  However, this also means
  that KVM is not usable under PowerVM (pHyp), is only usable
- on POWER7 (or later) processors and PPC970-family processors,
- and cannot emulate a different processor from the host processor.
+ on POWER7 or later processors, and cannot emulate a
+ different processor from the host processor.
 
  If unsure, say N.
 
-- 
1.8.1.4



[PULL 05/12] KVM: PPC: Book3S HV: Make use of unused threads when running guests

2015-08-22 Thread Alexander Graf
From: Paul Mackerras 

When running a virtual core of a guest that is configured with fewer
threads per core than the physical cores have, the extra physical
threads are currently unused.  This makes it possible to use them to
run one or more other virtual cores from the same guest when certain
conditions are met.  This applies on POWER7, and on POWER8 to guests
with one thread per virtual core.  (It doesn't apply to POWER8 guests
with multiple threads per vcore because they require a 1-1 virtual to
physical thread mapping in order to be able to use msgsndp and the
TIR.)

The idea is that we maintain a list of preempted vcores for each
physical cpu (i.e. each core, since the host runs single-threaded).
Then, when a vcore is about to run, it checks to see if there are
any vcores on the list for its physical cpu that could be
piggybacked onto this vcore's execution.  If so, those additional
vcores are put into state VCORE_PIGGYBACK and their runnable VCPU
threads are started as well as the original vcore, which is called
the master vcore.

After the vcores have exited the guest, the extra ones are put back
onto the preempted list if any of their VCPUs are still runnable and
not idle.

This means that vcpu->arch.ptid is no longer necessarily the same as
the physical thread that the vcpu runs on.  In order to make it easier
for code that wants to send an IPI to know which CPU to target, we
now store that in a new field in struct vcpu_arch, called thread_cpu.

Reviewed-by: David Gibson 
Tested-by: Laurent Vivier 
Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_host.h |  19 +-
 arch/powerpc/kernel/asm-offsets.c   |   2 +
 arch/powerpc/kvm/book3s_hv.c| 333 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c|   7 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c|   4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   5 +
 6 files changed, 298 insertions(+), 72 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index d91f65b..2b74490 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -278,7 +278,9 @@ struct kvmppc_vcore {
u16 last_cpu;
u8 vcore_state;
u8 in_guest;
+   struct kvmppc_vcore *master_vcore;
struct list_head runnable_threads;
+   struct list_head preempt_list;
spinlock_t lock;
wait_queue_head_t wq;
spinlock_t stoltb_lock; /* protects stolen_tb and preempt_tb */
@@ -300,12 +302,18 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
-/* Values for vcore_state */
+/*
+ * Values for vcore_state.
+ * Note that these are arranged such that lower values
+ * (< VCORE_SLEEPING) don't require stolen time accounting
+ * on load/unload, and higher values do.
+ */
 #define VCORE_INACTIVE 0
-#define VCORE_SLEEPING 1
-#define VCORE_PREEMPT  2
-#define VCORE_RUNNING  3
-#define VCORE_EXITING  4
+#define VCORE_PREEMPT  1
+#define VCORE_PIGGYBACK2
+#define VCORE_SLEEPING 3
+#define VCORE_RUNNING  4
+#define VCORE_EXITING  5
 
 /*
  * Struct used to manage memory for a virtual processor area
@@ -619,6 +627,7 @@ struct kvm_vcpu_arch {
int trap;
int state;
int ptid;
+   int thread_cpu;
bool timer_running;
wait_queue_head_t cpu_run;
 
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 9823057..a78cdbf 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -512,6 +512,8 @@ int main(void)
DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
DEFINE(VCPU_VPA_DIRTY, offsetof(struct kvm_vcpu, arch.vpa.dirty));
DEFINE(VCPU_HEIR, offsetof(struct kvm_vcpu, arch.emul_inst));
+   DEFINE(VCPU_CPU, offsetof(struct kvm_vcpu, cpu));
+   DEFINE(VCPU_THREAD_CPU, offsetof(struct kvm_vcpu, arch.thread_cpu));
 #endif
 #ifdef CONFIG_PPC_BOOK3S
DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6e588ac..0173ce2 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -81,6 +81,9 @@ static DECLARE_BITMAP(default_enabled_hcalls, 
MAX_HCALL_OPCODE/4 + 1);
 #define MPP_BUFFER_ORDER   3
 #endif
 
+static int target_smt_mode;
+module_param(target_smt_mode, int, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(target_smt_mode, "Target threads per core (0 = max)");
 
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
@@ -114,7 +117,7 @@ static bool kvmppc_ipi_thread(int cpu)
 
 static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 {
-   int cpu = vcpu->cpu;
+   int cpu;
wait_queue_head_t *wqp;
 
wqp = kvm_arch_vcpu_

[PULL 04/12] KVM: PPC: add missing pt_regs initialization

2015-08-22 Thread Alexander Graf
From: Tudor Laurentiu 

On this switch branch the regs initialization
doesn't happen so add it.
This was found with the help of a static
code analysis tool.

Signed-off-by: Laurentiu Tudor 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/booke.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index cc58426..ae458f0 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -933,6 +933,7 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu,
 #endif
break;
case BOOKE_INTERRUPT_CRITICAL:
+   kvmppc_fill_pt_regs(&regs);
unknown_exception(&regs);
break;
case BOOKE_INTERRUPT_DEBUG:
-- 
1.8.1.4



[PULL 12/12] KVM: PPC: Book3S: correct width in XER handling

2015-08-22 Thread Alexander Graf
From: Sam Bobroff 

In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64
bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is
accessed as such.

This patch corrects places where it is accessed as a 32 bit field by a
64 bit kernel.  In some cases this is via a 32 bit load or store
instruction which, depending on endianness, will cause either the
lower or upper 32 bits to be missed.  In another case it is cast as a
u32, causing the upper 32 bits to be cleared.

This patch corrects those places by extending the access methods to
64 bits.

Signed-off-by: Sam Bobroff 
Reviewed-by: Laurent Vivier 
Reviewed-by: Thomas Huth 
Tested-by: Thomas Huth 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s.h | 4 ++--
 arch/powerpc/include/asm/kvm_book3s_asm.h | 2 +-
 arch/powerpc/include/asm/kvm_booke.h  | 4 ++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 6 +++---
 arch/powerpc/kvm/book3s_segment.S | 4 ++--
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index e6b2534..9fac01c 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -226,12 +226,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
return vcpu->arch.cr;
 }
 
-static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
+static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
 {
vcpu->arch.xer = val;
 }
 
-static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
+static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu)
 {
return vcpu->arch.xer;
 }
diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 57d5dfe..72b6225 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -132,7 +132,7 @@ struct kvmppc_book3s_shadow_vcpu {
bool in_use;
ulong gpr[14];
u32 cr;
-   u32 xer;
+   ulong xer;
ulong ctr;
ulong lr;
ulong pc;
diff --git a/arch/powerpc/include/asm/kvm_booke.h 
b/arch/powerpc/include/asm/kvm_booke.h
index 3286f0d..bc6e29e 100644
--- a/arch/powerpc/include/asm/kvm_booke.h
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -54,12 +54,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
return vcpu->arch.cr;
 }
 
-static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
+static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
 {
vcpu->arch.xer = val;
 }
 
-static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
+static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu)
 {
return vcpu->arch.xer;
 }
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index e347766..472680f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -944,7 +944,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
blt hdec_soon
 
ld  r6, VCPU_CTR(r4)
-   lwz r7, VCPU_XER(r4)
+   ld  r7, VCPU_XER(r4)
 
mtctr   r6
mtxer   r7
@@ -1181,7 +1181,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
mfctr   r3
mfxer   r4
std r3, VCPU_CTR(r9)
-   stw r4, VCPU_XER(r9)
+   std r4, VCPU_XER(r9)
 
/* If this is a page table miss then see if it's theirs or ours */
cmpwi   r12, BOOK3S_INTERRUPT_H_DATA_STORAGE
@@ -1763,7 +1763,7 @@ kvmppc_hdsi:
bl  kvmppc_msr_interrupt
 fast_interrupt_c_return:
 6: ld  r7, VCPU_CTR(r9)
-   lwz r8, VCPU_XER(r9)
+   ld  r8, VCPU_XER(r9)
mtctr   r7
mtxer   r8
mr  r4, r9
diff --git a/arch/powerpc/kvm/book3s_segment.S 
b/arch/powerpc/kvm/book3s_segment.S
index acee37c..ca8f174 100644
--- a/arch/powerpc/kvm/book3s_segment.S
+++ b/arch/powerpc/kvm/book3s_segment.S
@@ -123,7 +123,7 @@ no_dcbz32_on:
PPC_LL  r8, SVCPU_CTR(r3)
PPC_LL  r9, SVCPU_LR(r3)
lwz r10, SVCPU_CR(r3)
-   lwz r11, SVCPU_XER(r3)
+   PPC_LL  r11, SVCPU_XER(r3)
 
mtctr   r8
mtlr    r9
@@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
mfctr   r8
mflr    r9
 
-   stw r5, SVCPU_XER(r13)
+   PPC_STL r5, SVCPU_XER(r13)
PPC_STL r6, SVCPU_FAULT_DAR(r13)
stw r7, SVCPU_FAULT_DSISR(r13)
PPC_STL r8, SVCPU_CTR(r13)
-- 
1.8.1.4



[PULL 09/12] KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD

2015-08-22 Thread Alexander Graf
From: Paul Mackerras 

This adds implementations for the H_CLEAR_REF (test and clear reference
bit) and H_CLEAR_MOD (test and clear changed bit) hypercalls.

When clearing the reference or change bit in the guest view of the HPTE,
we also have to clear it in the real HPTE so that we can detect future
references or changes.  When we do so, we transfer the R or C bit value
to the rmap entry for the underlying host page so that kvm_age_hva_hv(),
kvm_test_age_hva_hv() and kvmppc_hv_get_dirty_log() know that the page
has been referenced and/or changed.
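
The R-bit transfer described above can be sketched in plain C as follows.
This is only a toy model under stated assumptions: toy_entry, toy_clear_ref
and the TOY_* constants are made up for illustration and are not the
kernel's types or values, and the locking and TLB invalidation that the real
code performs are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only; the kernel defines its own constants. */
#define TOY_HPTE_R_R         0x100ULL /* reference bit */
#define TOY_HPTE_R_C         0x080ULL /* change bit */
#define TOY_RMAP_REFERENCED  0x1ULL   /* "host page was referenced" flag */

/* Toy stand-in for one HPTE plus its guest view and host rmap word. */
struct toy_entry {
	uint64_t guest_rpte; /* guest view of the second HPTE doubleword */
	uint64_t hw_rpte;    /* real HPTE second doubleword */
	uint64_t rmap;       /* rmap word for the underlying host page */
};

/*
 * Test and clear the reference bit. Returns the guest-visible value
 * from before the clear, merged with the R/C bits hardware has set.
 */
static uint64_t toy_clear_ref(struct toy_entry *e)
{
	uint64_t gr = e->guest_rpte |
		      (e->hw_rpte & (TOY_HPTE_R_R | TOY_HPTE_R_C));

	/* Clear R in the guest view. */
	e->guest_rpte &= ~TOY_HPTE_R_R;

	/* Clear R in the real HPTE too, but remember it in the rmap. */
	if (e->hw_rpte & TOY_HPTE_R_R) {
		e->hw_rpte &= ~TOY_HPTE_R_R;
		e->rmap |= TOY_RMAP_REFERENCED;
	}
	return gr;
}
```

The point of the sketch is that the information carried by R is moved, not
lost: after the call the HPTE no longer has R set, but the rmap word still
records that the page was referenced.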

These hypercalls are not used by Linux guests.  These implementations
have been tested using a FreeBSD guest.

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 126 ++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   4 +-
 2 files changed, 121 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c7a3ab2..c1df9bb 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -112,25 +112,38 @@ void kvmppc_update_rmap_change(unsigned long *rmap, unsigned long psize)
 }
 EXPORT_SYMBOL_GPL(kvmppc_update_rmap_change);
 
+/* Returns a pointer to the revmap entry for the page mapped by a HPTE */
+static unsigned long *revmap_for_hpte(struct kvm *kvm, unsigned long hpte_v,
+ unsigned long hpte_gr)
+{
+   struct kvm_memory_slot *memslot;
+   unsigned long *rmap;
+   unsigned long gfn;
+
+   gfn = hpte_rpn(hpte_gr, hpte_page_size(hpte_v, hpte_gr));
+   memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
+   if (!memslot)
+   return NULL;
+
+   rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]);
+   return rmap;
+}
+
 /* Remove this HPTE from the chain for a real page */
 static void remove_revmap_chain(struct kvm *kvm, long pte_index,
struct revmap_entry *rev,
unsigned long hpte_v, unsigned long hpte_r)
 {
struct revmap_entry *next, *prev;
-   unsigned long gfn, ptel, head;
-   struct kvm_memory_slot *memslot;
+   unsigned long ptel, head;
unsigned long *rmap;
unsigned long rcbits;
 
rcbits = hpte_r & (HPTE_R_R | HPTE_R_C);
ptel = rev->guest_rpte |= rcbits;
-   gfn = hpte_rpn(ptel, hpte_page_size(hpte_v, ptel));
-   memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
-   if (!memslot)
+   rmap = revmap_for_hpte(kvm, hpte_v, ptel);
+   if (!rmap)
return;
-
-   rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]);
lock_rmap(rmap);
 
head = *rmap & KVMPPC_RMAP_INDEX;
@@ -678,6 +691,105 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
return H_SUCCESS;
 }
 
+long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
+   unsigned long pte_index)
+{
+   struct kvm *kvm = vcpu->kvm;
+   __be64 *hpte;
+   unsigned long v, r, gr;
+   struct revmap_entry *rev;
+   unsigned long *rmap;
+   long ret = H_NOT_FOUND;
+
+   if (pte_index >= kvm->arch.hpt_npte)
+   return H_PARAMETER;
+
+   rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+   hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+   while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
+   cpu_relax();
+   v = be64_to_cpu(hpte[0]);
+   r = be64_to_cpu(hpte[1]);
+   if (!(v & (HPTE_V_VALID | HPTE_V_ABSENT)))
+   goto out;
+
+   gr = rev->guest_rpte;
+   if (rev->guest_rpte & HPTE_R_R) {
+   rev->guest_rpte &= ~HPTE_R_R;
+   note_hpte_modification(kvm, rev);
+   }
+   if (v & HPTE_V_VALID) {
+   gr |= r & (HPTE_R_R | HPTE_R_C);
+   if (r & HPTE_R_R) {
+   kvmppc_clear_ref_hpte(kvm, hpte, pte_index);
+   rmap = revmap_for_hpte(kvm, v, gr);
+   if (rmap) {
+   lock_rmap(rmap);
+   *rmap |= KVMPPC_RMAP_REFERENCED;
+   unlock_rmap(rmap);
+   }
+   }
+   }
+   vcpu->arch.gpr[4] = gr;
+   ret = H_SUCCESS;
+ out:
+   unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
+   return ret;
+}
+
+long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
+   unsigned long pte_index)
+{
+   struct kvm *kvm = vcpu->kvm;
+   __be64 *hpte;
+   unsigned long v, r, gr;
+   struct revmap_entry *rev;
+   unsigned long *rmap;
+   long ret = H_NOT_FOUND;
+
+   if (pte_index >= kvm->arch.hpt_npte)
+   return H_

[PULL 10/12] KVM: PPC: Book3S HV: Fix preempted vcore list locking

2015-08-22 Thread Alexander Graf
From: Paul Mackerras 

When a vcore gets preempted, we put it on the preempted vcore list for
the current CPU.  The runner task then calls schedule() and comes back
some time later and takes itself off the list.  We need to be careful
to lock the list that it was put onto, which may not be the list for the
current CPU since the runner task may have moved to another CPU.
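
A minimal sketch of the idea, assuming the vcore records which CPU's list
it was queued on (the patch reads vc->pcpu for this). Everything named
toy_* below is hypothetical, and a plain flag stands in for the spinlock.

```c
#include <assert.h>

#define TOY_NR_CPUS 4

/* Toy stand-in for a per-CPU preempted vcore list. */
struct toy_preempt_list {
	int locked;   /* 1 while "held"; stands in for a spinlock */
	int nr_items; /* number of vcores queued on this CPU's list */
};

struct toy_vcore {
	int pcpu;     /* CPU whose list this vcore was queued on */
	int on_list;
};

static struct toy_preempt_list toy_lists[TOY_NR_CPUS];

/* Queue the vcore on the list of the CPU it is preempted on. */
static void toy_vcore_preempt(struct toy_vcore *vc, int this_cpu)
{
	struct toy_preempt_list *lp = &toy_lists[this_cpu];

	lp->locked = 1;
	vc->pcpu = this_cpu;
	vc->on_list = 1;
	lp->nr_items++;
	lp->locked = 0;
}

/*
 * Dequeue using the recorded CPU. The CPU we are running on now is
 * deliberately ignored: the runner may have migrated in the meantime.
 */
static void toy_vcore_end_preempt(struct toy_vcore *vc, int current_cpu)
{
	struct toy_preempt_list *lp;

	(void)current_cpu;
	if (!vc->on_list)
		return;
	lp = &toy_lists[vc->pcpu];
	lp->locked = 1;
	lp->nr_items--;
	vc->on_list = 0;
	lp->locked = 0;
}
```

Picking the list from the current CPU instead would take the wrong lock
whenever the runner task has moved, which is the bug being fixed.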

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_hv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6e3ef30..3d02276 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1962,10 +1962,11 @@ static void kvmppc_vcore_preempt(struct kvmppc_vcore *vc)
 
 static void kvmppc_vcore_end_preempt(struct kvmppc_vcore *vc)
 {
-   struct preempted_vcore_list *lp = this_cpu_ptr(&preempted_vcores);
+   struct preempted_vcore_list *lp;
 
kvmppc_core_end_stolen(vc);
if (!list_empty(&vc->preempt_list)) {
+   lp = &per_cpu(preempted_vcores, vc->pcpu);
spin_lock(&lp->lock);
list_del_init(&vc->preempt_list);
spin_unlock(&lp->lock);
-- 
1.8.1.4



[PULL 11/12] KVM: PPC: Book3S HV: Fix preempted vcore stolen time calculation

2015-08-22 Thread Alexander Graf
From: Paul Mackerras 

Whenever a vcore state is VCORE_PREEMPT we need to be counting stolen
time for it.  This currently isn't the case when we have a vcore that
no longer has any runnable threads in it but still has a runner task,
so we do an explicit call to kvmppc_core_start_stolen() in that case.
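
The rule "VCORE_PREEMPT implies stolen time is being counted" can be
sketched as a small state function. This is a toy model, not the kernel
code: the toy_* names are invented here, and it assumes (as I understand
kvmppc_vcore_preempt()) that the preempt path itself both sets the state
and starts the accounting.

```c
#include <assert.h>

enum toy_vcore_state { TOY_VCORE_INACTIVE, TOY_VCORE_PREEMPT };

struct toy_stolen_vcore {
	int has_runner;             /* non-zero if a runner task exists */
	enum toy_vcore_state state;
	int counting_stolen;        /* 1 while stolen time is accumulated */
};

/* Decide the state of a non-master vcore after a guest run. */
static void toy_post_guest(struct toy_stolen_vcore *vc, int still_running)
{
	if (still_running > 0) {
		/* Stands in for kvmppc_vcore_preempt(). */
		vc->state = TOY_VCORE_PREEMPT;
		vc->counting_stolen = 1;
	} else if (vc->has_runner) {
		/* No runnable threads left, but the runner still exists. */
		vc->state = TOY_VCORE_PREEMPT;
		vc->counting_stolen = 1;
	} else {
		vc->state = TOY_VCORE_INACTIVE;
		vc->counting_stolen = 0;
	}
}
```

The middle branch is the case the patch adds: previously the state was set
to preempted there without starting the accounting.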

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_hv.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3d02276..fad52f2 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2283,9 +2283,14 @@ static void post_guest_process(struct kvmppc_vcore *vc, bool is_master)
}
list_del_init(&vc->preempt_list);
if (!is_master) {
-   vc->vcore_state = vc->runner ? VCORE_PREEMPT : VCORE_INACTIVE;
-   if (still_running > 0)
+   if (still_running > 0) {
kvmppc_vcore_preempt(vc);
+   } else if (vc->runner) {
+   vc->vcore_state = VCORE_PREEMPT;
+   kvmppc_core_start_stolen(vc);
+   } else {
+   vc->vcore_state = VCORE_INACTIVE;
+   }
if (vc->n_runnable > 0 && vc->runner == NULL) {
/* make sure there's a candidate runner awake */
vcpu = list_first_entry(&vc->runnable_threads,
-- 
1.8.1.4



[PULL 06/12] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8

2015-08-22 Thread Alexander Graf
From: Paul Mackerras 

This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.
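
To make the bitmap encoding concrete, here is a small sketch of how such a
value could be tested. The helper name toy_mt_mode_considered is
hypothetical and does not appear in the patch; only the meaning of the
values 0, 2, 4 and 6 is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: is n_way (2 or 4) micro-threading enabled in the
 * dynamic_mt_modes bitmap? The bit value equals the split factor.
 */
static bool toy_mt_mode_considered(unsigned int dynamic_mt_modes,
				   unsigned int n_way)
{
	return (n_way == 2 || n_way == 4) &&
	       (dynamic_mt_modes & n_way) != 0;
}
```

With the default of 6, both toy_mt_mode_considered(6, 2) and
toy_mt_mode_considered(6, 4) are true; with 0, neither is.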

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
 arch/powerpc/include/asm/kvm_host.h   |   3 +
 arch/powerpc/kernel/asm-offsets.c |   7 +
 arch/powerpc/kvm/book3s_hv.c  | 367 ++
 arch/powerpc/kvm/book3s_hv_builtin.c  |  25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 113 +++--
 6 files changed, 473 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..57d5dfe 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR  0xc
 #define XICS_IPI   2   /* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_SMT_THREADS 8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES   4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+   unsigned long   rpr;
+   unsigned long   pmmar;
+   unsigned long   ldbar;
+   u8  subcore_size;
+   u8  do_nap;
+   u8  napped[MAX_SMT_THREADS];
+   struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
u64 host_spurr;
u64 host_dscr;
u64 dec_expires;
+   struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ 0x1
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --git a/arch/powerpc/kernel/asm-o

[PULL 00/12] ppc patch queue 2015-08-22

2015-08-22 Thread Alexander Graf
Hi Paolo,

This is my current patch queue for ppc.  Please pull.

Alex


The following changes since commit 4d283ec908e617fa28bcb06bce310206f0655d67:

  x86/kvm: Rename VMX's segment access rights defines (2015-08-15 00:47:13 +0200)

are available in the git repository at:

  git://github.com/agraf/linux-2.6.git tags/signed-kvm-ppc-next

for you to fetch changes up to c63517c2e3810071359af926f621c1f784388c3f:

  KVM: PPC: Book3S: correct width in XER handling (2015-08-22 11:16:19 +0200)


Patch queue for ppc - 2015-08-22

Highlights for KVM PPC this time around:

  - Book3S: A few bug fixes
  - Book3S: Allow micro-threading on POWER8


Paul Mackerras (7):
  KVM: PPC: Book3S HV: Make use of unused threads when running guests
  KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
  KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE
  KVM: PPC: Book3S HV: Fix bug in dirty page tracking
  KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD
  KVM: PPC: Book3S HV: Fix preempted vcore list locking
  KVM: PPC: Book3S HV: Fix preempted vcore stolen time calculation

Sam Bobroff (1):
  KVM: PPC: Book3S: correct width in XER handling

Thomas Huth (2):
  KVM: PPC: Remove PPC970 from KVM_BOOK3S_64_HV text in Kconfig
  KVM: PPC: Fix warnings from sparse

Tudor Laurentiu (2):
  KVM: PPC: fix suspicious use of conditional operator
  KVM: PPC: add missing pt_regs initialization

 arch/powerpc/include/asm/kvm_book3s.h |   5 +-
 arch/powerpc/include/asm/kvm_book3s_asm.h |  22 +-
 arch/powerpc/include/asm/kvm_booke.h  |   4 +-
 arch/powerpc/include/asm/kvm_host.h   |  24 +-
 arch/powerpc/include/asm/ppc-opcode.h |   2 +-
 arch/powerpc/kernel/asm-offsets.c |   9 +
 arch/powerpc/kvm/Kconfig  |   8 +-
 arch/powerpc/kvm/book3s.c |   3 +-
 arch/powerpc/kvm/book3s_32_mmu_host.c |   1 +
 arch/powerpc/kvm/book3s_64_mmu_host.c |   1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |   8 +-
 arch/powerpc/kvm/book3s_emulate.c |   1 +
 arch/powerpc/kvm/book3s_hv.c  | 660 ++
 arch/powerpc/kvm/book3s_hv_builtin.c  |  32 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   | 161 +++-
 arch/powerpc/kvm/book3s_hv_rm_xics.c  |   4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 128 +-
 arch/powerpc/kvm/book3s_paired_singles.c  |   2 +-
 arch/powerpc/kvm/book3s_segment.S |   4 +-
 arch/powerpc/kvm/booke.c  |   1 +
 arch/powerpc/kvm/e500_mmu.c   |   2 +-
 arch/powerpc/kvm/powerpc.c|   2 +-
 22 files changed, 938 insertions(+), 146 deletions(-)


[PULL 07/12] KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE

2015-08-22 Thread Alexander Graf
From: Paul Mackerras 

The reference (R) and change (C) bits in a HPT entry can be set by
hardware at any time up until the HPTE is invalidated and the TLB
invalidation sequence has completed.  This means that when removing
a HPTE, we need to read the HPTE after the invalidation sequence has
completed in order to obtain reliable values of R and C.  The code
in kvmppc_do_h_remove() used to do this.  However, commit 6f22bd3265fb
("KVM: PPC: Book3S HV: Make HTAB code LE host aware") removed the
read after invalidation as a side effect of other changes.  This
restores the read of the HPTE after invalidation.

The user-visible effect of this bug would be that when migrating a
guest, there is a small probability that a page modified by the guest
and then unmapped by the guest might not get re-transmitted and thus
the destination might end up with a stale copy of the page.
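
The ordering problem can be illustrated with a toy model in which the
"hardware" sets the change bit while the invalidation is still in flight.
All toy_* names and bit values below are assumptions made for this sketch;
the real code uses be64_to_cpu(hpte[1]), do_tlbies() and
remove_revmap_chain().

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only. */
#define TOY_V_VALID 0x1ULL
#define TOY_R_R     0x100ULL /* reference bit */
#define TOY_R_C     0x080ULL /* change bit */

struct toy_hpte {
	uint64_t v; /* first doubleword: valid bit etc. */
	uint64_t r; /* second doubleword: R and C live here */
};

/*
 * Simulated invalidation: a late hardware update of C lands before
 * the invalidation sequence has completed.
 */
static void toy_invalidate(struct toy_hpte *h)
{
	h->r |= TOY_R_C;
	h->v &= ~TOY_V_VALID;
}

/* Buggy order: sample R/C first, then invalidate. The late C is lost. */
static uint64_t toy_remove_buggy(struct toy_hpte *h)
{
	uint64_t r = h->r;

	toy_invalidate(h);
	return r & (TOY_R_R | TOY_R_C);
}

/* Fixed order: invalidate first, then read the final R/C values. */
static uint64_t toy_remove_fixed(struct toy_hpte *h)
{
	toy_invalidate(h);
	return h->r & (TOY_R_R | TOY_R_C);
}
```

In the buggy variant the returned bits miss the modification, which is how
a dirty page could escape re-transmission during migration.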

Fixes: 6f22bd3265fb
Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index b027a89..c6d601c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -421,14 +421,20 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
v = pte & ~HPTE_V_HVLOCK;
if (v & HPTE_V_VALID) {
-   u64 pte1;
-
-   pte1 = be64_to_cpu(hpte[1]);
hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
-   rb = compute_tlbie_rb(v, pte1, pte_index);
+   rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
-   /* Read PTE low word after tlbie to get final R/C values */
-   remove_revmap_chain(kvm, pte_index, rev, v, pte1);
+   /*
+* The reference (R) and change (C) bits in a HPT
+* entry can be set by hardware at any time up until
+* the HPTE is invalidated and the TLB invalidation
+* sequence has completed.  This means that when
+* removing a HPTE, we need to re-read the HPTE after
+* the invalidation sequence has completed in order to
+* obtain reliable values of R and C.
+*/
+   remove_revmap_chain(kvm, pte_index, rev, v,
+   be64_to_cpu(hpte[1]));
}
r = rev->guest_rpte & ~HPTE_GR_RESERVED;
note_hpte_modification(kvm, rev);
-- 
1.8.1.4



Re: [PATCH] kvm:powerpc:Fix incorrect return statement in the function mpic_set_default_irq_routing

2015-08-12 Thread Alexander Graf


On 12.08.15 21:06, nick wrote:
> 
> 
> On 2015-08-12 03:05 PM, Alexander Graf wrote:
>>
>>
>> On 07.08.15 17:54, Nicholas Krause wrote:
>>> This fixes the incorrect return statement in the function
>>> mpic_set_default_irq_routing from always returning zero
>>> to signal success to this function's caller to instead
>>> return the return value of kvm_set_irq_routing as this
>>> function can fail and we need to correctly signal the
>>> caller of mpic_set_default_irq_routing when the call
>>> to this particular function has failed.
>>>
>>> Signed-off-by: Nicholas Krause 
>>
>> I like the patch, but I don't see it on the kvm-ppc mailing list. It
>> doesn't show up on patchwork or spinics. Did something go wrong while
>> sending it out?
>>
>>
>> Alex
>>
> Alex,
> Ask Paolo about it as he would be able to explain it better than I.

Well, whatever the reason, I can only apply patches that actually
appeared on the public mailing list. Otherwise people may not get the
chance to review them ;).


Alex


Re: [PATCH] kvm:powerpc:Fix return statements for wrapper functions in the file book3s_64_mmu_hv.c

2015-08-12 Thread Alexander Graf


On 10.08.15 17:27, Nicholas Krause wrote:
> This fixes the wrapper functions kvm_unmap_hva_hv and
> kvm_unmap_hva_range_hv to return the return value of the function
> kvm_handle_hva or kvm_handle_hva_range that they call internally,
> rather than always returning zero and making their callers think
> they ran successfully.
> 
> Signed-off-by: Nicholas Krause 

Paul, could you please take on this one?

Thanks,

Alex

> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index dab68b7..0905c8f 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -774,14 +774,12 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  
>  int kvm_unmap_hva_hv(struct kvm *kvm, unsigned long hva)
>  {
> - kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
> - return 0;
> + return kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
>  }
>  
>  int kvm_unmap_hva_range_hv(struct kvm *kvm, unsigned long start, unsigned long end)
>  {
> - kvm_handle_hva_range(kvm, start, end, kvm_unmap_rmapp);
> - return 0;
> + return kvm_handle_hva_range(kvm, start, end, kvm_unmap_rmapp);
>  }
>  
>  void kvmppc_core_flush_memslot_hv(struct kvm *kvm,
> 


Re: [PATCH] kvm:powerpc:Fix incorrect return statement in the function mpic_set_default_irq_routing

2015-08-12 Thread Alexander Graf


On 07.08.15 17:54, Nicholas Krause wrote:
> This fixes the incorrect return statement in the function
> mpic_set_default_irq_routing from always returning zero
> to signal success to this function's caller to instead
> return the return value of kvm_set_irq_routing as this
> function can fail and we need to correctly signal the
> caller of mpic_set_default_irq_routing when the call
> to this particular function has failed.
> 
> Signed-off-by: Nicholas Krause 

I like the patch, but I don't see it on the kvm-ppc mailing list. It
doesn't show up on patchwork or spinics. Did something go wrong while
sending it out?


Alex


Re: [kvm-unit-tests PATCH 11/14] powerpc/ppc64: add rtas_power_off

2015-08-03 Thread Alexander Graf


On 03.08.15 19:02, Andrew Jones wrote:
> On Mon, Aug 03, 2015 at 07:08:17PM +0200, Paolo Bonzini wrote:
>>
>>
>> On 03/08/2015 16:41, Andrew Jones wrote:
>>> Add enough RTAS support to support power-off, and apply it to
>>> exit().
>>>
>>> Signed-off-by: Andrew Jones 
>>
>> Why not use virtio-mmio + testdev on ppc as well?  Similar to how we're
>> not using PSCI on ARM or ACPI on x86.
> 
> I have some longer term plans to add minimal virtio-pci support to
> kvm-unit-tests, and then we could plug virtio-serial+chr-testdev into
> that. I didn't think I could use virtio-mmio directly with spapr, but
> maybe I can? Actually, I sort of like this approach more in some

You would need to add support for the dynamic sysbus device allocation
in the spapr machine, but then I don't see why it wouldn't work.

PCI however is the more natural choice on sPAPR if you want to do virtio.

That said, if all you need is a chr transport, IIRC there should be a
way to get you additional channels on the existing "serial port", which
really is just a simple hypercall interface. But David is the best
person to guide you to the best path forward here.


Alex

> respects though, as it doesn't require a special testdev or virtio
> support, keeping the unit test extra minimal. In fact, I was even
> thinking about posting patches (which I've already written) that
> allow chr-testdev to be optional for ARM too, now that it could use
> the exitcode snooper.
> 
> Thanks,
> drew
> 
>>
>> Paolo


Re: [PATCH 0/2] Two fixes for dynamic micro-threading

2015-07-23 Thread Alexander Graf


On 20.07.15 08:49, David Gibson wrote:
> On Thu, Jul 16, 2015 at 05:11:12PM +1000, Paul Mackerras wrote:
>> This series contains two fixes for the new dynamic micro-threading
>> code that was added recently for HV-mode KVM on Power servers.
>> The patches are against Alex Graf's kvm-ppc-queue branch.  Please
>> apply.
> 
> agraf,
> 
> Any word on these?  These appear to fix a really nasty host crash in
> current upstream.  I'd really like to see them merged ASAP.

Thanks, applied to kvm-ppc-queue.

The host crash should only occur with dynamic micro-threading enabled,
which is not in Linus' tree, correct?


Alex



Re: [PATCH 0/5] PPC: Current patch queue for HV KVM

2015-07-01 Thread Alexander Graf


On 24.06.15 13:18, Paul Mackerras wrote:
> This is my current queue of patches for HV KVM.  This series is based
> on the kvm next branch.  They have all been posted 6 weeks ago or
> more, though I have just added a 3-line fix to patch 2/5 to fix a bug
> that we found in testing migration, and I expanded a comment (no code
> change) in patch 3/5 following a suggestion by Aneesh.
> 
> I'd like to see these go into 4.2 if possible.

Thanks, applied all to kvm-ppc-queue.


Alex


Re: [PATCH 2/5] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8

2015-06-30 Thread Alexander Graf

On 06/24/15 13:18, Paul Mackerras wrote:

This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.

Signed-off-by: Paul Mackerras 
---
  arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
  arch/powerpc/include/asm/kvm_host.h   |   3 +
  arch/powerpc/kernel/asm-offsets.c |   7 +
  arch/powerpc/kvm/book3s_hv.c  | 369 ++
  arch/powerpc/kvm/book3s_hv_builtin.c  |  25 +-
  arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 113 +++--
  6 files changed, 475 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..4024d24 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
  #define XICS_MFRR 0xc
  #define XICS_IPI  2   /* interrupt source # for IPIs */
  
+/* Maximum number of threads per physical core */
+#define MAX_THREADS    8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES   4
+
  #ifdef __ASSEMBLY__
  
  #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
  
  #else  /*__ASSEMBLY__ */
  
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+   unsigned long   rpr;
+   unsigned long   pmmar;
+   unsigned long   ldbar;
+   u8  subcore_size;
+   u8  do_nap;
+   u8  napped[MAX_THREADS];
+   struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
  /*
   * This struct goes in the PACA on 64-bit processors.  It is used
   * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
u64 host_spurr;
u64 host_dscr;
u64 dec_expires;
+   struct kvm_split_mode *kvm_split_mode;
  #endif
  #ifdef CONFIG_PPC_BOOK3S_64
u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
  #define VCORE_EXIT_MAP(vc)((vc)->entry_exit_map >> 8)
  #define VCORE_IS_EXITING(vc)  (VCORE_EXIT_MAP(vc) != 0)
  
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ 0x1
+
  /*
   * Values for vcore_state.
   * Note that these are arranged such that lower values
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/

Re: [PATCH 1/3] powerpc: implement barrier primitives

2015-06-17 Thread Alexander Graf


On 17.06.15 12:15, Will Deacon wrote:
> On Wed, Jun 17, 2015 at 10:43:48AM +0100, Andre Przywara wrote:
>> Instead of referring to the Linux header including the barrier
>> macros, copy over the rather simple implementation for the PowerPC
>> barrier instructions kvmtool uses. This fixes build for powerpc.
>>
>> Signed-off-by: Andre Przywara 
>> ---
>> Hi,
>>
>> I just took what kvmtool seems to have used before, I actually have
>> no idea if "sync" is the right instruction or "lwsync" would do.
>> Would be nice if some people with PowerPC knowledge could comment.
> 
> I *think* we can use lwsync for rmb and wmb, but would want confirmation
> from a ppc guy before making that change!

Also I'd prefer to play safe for now :)


Alex


Re: [PATCH] KVM: PPC: add missing pt_regs initialization

2015-05-25 Thread Alexander Graf


On 18.05.15 14:44, Laurentiu Tudor wrote:
> On this switch branch the regs initialization
> doesn't happen so add it.
> This was found with the help of a static
> code analysis tool.
> 
> Signed-off-by: Laurentiu Tudor 
> Cc: Scott Wood 
> Cc: Mihai Caraman 

Thanks, applied to kvm-ppc-queue.


Alex


Re: [PATCH] KVM: PPC: check for lookup_linux_ptep() returning NULL

2015-05-25 Thread Alexander Graf


On 21.05.15 21:37, Scott Wood wrote:
> On Thu, 2015-05-21 at 16:26 +0300, Laurentiu Tudor wrote:
>> If passed a larger page size lookup_linux_ptep()
>> may fail, so add a check for that and bail out
>> if that's the case.
>> This was found with the help of a static
>> code analysis tool.
>>
>> Signed-off-by: Mihai Caraman 
>> Signed-off-by: Laurentiu Tudor 
>> Cc: Scott Wood 
>> ---
>> based on https://github.com/agraf/linux-2.6.git kvm-ppc-next
>>
>>  arch/powerpc/kvm/e500_mmu_host.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Reviewed-by: Scott Wood 

Thanks, applied to kvm-ppc-queue.


Alex


Re: [PATCH] KVM: PPC: Remove PPC970 from KVM_BOOK3S_64_HV text in Kconfig

2015-05-25 Thread Alexander Graf


On 22.05.15 11:41, Thomas Huth wrote:
> Since the PPC970 support has been removed from the kvm-hv kernel
> module recently, we should also reflect this change in the help
> text of the corresponding Kconfig option.
> 
> Signed-off-by: Thomas Huth 

Thanks, applied to kvm-ppc-queue.

Alex


Re: [PATCH] KVM: PPC: Fix warnings from sparse

2015-05-25 Thread Alexander Graf


On 22.05.15 09:25, Thomas Huth wrote:
> When compiling the KVM code for POWER with "make C=1", sparse
> complains about functions missing proper prototypes and a 64-bit
> constant missing the ULL suffix. Let's fix this by making the
> functions static or by including the proper header with the
> prototypes, and by appending a ULL suffix to the constant
> PPC_MPPE_ADDRESS_MASK.
> 
> Signed-off-by: Thomas Huth 

Thanks, applied to kvm-ppc-queue.

Alex


Re: [PATCH] KVM: PPC: fix suspicious use of conditional operator

2015-05-25 Thread Alexander Graf


On 25.05.15 10:48, Laurentiu Tudor wrote:
> This was signaled by a static code analysis tool.
> 
> Signed-off-by: Laurentiu Tudor 
> Reviewed-by: Scott Wood 

Thanks, applied to kvm-ppc-queue.


Alex


Re: [PATCH] KVM: PPC: Book3S HV: Fix list traversal in error case

2015-05-09 Thread Alexander Graf


On 29.04.15 06:49, Paul Mackerras wrote:
> This fixes a regression introduced in commit 25fedfca94cf, "KVM: PPC:
> Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu", which
> leads to a user-triggerable oops.
> 
> In the case where we try to run a vcore on a physical core that is
> not in single-threaded mode, or the vcore has too many threads for
> the physical core, we iterate the list of runnable vcpus to make
> each one return an EBUSY error to userspace.  Since this involves
> taking each vcpu off the runnable_threads list for the vcore, we
> need to use list_for_each_entry_safe rather than list_for_each_entry
> to traverse the list.  Otherwise the kernel will crash with an oops
> message like this:
> 
> Unable to handle kernel paging request for data at address 0x000fff88
> Faulting instruction address: 0xd0001e635dc8
> Oops: Kernel access of bad area, sig: 11 [#2]
> SMP NR_CPUS=1024 NUMA PowerNV
> ...
> CPU: 48 PID: 91256 Comm: qemu-system-ppc Tainted: G  D3.18.0 #1
> task: c0274e507500 ti: c027d1924000 task.ti: c027d1924000
> NIP: d0001e635dc8 LR: d0001e635df8 CTR: c011ba50
> REGS: c027d19275b0 TRAP: 0300   Tainted: G  D (3.18.0)
> MSR: 90009033   CR: 22002824  XER: 
> CFAR: c0008468 DAR: 000fff88 DSISR: 4000 SOFTE: 1
> GPR00: d0001e635df8 c027d1927830 d0001e64c850 0001
> GPR04: 0001 0001  
> GPR08: 00200200   d0001e63e588
> GPR12: 2200 c7dbc800 c00fc780 000a
> GPR16: fffc c00fd5439690 c00fc7801c98 0001
> GPR20: 0003 c027d1927aa8 c00fd543b348 c00fd543b350
> GPR24:  c00fa57f 0030 
> GPR28: fff0 c00fd543b328 000fe468 c00fd543b300
> NIP [d0001e635dc8] kvmppc_run_core+0x198/0x17c0 [kvm_hv]
> LR [d0001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv]
> Call Trace:
> [c027d1927830] [d0001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv] 
> (unreliable)
> [c027d1927a30] [d0001e638350] kvmppc_vcpu_run_hv+0x5b0/0xdd0 [kvm_hv]
> [c027d1927b70] [d0001e510504] kvmppc_vcpu_run+0x44/0x60 [kvm]
> [c027d1927ba0] [d0001e50d4a4] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
> [c027d1927be0] [d0001e504be8] kvm_vcpu_ioctl+0x5e8/0x7a0 [kvm]
> [c027d1927d40] [c02d6720] do_vfs_ioctl+0x490/0x780
> [c027d1927de0] [c02d6ae4] SyS_ioctl+0xd4/0xf0
> [c027d1927e30] [c0009358] syscall_exit+0x0/0x98
> Instruction dump:
> 6000 6042 387e1b30 3883 38a1 38c0 480087d9 e8410018
> ebde1c98 7fbdf040 3bdee368 419e0048 <813e1b20> 939e1b18 2f890001 409effcc
> ---[ end trace 8cdf50251cca6680 ]---
> 
> Fixes: 25fedfca94cf
> Signed-off-by: Paul Mackerras 

Reviewed-by: Alexander Graf 

Paolo, can you please take this patch into 4.1 directly?


Thanks a lot,

Alex
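
The traversal problem described in the commit message can be sketched in user space: when the loop body unlinks the current entry, the successor has to be fetched before the body runs. This is a minimal sketch under stated assumptions. The list helpers below are simplified stand-ins for the kernel's list API, and struct vcpu_sketch and fail_all_runnable() are hypothetical names, not the code in kvmppc_run_core().

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Unlink and poison, so a stale ->next dereference would fault. */
static void list_del_poison(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = NULL;
}

/* Hypothetical vcpu with a return code and a runnable-list link. */
struct vcpu_sketch { int id; int ret; struct list_head node; };

/*
 * Take every entry off the list and mark it -EBUSY. 'next' is loaded
 * before the body runs, which is what list_for_each_entry_safe does;
 * a plain walk would read pos->next after pos was poisoned.
 */
static int fail_all_runnable(struct list_head *head)
{
	struct list_head *pos, *next;
	int n = 0;

	for (pos = head->next, next = pos->next; pos != head;
	     pos = next, next = pos->next) {
		struct vcpu_sketch *v = container_of(pos, struct vcpu_sketch, node);

		v->ret = -EBUSY;
		list_del_poison(pos);
		n++;
	}
	return n;
}
```

With the non-safe form, the step to the next element would dereference the poisoned pointer of the entry that was just removed, which matches the kind of bad-address oops quoted above.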


Re: [PATCH] KVM: s390: remove delayed reallocation of page tables for KVM

2015-04-27 Thread Alexander Graf

On 04/27/2015 03:57 PM, Martin Schwidefsky wrote:

On Mon, 27 Apr 2015 15:48:42 +0200
Alexander Graf  wrote:


On 04/23/2015 02:13 PM, Martin Schwidefsky wrote:

On Thu, 23 Apr 2015 14:01:23 +0200
Alexander Graf  wrote:


As far as alternative approaches go, I don't have a great idea otoh.
We could have an elf flag indicating that this process needs 4k page
tables to limit the impact to a single process. In fact, could we
maybe still limit the scope to non-global? A personality may work
as well. Or ulimit?

I tried the ELF flag approach, does not work. The trouble is that
allocate_mm() has to create the page tables with 4K tables if you
want to change the page table layout later on. We have learned the
hard way that the direction 2K to 4K does not work due to races
in the mm.

Now there are two major cases: 1) fork + execve and 2) fork only.
The ELF flag can be used to reduce from 4K to 2K for 1) but not 2).
2) is required for apps that use lots of forking, e.g. database or
web servers. Same goes for the approach with a personality flag or
ulimit.

We would have to distinguish the two cases for allocate_mm(),
if the new mm is allocated for a fork the current mm decides
2K vs. 4K. If the new mm is allocated by binfmt_elf, then start
with 4K and do the downgrade after the ELF flag has been evaluated.

Well, you could also make it a personality flag for example, no? Then
every new process below a certain one always gets 4k page tables until
they drop the personality, at which point each child would only get 2k
page tables again.

I'm mostly concerned that people will end up mixing VMs and other
workloads on the same LPAR, so I don't think there's a one-shoe-fits-all
solution.

If I add an argument to mm_init() to indicate if this context
is for fork() or execve() then the ELF header flag approach works.


So you don't need the sysctl?


Alex
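
The fork versus execve distinction discussed in this message can be sketched as a small decision function. This is only an illustration of the rule as stated in the thread (a fork inherits the parent's layout, an exec starts with 4K and may only be downgraded). Every name below is hypothetical and none of it is taken from the s390 mm code.

```c
/* Hypothetical page table layouts and mm origins. */
enum pgtable_size { PGT_2K, PGT_4K };
enum mm_origin { MM_FOR_FORK, MM_FOR_EXEC };

struct mm_sketch { enum pgtable_size pgt; };

/*
 * Sketch of an mm_init() that is told why the mm is being created.
 * fork: the current (parent) mm decides 2K vs. 4K.
 * exec: start with 4K, because only the 4K -> 2K direction is safe.
 */
static void mm_init_sketch(struct mm_sketch *mm,
			   const struct mm_sketch *parent,
			   enum mm_origin origin)
{
	if (origin == MM_FOR_FORK)
		mm->pgt = parent->pgt;
	else
		mm->pgt = PGT_4K;
}

/* Called once the (hypothetical) ELF flag is known; downgrade only. */
static void elf_flag_evaluated(struct mm_sketch *mm, int needs_4k)
{
	if (!needs_4k)
		mm->pgt = PGT_2K;
}
```

The sketch never upgrades from 2K to 4K, in line with the statement in the thread that this direction does not work due to races in the mm.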


Re: [PATCH] KVM: s390: remove delayed reallocation of page tables for KVM

2015-04-27 Thread Alexander Graf

On 04/23/2015 02:08 PM, Christian Borntraeger wrote:

On 23.04.2015 at 14:01, Alexander Graf wrote:



On 23.04.2015 at 13:43, Christian Borntraeger wrote:


On 23.04.2015 at 13:37, Alexander Graf wrote:



On 23.04.2015 at 13:08, Christian Borntraeger wrote:

From: Martin Schwidefsky 

Replacing a 2K page table with a 4K page table while a VMA is active
for the affected memory region is fundamentally broken. Rip out the
page table reallocation code and replace it with a simple system
control 'vm.allocate_pgste'. If the system control is set the page
tables for all processes are allocated as full 4K pages, even for
processes that do not need it.

Signed-off-by: Martin Schwidefsky 
Signed-off-by: Christian Borntraeger 

Couldn't you make this a hidden kconfig option that gets automatically selected 
when kvm is enabled? Or is there a non-kvm case that needs it too?

For things like RHEV the default could certainly be "enabled", but for normal
distros like SLES/RHEL, the idea was to NOT enable that by default, as the
non-KVM case is more common and might suffer from the additional memory
consumption of the page tables. (big databases come to mind)

We could think about having rpms like kvm to provide a sysctl file that sets
it if we want to minimize the impact. Other ideas?

Oh, I'm sorry, I misread the ifdef. I don't think it makes sense to have a 
config option for the default value then, just rely only on sysctl.conf for 
changed defaults.

As far as mechanisms to change it go, every distribution has their own ways of dealing 
with this. RH has a "profile" thing, we don't really have anything central, but 
individual sysctl.d files for example that a kvm package could provide.
Either way, the default choosing shouldn't happen in .config ;).

So you vote for getting rid of the Kconfig?

Also, please add some helpful error message in qemu to guide users to the 
sysctl.

Yes, we will provide a qemu patch (cc stable) after this hits the kernel.


As far as alternative approaches go, I don't have a great idea otoh. We could 
have an elf flag indicating that this process needs 4k page tables to limit the 
impact to a single process.

This approach was actually Martin's first fix. The problem is that the decision
takes place on execve, but we need an answer at fork time. So we always started
with 4k page tables and freed the 2nd half on execve. Now this did not work for
processes that only fork (without execve).


In fact, could we maybe still limit the scope to non-global? A personality may 
work as well. Or ulimit?

I think we will go for now with the sysctl and see if we can come up with some
automatic way as an additional patch later on.


Sounds perfectly reasonable to me. You can for example also just set the 
sysctl bit in libvirtd :).



Alex



Re: [PATCH] KVM: s390: remove delayed reallocation of page tables for KVM

2015-04-27 Thread Alexander Graf

On 04/23/2015 02:13 PM, Martin Schwidefsky wrote:

On Thu, 23 Apr 2015 14:01:23 +0200
Alexander Graf  wrote:


As far as alternative approaches go, I don't have a great idea otoh.
We could have an elf flag indicating that this process needs 4k page
tables to limit the impact to a single process. In fact, could we
maybe still limit the scope to non-global? A personality may work
as well. Or ulimit?

I tried the ELF flag approach, does not work. The trouble is that
allocate_mm() has to create the page tables with 4K tables if you
want to change the page table layout later on. We have learned the
hard way that the direction 2K to 4K does not work due to races
in the mm.

Now there are two major cases: 1) fork + execve and 2) fork only.
The ELF flag can be used to reduce from 4K to 2K for 1) but not 2).
2) is required for apps that use lots of forking, e.g. database or
web servers. Same goes for the approach with a personality flag or
ulimit.

We would have to distinguish the two cases for allocate_mm(),
if the new mm is allocated for a fork the current mm decides
2K vs. 4K. If the new mm is allocated by binfmt_elf, then start
with 4K and do the downgrade after the ELF flag has been evaluated.


Well, you could also make it a personality flag for example, no? Then 
every new process below a certain one always gets 4k page tables until 
they drop the personality, at which point each child would only get 2k 
page tables again.


I'm mostly concerned that people will end up mixing VMs and other 
workloads on the same LPAR, so I don't think there's a one-shoe-fits-all 
solution.



Alex



Re: [PATCH] KVM: s390: remove delayed reallocation of page tables for KVM

2015-04-23 Thread Alexander Graf


> On 23.04.2015 at 13:43, Christian Borntraeger wrote:
> 
>> On 23.04.2015 at 13:37, Alexander Graf wrote:
>> 
>> 
>>> On 23.04.2015 at 13:08, Christian Borntraeger wrote:
>>> 
>>> From: Martin Schwidefsky 
>>> 
>>> Replacing a 2K page table with a 4K page table while a VMA is active
>>> for the affected memory region is fundamentally broken. Rip out the
>>> page table reallocation code and replace it with a simple system
>>> control 'vm.allocate_pgste'. If the system control is set the page
>>> tables for all processes are allocated as full 4K pages, even for
>>> processes that do not need it.
>>> 
>>> Signed-off-by: Martin Schwidefsky 
>>> Signed-off-by: Christian Borntraeger 
>> 
>> Couldn't you make this a hidden kconfig option that gets automatically 
>> selected when kvm is enabled? Or is there a non-kvm case that needs it too?
> 
> For things like RHEV the default could certainly be "enabled", but for normal
> distros like SLES/RHEL, the idea was to NOT enable that by default, as the
> non-KVM case is more common and might suffer from the additional memory
> consumption of the page tables. (big databases come to mind)
> 
> We could think about having rpms like kvm to provide a sysctl file that sets
> it if we want to minimize the impact. Other ideas?

Oh, I'm sorry, I misread the ifdef. I don't think it makes sense to have a 
config option for the default value then, just rely only on sysctl.conf for 
changed defaults.

As far as mechanisms to change it go, every distribution has their own ways of 
dealing with this. RH has a "profile" thing, we don't really have anything 
central, but individual sysctl.d files for example that a kvm package could 
provide.

Either way, the default choosing shouldn't happen in .config ;). Also, please 
add some helpful error message in qemu to guide users to the sysctl.

As far as alternative approaches go, I don't have a great idea otoh. We could 
have an elf flag indicating that this process needs 4k page tables to limit the 
impact to a single process. In fact, could we maybe still limit the scope to 
non-global? A personality may work as well. Or ulimit?


Alex



Re: [PATCH] KVM: s390: remove delayed reallocation of page tables for KVM

2015-04-23 Thread Alexander Graf


> On 23.04.2015 at 13:08, Christian Borntraeger wrote:
> 
> From: Martin Schwidefsky 
> 
> Replacing a 2K page table with a 4K page table while a VMA is active
> for the affected memory region is fundamentally broken. Rip out the
> page table reallocation code and replace it with a simple system
> control 'vm.allocate_pgste'. If the system control is set the page
> tables for all processes are allocated as full 4K pages, even for
> processes that do not need it.
> 
> Signed-off-by: Martin Schwidefsky 
> Signed-off-by: Christian Borntraeger 

Couldn't you make this a hidden kconfig option that gets automatically selected 
when kvm is enabled? Or is there a non-kvm case that needs it too?


Alex



[PULL 09/21] KVM: PPC: Book3S HV: Add ICP real mode counters

2015-04-21 Thread Alexander Graf
From: Suresh Warrier 

Add two counters to count how often we generate real-mode ICS resend
and reject events. The counters provide some performance statistics
that could be used in the future to consider if the real mode functions
need further optimizing. The counters are displayed as part of IPC and
ICP state provided by /sys/kernel/debug/powerpc/kvm* for each VM.

Also added two counters that count (approximately) how many times we
don't find an ICP or ICS we're looking for. These are not currently
exposed through sysfs, but can be useful when debugging crashes.

Signed-off-by: Suresh Warrier 
Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c |  7 +++
 arch/powerpc/kvm/book3s_xics.c   | 10 --
 arch/powerpc/kvm/book3s_xics.h   |  5 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 73bbe92..6dded8c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -227,6 +227,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, 
struct kvmppc_icp *icp,
ics = kvmppc_xics_find_ics(xics, new_irq, &src);
if (!ics) {
/* Unsafe increment, but this does not need to be accurate */
+   xics->err_noics++;
return;
}
state = &ics->irq_state[src];
@@ -239,6 +240,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, 
struct kvmppc_icp *icp,
icp = kvmppc_xics_find_server(xics->kvm, state->server);
if (!icp) {
/* Unsafe increment again*/
+   xics->err_noicp++;
goto out;
}
}
@@ -383,6 +385,7 @@ static void icp_rm_down_cppr(struct kvmppc_xics *xics, 
struct kvmppc_icp *icp,
 * separately here as well.
 */
if (resend) {
+   icp->n_check_resend++;
icp_rm_check_resend(xics, icp);
}
 }
@@ -500,11 +503,13 @@ int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long 
server,
 
/* Handle reject in real mode */
if (reject && reject != XICS_IPI) {
+   this_icp->n_reject++;
icp_rm_deliver_irq(xics, icp, reject);
}
 
/* Handle resends in real mode */
if (resend) {
+   this_icp->n_check_resend++;
icp_rm_check_resend(xics, icp);
}
 
@@ -566,6 +571,7 @@ int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long 
cppr)
 * attempt (see comments in icp_rm_deliver_irq).
 */
if (reject && reject != XICS_IPI) {
+   icp->n_reject++;
icp_rm_deliver_irq(xics, icp, reject);
}
  bail:
@@ -616,6 +622,7 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long 
xirr)
 
/* Still asserted, resend it */
if (state->asserted) {
+   icp->n_reject++;
icp_rm_deliver_irq(xics, icp, irq);
}
 
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 5f7beebd..8f3e6cc 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -901,6 +901,7 @@ static int xics_debug_show(struct seq_file *m, void 
*private)
unsigned long flags;
unsigned long t_rm_kick_vcpu, t_rm_check_resend;
unsigned long t_rm_reject, t_rm_notify_eoi;
+   unsigned long t_reject, t_check_resend;
 
if (!kvm)
return 0;
@@ -909,6 +910,8 @@ static int xics_debug_show(struct seq_file *m, void 
*private)
t_rm_notify_eoi = 0;
t_rm_check_resend = 0;
t_rm_reject = 0;
+   t_check_resend = 0;
+   t_reject = 0;
 
seq_printf(m, "=\nICP state\n=\n");
 
@@ -928,12 +931,15 @@ static int xics_debug_show(struct seq_file *m, void 
*private)
t_rm_notify_eoi += icp->n_rm_notify_eoi;
t_rm_check_resend += icp->n_rm_check_resend;
t_rm_reject += icp->n_rm_reject;
+   t_check_resend += icp->n_check_resend;
+   t_reject += icp->n_reject;
}
 
-   seq_puts(m, "ICP Guest Real Mode exit totals: ");
-   seq_printf(m, "\tkick_vcpu=%lu check_resend=%lu reject=%lu 
notify_eoi=%lu\n",
+   seq_printf(m, "ICP Guest->Host totals: kick_vcpu=%lu check_resend=%lu 
reject=%lu notify_eoi=%lu\n",
t_rm_kick_vcpu, t_rm_check_resend,
t_rm_reject, t_rm_notify_eoi);
+   seq_printf(m, "ICP Real Mode totals: check_resend=%lu resend=%lu\n",
+   t_check_resend, t_reject);
for (icsid = 0; icsid <= KVMPPC_XICS_MAX_ICS_ID; icsid++) {
struct kvmppc_ics *ics = xics->ics[icsid];
 
diff --g

[PULL 07/21] KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock

2015-04-21 Thread Alexander Graf
From: Suresh Warrier 

Replaces the ICS mutex lock with a spin lock since we will be porting
these routines to real mode. Note that we need to disable interrupts
before we take the lock in anticipation of the fact that on the guest
side, we are running in the context of a hard irq and interrupts are
disabled (EE bit off) when the lock is acquired. Again, because we
will be acquiring the lock in hypervisor real mode, we need to use
an arch_spinlock_t instead of a normal spinlock here as we want to
avoid running any lockdep code (which may not be safe to execute in
real mode).

Signed-off-by: Suresh Warrier 
Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_xics.c | 68 +-
 arch/powerpc/kvm/book3s_xics.h |  2 +-
 2 files changed, 48 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 60bdbac..5f7beebd 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -39,7 +40,7 @@
  * LOCKING
  * ===
  *
- * Each ICS has a mutex protecting the information about the IRQ
+ * Each ICS has a spin lock protecting the information about the IRQ
  * sources and avoiding simultaneous deliveries if the same interrupt.
  *
  * ICP operations are done via a single compare & swap transaction
@@ -109,7 +110,10 @@ static void ics_check_resend(struct kvmppc_xics *xics, 
struct kvmppc_ics *ics,
 {
int i;
 
-   mutex_lock(&ics->lock);
+   unsigned long flags;
+
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
 
for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
struct ics_irq_state *state = &ics->irq_state[i];
@@ -120,12 +124,15 @@ static void ics_check_resend(struct kvmppc_xics *xics, 
struct kvmppc_ics *ics,
XICS_DBG("resend %#x prio %#x\n", state->number,
  state->priority);
 
-   mutex_unlock(&ics->lock);
+   arch_spin_unlock(&ics->lock);
+   local_irq_restore(flags);
icp_deliver_irq(xics, icp, state->number);
-   mutex_lock(&ics->lock);
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
}
 
-   mutex_unlock(&ics->lock);
+   arch_spin_unlock(&ics->lock);
+   local_irq_restore(flags);
 }
 
 static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
@@ -133,8 +140,10 @@ static bool write_xive(struct kvmppc_xics *xics, struct 
kvmppc_ics *ics,
   u32 server, u32 priority, u32 saved_priority)
 {
bool deliver;
+   unsigned long flags;
 
-   mutex_lock(&ics->lock);
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
 
state->server = server;
state->priority = priority;
@@ -145,7 +154,8 @@ static bool write_xive(struct kvmppc_xics *xics, struct 
kvmppc_ics *ics,
deliver = true;
}
 
-   mutex_unlock(&ics->lock);
+   arch_spin_unlock(&ics->lock);
+   local_irq_restore(flags);
 
return deliver;
 }
@@ -186,6 +196,7 @@ int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 
*server, u32 *priority)
struct kvmppc_ics *ics;
struct ics_irq_state *state;
u16 src;
+   unsigned long flags;
 
if (!xics)
return -ENODEV;
@@ -195,10 +206,12 @@ int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 
*server, u32 *priority)
return -EINVAL;
state = &ics->irq_state[src];
 
-   mutex_lock(&ics->lock);
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
*server = state->server;
*priority = state->priority;
-   mutex_unlock(&ics->lock);
+   arch_spin_unlock(&ics->lock);
+   local_irq_restore(flags);
 
return 0;
 }
@@ -365,6 +378,7 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, 
struct kvmppc_icp *icp,
struct kvmppc_ics *ics;
u32 reject;
u16 src;
+   unsigned long flags;
 
/*
 * This is used both for initial delivery of an interrupt and
@@ -391,7 +405,8 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, 
struct kvmppc_icp *icp,
state = &ics->irq_state[src];
 
/* Get a lock on the ICS */
-   mutex_lock(&ics->lock);
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
 
/* Get our server */
if (!icp || state->server != icp->server_num) {
@@ -434,7 +449,7 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, 
struct kvmppc_icp *icp,
 *
 * Note that if successful, the new delivery might have itself
 * rejected an interrupt th

[PULL 04/21] KVM: PPC: Book3S HV: Remove RMA-related variables from code

2015-04-21 Thread Alexander Graf
From: "Aneesh Kumar K.V" 

We don't support real-mode areas now that 970 support is removed.
Remove the remaining details of rma from the code.  Also rename
rma_setup_done to hpte_setup_done to better reflect the changes.

Signed-off-by: Aneesh Kumar K.V 
Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_host.h |  3 +--
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 28 ++--
 arch/powerpc/kvm/book3s_hv.c| 10 +-
 3 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 8ef0512..015773f 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -228,9 +228,8 @@ struct kvm_arch {
int tlbie_lock;
unsigned long lpcr;
unsigned long rmor;
-   struct kvm_rma_info *rma;
unsigned long vrma_slb_v;
-   int rma_setup_done;
+   int hpte_setup_done;
u32 hpt_order;
atomic_t vcpus_running;
u32 online_vcores;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 534acb3..dbf1271 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -116,12 +116,12 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 
*htab_orderp)
long order;
 
mutex_lock(&kvm->lock);
-   if (kvm->arch.rma_setup_done) {
-   kvm->arch.rma_setup_done = 0;
-   /* order rma_setup_done vs. vcpus_running */
+   if (kvm->arch.hpte_setup_done) {
+   kvm->arch.hpte_setup_done = 0;
+   /* order hpte_setup_done vs. vcpus_running */
smp_mb();
if (atomic_read(&kvm->arch.vcpus_running)) {
-   kvm->arch.rma_setup_done = 1;
+   kvm->arch.hpte_setup_done = 1;
goto out;
}
}
@@ -1339,20 +1339,20 @@ static ssize_t kvm_htab_write(struct file *file, const 
char __user *buf,
unsigned long tmp[2];
ssize_t nb;
long int err, ret;
-   int rma_setup;
+   int hpte_setup;
 
if (!access_ok(VERIFY_READ, buf, count))
return -EFAULT;
 
/* lock out vcpus from running while we're doing this */
mutex_lock(&kvm->lock);
-   rma_setup = kvm->arch.rma_setup_done;
-   if (rma_setup) {
-   kvm->arch.rma_setup_done = 0;   /* temporarily */
-   /* order rma_setup_done vs. vcpus_running */
+   hpte_setup = kvm->arch.hpte_setup_done;
+   if (hpte_setup) {
+   kvm->arch.hpte_setup_done = 0;  /* temporarily */
+   /* order hpte_setup_done vs. vcpus_running */
smp_mb();
if (atomic_read(&kvm->arch.vcpus_running)) {
-   kvm->arch.rma_setup_done = 1;
+   kvm->arch.hpte_setup_done = 1;
mutex_unlock(&kvm->lock);
return -EBUSY;
}
@@ -1405,7 +1405,7 @@ static ssize_t kvm_htab_write(struct file *file, const 
char __user *buf,
   "r=%lx\n", ret, i, v, r);
goto out;
}
-   if (!rma_setup && is_vrma_hpte(v)) {
+   if (!hpte_setup && is_vrma_hpte(v)) {
unsigned long psize = hpte_base_page_size(v, r);
unsigned long senc = slb_pgsize_encoding(psize);
unsigned long lpcr;
@@ -1414,7 +1414,7 @@ static ssize_t kvm_htab_write(struct file *file, const 
char __user *buf,
(VRMA_VSID << SLB_VSID_SHIFT_1T);
lpcr = senc << (LPCR_VRMASD_SH - 4);
kvmppc_update_lpcr(kvm, lpcr, LPCR_VRMASD);
-   rma_setup = 1;
+   hpte_setup = 1;
}
++i;
hptp += 2;
@@ -1430,9 +1430,9 @@ static ssize_t kvm_htab_write(struct file *file, const 
char __user *buf,
}
 
  out:
-   /* Order HPTE updates vs. rma_setup_done */
+   /* Order HPTE updates vs. hpte_setup_done */
smp_wmb();
-   kvm->arch.rma_setup_done = rma_setup;
+   kvm->arch.hpte_setup_done = hpte_setup;
mutex_unlock(&kvm->lock);
 
if (err)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index b9c11a3..dde14fd 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2044,11 +2044,11 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, 
struct kvm_vcpu *vcpu)
}
 
atomic_inc(&vcpu->kvm->

[PULL 02/21] kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM

2015-04-21 Thread Alexander Graf
From: David Gibson 

On POWER, storage caching is usually configured via the MMU - attributes
such as cache-inhibited are stored in the TLB and the hashed page table.

This makes correctly performing cache inhibited IO accesses awkward when
the MMU is turned off (real mode).  Some CPU models provide special
registers to control the cache attributes of real mode load and stores but
this is not at all consistent.  This is a problem in particular for SLOF,
the firmware used on KVM guests, which runs entirely in real mode, but
which needs to do IO to load the kernel.

To simplify this qemu implements two special hypercalls, H_LOGICAL_CI_LOAD
and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to
a logical address (aka guest physical address).  SLOF uses these for IO.

However, because these are implemented within qemu, not the host kernel,
these bypass any IO devices emulated within KVM itself.  The simplest way
to see this problem is to attempt to boot a KVM guest from a virtio-blk
device with iothread / dataplane enabled.  The iothread code relies on an
in kernel implementation of the virtio queue notification, which is not
triggered by the IO hcalls, and so the guest will stall in SLOF unable to
load the guest OS.

This patch addresses this by providing in-kernel implementations of the
2 hypercalls, which correctly scan the KVM IO bus.  Any access to an
address not handled by the KVM IO bus will cause a VM exit, hitting the
qemu implementation as before.

Note that a userspace change is also required, in order to enable these
new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL.

Signed-off-by: David Gibson 
[agraf: fix compilation]
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s.h |  3 ++
 arch/powerpc/kvm/book3s.c | 76 +++
 arch/powerpc/kvm/book3s_hv.c  | 12 ++
 arch/powerpc/kvm/book3s_pr_papr.c | 28 +
 4 files changed, 119 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 942c7b1..578e550 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -292,6 +292,9 @@ static inline bool kvmppc_supports_magic_page(struct 
kvm_vcpu *vcpu)
return !is_kvmppc_hv_enabled(vcpu->kvm);
 }
 
+extern int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu);
+extern int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu);
+
 /* Magic register values loaded into r3 and r4 before the 'sc' assembly
  * instruction for the OSI hypercalls */
 #define OSI_SC_MAGIC_R30x113724FA
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index cfbcdc6..453a8a4 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -821,6 +821,82 @@ void kvmppc_core_destroy_vm(struct kvm *kvm)
 #endif
 }
 
+int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu)
+{
+   unsigned long size = kvmppc_get_gpr(vcpu, 4);
+   unsigned long addr = kvmppc_get_gpr(vcpu, 5);
+   u64 buf;
+   int ret;
+
+   if (!is_power_of_2(size) || (size > sizeof(buf)))
+   return H_TOO_HARD;
+
+   ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, size, &buf);
+   if (ret != 0)
+   return H_TOO_HARD;
+
+   switch (size) {
+   case 1:
+   kvmppc_set_gpr(vcpu, 4, *(u8 *)&buf);
+   break;
+
+   case 2:
+   kvmppc_set_gpr(vcpu, 4, be16_to_cpu(*(__be16 *)&buf));
+   break;
+
+   case 4:
+   kvmppc_set_gpr(vcpu, 4, be32_to_cpu(*(__be32 *)&buf));
+   break;
+
+   case 8:
+   kvmppc_set_gpr(vcpu, 4, be64_to_cpu(*(__be64 *)&buf));
+   break;
+
+   default:
+   BUG();
+   }
+
+   return H_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(kvmppc_h_logical_ci_load);
+
+int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu)
+{
+   unsigned long size = kvmppc_get_gpr(vcpu, 4);
+   unsigned long addr = kvmppc_get_gpr(vcpu, 5);
+   unsigned long val = kvmppc_get_gpr(vcpu, 6);
+   u64 buf;
+   int ret;
+
+   switch (size) {
+   case 1:
+   *(u8 *)&buf = val;
+   break;
+
+   case 2:
+   *(__be16 *)&buf = cpu_to_be16(val);
+   break;
+
+   case 4:
+   *(__be32 *)&buf = cpu_to_be32(val);
+   break;
+
+   case 8:
+   *(__be64 *)&buf = cpu_to_be64(val);
+   break;
+
+   default:
+   return H_TOO_HARD;
+   }
+
+   ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, size, &buf);
+   if (ret != 0)
+   return H_TOO_HARD;
+
+   return H_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(kvmppc_h_logical_ci_store);
+
 int kvmppc_core_check_processor_compat(void)
 {
/*
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index de74756.

[PULL 14/21] KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu

2015-04-21 Thread Alexander Graf
From: Paul Mackerras 

Rather than calling cond_resched() in kvmppc_run_core() before doing
the post-processing for the vcpus that we have just run (that is,
calling kvmppc_handle_exit_hv(), kvmppc_set_timer(), etc.), we now do
that post-processing before calling cond_resched(), and that post-
processing is moved out into its own function, post_guest_process().

The reschedule point is now in kvmppc_run_vcpu() and we define a new
vcore state, VCORE_PREEMPT, to indicate that the vcore's runner
task is runnable but not running.  (Doing the reschedule with the
vcore in VCORE_INACTIVE state would be bad because there are potentially
other vcpus waiting for the runner in kvmppc_wait_for_exec() which
then wouldn't get woken up.)

Also, we make use of the handy cond_resched_lock() function, which
unlocks and relocks vc->lock for us around the reschedule.

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_host.h |  5 +-
 arch/powerpc/kvm/book3s_hv.c| 92 +
 2 files changed, 55 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 3eecd88..83c4425 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -304,8 +304,9 @@ struct kvmppc_vcore {
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
 #define VCORE_SLEEPING 1
-#define VCORE_RUNNING  2
-#define VCORE_EXITING  3
+#define VCORE_PREEMPT  2
+#define VCORE_RUNNING  3
+#define VCORE_EXITING  4
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index b38c10e..fb4f166 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1882,15 +1882,50 @@ static void prepare_threads(struct kvmppc_vcore *vc)
}
 }
 
+static void post_guest_process(struct kvmppc_vcore *vc)
+{
+   u64 now;
+   long ret;
+   struct kvm_vcpu *vcpu, *vnext;
+
+   now = get_tb();
+   list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
+arch.run_list) {
+   /* cancel pending dec exception if dec is positive */
+   if (now < vcpu->arch.dec_expires &&
+   kvmppc_core_pending_dec(vcpu))
+   kvmppc_core_dequeue_dec(vcpu);
+
+   trace_kvm_guest_exit(vcpu);
+
+   ret = RESUME_GUEST;
+   if (vcpu->arch.trap)
+   ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu,
+   vcpu->arch.run_task);
+
+   vcpu->arch.ret = ret;
+   vcpu->arch.trap = 0;
+
+   if (vcpu->arch.ceded) {
+   if (!is_kvmppc_resume_guest(ret))
+   kvmppc_end_cede(vcpu);
+   else
+   kvmppc_set_timer(vcpu);
+   }
+   if (!is_kvmppc_resume_guest(vcpu->arch.ret)) {
+   kvmppc_remove_runnable(vc, vcpu);
+   wake_up(&vcpu->arch.cpu_run);
+   }
+   }
+}
+
 /*
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
  */
 static void kvmppc_run_core(struct kvmppc_vcore *vc)
 {
-   struct kvm_vcpu *vcpu, *vnext;
-   long ret;
-   u64 now;
+   struct kvm_vcpu *vcpu;
int i;
int srcu_idx;
 
@@ -1922,8 +1957,11 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 */
if ((threads_per_core > 1) &&
((vc->num_threads > threads_per_subcore) || !on_primary_thread())) {
-   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
+   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) 
{
vcpu->arch.ret = -EBUSY;
+   kvmppc_remove_runnable(vc, vcpu);
+   wake_up(&vcpu->arch.cpu_run);
+   }
goto out;
}
 
@@ -1979,44 +2017,12 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
kvm_guest_exit();
 
preempt_enable();
-   cond_resched();
 
spin_lock(&vc->lock);
-   now = get_tb();
-   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
-   /* cancel pending dec exception if dec is positive */
-   if (now < vcpu->arch.dec_expires &&
-   kvmppc_core_pending_dec(vcpu))
-   kvmppc_core_dequeue_dec(vcpu);
-
-   trace_kvm_guest_exit(vcpu);
-
-   ret = RESUME_GUEST;
-   if (vcpu->arch.trap)
-   ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu,
-

[PULL 20/21] KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C

2015-04-21 Thread Alexander Graf
From: Paul Mackerras 

This replaces the assembler code for kvmhv_commence_exit() with C code
in book3s_hv_builtin.c.  It also moves the IPI sending code that was
in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function so it
can be used by kvmhv_commence_exit() as well as icp_rm_set_vcpu_irq().

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  2 +
 arch/powerpc/kvm/book3s_hv_builtin.c | 63 ++
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 12 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 66 
 4 files changed, 75 insertions(+), 68 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 869c53f..2b84e48 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -438,6 +438,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct 
kvm *kvm)
 
 extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
 
+extern void kvmhv_rm_send_ipi(int cpu);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 2754251..c42aa55 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define KVM_CMA_CHUNK_ORDER    18
 
@@ -184,3 +185,65 @@ long kvmppc_h_random(struct kvm_vcpu *vcpu)
 
return H_HARDWARE;
 }
+
+static inline void rm_writeb(unsigned long paddr, u8 val)
+{
+   __asm__ __volatile__("stbcix %0,0,%1"
+   : : "r" (val), "r" (paddr) : "memory");
+}
+
+/*
+ * Send an interrupt to another CPU.
+ * This can only be called in real mode.
+ * The caller needs to include any barrier needed to order writes
+ * to memory vs. the IPI/message.
+ */
+void kvmhv_rm_send_ipi(int cpu)
+{
+   unsigned long xics_phys;
+
+   /* Poke the target */
+   xics_phys = paca[cpu].kvm_hstate.xics_phys;
+   rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+}
+
+/*
+ * The following functions are called from the assembly code
+ * in book3s_hv_rmhandlers.S.
+ */
+static void kvmhv_interrupt_vcore(struct kvmppc_vcore *vc, int active)
+{
+   int cpu = vc->pcpu;
+
+   /* Order setting of exit map vs. msgsnd/IPI */
+   smp_mb();
+   for (; active; active >>= 1, ++cpu)
+   if (active & 1)
+   kvmhv_rm_send_ipi(cpu);
+}
+
+void kvmhv_commence_exit(int trap)
+{
+   struct kvmppc_vcore *vc = local_paca->kvm_hstate.kvm_vcore;
+   int ptid = local_paca->kvm_hstate.ptid;
+   int me, ee;
+
+   /* Set our bit in the threads-exiting-guest map in the 0xff00
+  bits of vcore->entry_exit_map */
+   me = 0x100 << ptid;
+   do {
+   ee = vc->entry_exit_map;
+   } while (cmpxchg(&vc->entry_exit_map, ee, ee | me) != ee);
+
+   /* Are we the first here? */
+   if ((ee >> 8) != 0)
+   return;
+
+   /*
+* Trigger the other threads in this vcore to exit the guest.
+* If this is a hypervisor decrementer interrupt then they
+* will be already on their way out of the guest.
+*/
+   if (trap != BOOK3S_INTERRUPT_HV_DECREMENTER)
+   kvmhv_interrupt_vcore(vc, ee & ~(1 << ptid));
+}
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 6dded8c..00e45b6 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -26,12 +26,6 @@
 static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp 
*icp,
u32 new_irq);
 
-static inline void rm_writeb(unsigned long paddr, u8 val)
-{
-   __asm__ __volatile__("sync; stbcix %0,0,%1"
-   : : "r" (val), "r" (paddr) : "memory");
-}
-
 /* -- ICS routines -- */
 static void ics_rm_check_resend(struct kvmppc_xics *xics,
struct kvmppc_ics *ics, struct kvmppc_icp *icp)
@@ -60,7 +54,6 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
struct kvm_vcpu *this_vcpu)
 {
struct kvmppc_icp *this_icp = this_vcpu->arch.icp;
-   unsigned long xics_phys;
int cpu;
 
/* Mark the target VCPU as having an interrupt pending */
@@ -83,9 +76,8 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
/* In SMT cpu will always point to thread 0, we adjust it */
cpu += vcpu->arch.ptid;
 
-   /* Not too hard, then poke the target */
-   xics_phys = paca[cpu].kvm_hstate.xics_phys;
-   rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+   smp_mb();
+   kvmhv_rm_send_ipi(cpu);
 }
 
 static void icp_rm_clr_v

[PULL 13/21] KVM: PPC: Book3S HV: Minor cleanups

2015-04-21 Thread Alexander Graf
From: Paul Mackerras 

* Remove unused kvmppc_vcore::n_busy field.
* Remove setting of RMOR, since it was only used on PPC970 and the
  PPC970 KVM support has been removed.
* Don't use r1 or r2 in setting the runlatch since they are
  conventionally reserved for other things; use r0 instead.
* Streamline the code a little and remove the ext_interrupt_to_host
  label.
* Add some comments about register usage.
* hcall_try_real_mode doesn't need to be global, and can't be
  called from C code anyway.

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_host.h |  2 --
 arch/powerpc/kernel/asm-offsets.c   |  1 -
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 44 ++---
 3 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 2f339ff..3eecd88 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -227,7 +227,6 @@ struct kvm_arch {
unsigned long host_sdr1;
int tlbie_lock;
unsigned long lpcr;
-   unsigned long rmor;
unsigned long vrma_slb_v;
int hpte_setup_done;
u32 hpt_order;
@@ -271,7 +270,6 @@ struct kvm_arch {
  */
 struct kvmppc_vcore {
int n_runnable;
-   int n_busy;
int num_threads;
int entry_exit_count;
int n_woken;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 3fea721..92ec3fc 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -505,7 +505,6 @@ int main(void)
DEFINE(KVM_NEED_FLUSH, offsetof(struct kvm, arch.need_tlb_flush.bits));
DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
-   DEFINE(KVM_RMOR, offsetof(struct kvm, arch.rmor));
DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b06fe53..f8267e5 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -245,9 +245,9 @@ kvm_novcpu_exit:
 kvm_start_guest:
 
/* Set runlatch bit the minute you wake up from nap */
-   mfspr   r1, SPRN_CTRLF
-   ori r1, r1, 1
-   mtspr   SPRN_CTRLT, r1
+   mfspr   r0, SPRN_CTRLF
+   ori r0, r0, 1
+   mtspr   SPRN_CTRLT, r0
 
ld  r2,PACATOC(r13)
 
@@ -493,11 +493,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
cmpwi   r0,0
beq 20b
 
-   /* Set LPCR and RMOR. */
+   /* Set LPCR. */
 10:ld  r8,VCORE_LPCR(r5)
mtspr   SPRN_LPCR,r8
-   ld  r8,KVM_RMOR(r9)
-   mtspr   SPRN_RMOR,r8
isync
 
/* Check if HDEC expires soon */
@@ -1075,7 +1073,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
bne 2f
mfspr   r3,SPRN_HDEC
cmpwi   r3,0
-   bge ignore_hdec
+   mr  r4,r9
+   bge fast_guest_return
 2:
/* See if this is an hcall we can handle in real mode */
cmpwi   r12,BOOK3S_INTERRUPT_SYSCALL
@@ -1083,26 +1082,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
/* External interrupt ? */
cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
-   bne+    ext_interrupt_to_host
+   bne+    guest_exit_cont
 
/* External interrupt, first check for host_ipi. If this is
 * set, we know the host wants us out so let's do it now
 */
bl  kvmppc_read_intr
cmpdi   r3, 0
-   bgt ext_interrupt_to_host
+   bgt guest_exit_cont
 
/* Check if any CPU is heading out to the host, if so head out too */
ld  r5, HSTATE_KVM_VCORE(r13)
lwz r0, VCORE_ENTRY_EXIT(r5)
cmpwi   r0, 0x100
-   bge ext_interrupt_to_host
-
-   /* Return to guest after delivering any pending interrupt */
mr  r4, r9
-   b   deliver_guest_interrupt
-
-ext_interrupt_to_host:
+   blt deliver_guest_interrupt
 
 guest_exit_cont:   /* r9 = vcpu, r12 = trap, r13 = paca */
/* Save more register state  */
@@ -1763,8 +1757,10 @@ kvmppc_hisi:
  * Returns to the guest if we handle it, or continues on up to
  * the kernel if we can't (i.e. if we don't have a handler for
  * it, or if the handler returns H_TOO_HARD).
+ *
+ * r5 - r8 contain hcall args,
+ * r9 = vcpu, r10 = pc, r11 = msr, r12 = trap, r13 = paca
  */
-   .globl  hcall_try_real_mode
 hcall_try_real_mode:
ld  r3,VCPU_GPR(R3)(r9)
andi.   r0,r11,MSR_PR
@@ -2024,10 +2020,6 @@ hcall_real_table:
.globl  hcall_real_table_end
 hcall_real_table_end:
 
-ignore_hdec:
-

[PULL 15/21] KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken

2015-04-21 Thread Alexander Graf
From: Paul Mackerras 

We can tell when a secondary thread has finished running a guest by
the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
is no real need for the nap_count field in the kvmppc_vcore struct.
This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
pointers of the secondary threads rather than polling vc->nap_count.
Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
this also means that we can tell which secondary threads have got
stuck and thus print a more informative error message.
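
As a rough illustration of the polling idea, here is a userspace sketch under stated assumptions: sketch_wait_for_nap(), SKETCH_MAX_LOOPS and the vcpu_slot array are hypothetical, and the array merely stands in for the per-thread kvm_hstate.kvm_vcpu pointers. Slot 0 is the primary thread and is not checked. The return value is a bitmask of secondaries that never cleared their slot, which is the information that makes a per-thread error message possible.

```c
#include <stdint.h>

#define SKETCH_MAX_LOOPS 100	/* hypothetical retry bound */

/*
 * Poll the per-thread slots of the secondary threads (index 1 and up).
 * A non-zero slot means that thread has not finished yet.
 * Returns 0 when every secondary is done, otherwise a bitmask of the
 * thread indexes that still hold a non-zero slot after the last retry.
 */
static unsigned int sketch_wait_for_nap(const volatile uintptr_t *vcpu_slot,
					int nthreads)
{
	unsigned int stuck = 0;
	int loops, i;

	for (loops = 0; loops < SKETCH_MAX_LOOPS; ++loops) {
		for (i = 1; i < nthreads; ++i)
			if (vcpu_slot[i] != 0)
				break;
		if (i == nthreads)
			return 0;	/* all secondaries finished */
	}

	/* Timed out: report exactly which threads look stuck. */
	for (i = 1; i < nthreads; ++i)
		if (vcpu_slot[i] != 0)
			stuck |= 1u << i;
	return stuck;
}
```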

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_host.h |  2 --
 arch/powerpc/kernel/asm-offsets.c   |  1 -
 arch/powerpc/kvm/book3s_hv.c| 47 +++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 19 +
 4 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 83c4425..1517faa 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -272,8 +272,6 @@ struct kvmppc_vcore {
int n_runnable;
int num_threads;
int entry_exit_count;
-   int n_woken;
-   int nap_count;
int napping_threads;
int first_vcpuid;
u16 pcpu;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 92ec3fc..8aa8246 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -563,7 +563,6 @@ int main(void)
DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, 
entry_exit_count));
-   DEFINE(VCORE_NAP_COUNT, offsetof(struct kvmppc_vcore, nap_count));
DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, 
napping_threads));
DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index fb4f166..7c1335d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1729,8 +1729,10 @@ static int kvmppc_grab_hwthread(int cpu)
tpaca = &paca[cpu];
 
/* Ensure the thread won't go into the kernel if it wakes */
-   tpaca->kvm_hstate.hwthread_req = 1;
tpaca->kvm_hstate.kvm_vcpu = NULL;
+   tpaca->kvm_hstate.napping = 0;
+   smp_wmb();
+   tpaca->kvm_hstate.hwthread_req = 1;
 
/*
 * If the thread is already executing in the kernel (e.g. handling
@@ -1773,35 +1775,43 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
}
cpu = vc->pcpu + vcpu->arch.ptid;
tpaca = &paca[cpu];
-   tpaca->kvm_hstate.kvm_vcpu = vcpu;
tpaca->kvm_hstate.kvm_vcore = vc;
tpaca->kvm_hstate.ptid = vcpu->arch.ptid;
vcpu->cpu = vc->pcpu;
+   /* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
smp_wmb();
+   tpaca->kvm_hstate.kvm_vcpu = vcpu;
 #if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
-   if (cpu != smp_processor_id()) {
+   if (cpu != smp_processor_id())
xics_wake_cpu(cpu);
-   if (vcpu->arch.ptid)
-   ++vc->n_woken;
-   }
 #endif
 }
 
-static void kvmppc_wait_for_nap(struct kvmppc_vcore *vc)
+static void kvmppc_wait_for_nap(void)
 {
-   int i;
+   int cpu = smp_processor_id();
+   int i, loops;
 
-   HMT_low();
-   i = 0;
-   while (vc->nap_count < vc->n_woken) {
-   if (++i >= 100) {
-   pr_err("kvmppc_wait_for_nap timeout %d %d\n",
-  vc->nap_count, vc->n_woken);
-   break;
+   for (loops = 0; loops < 100; ++loops) {
+   /*
+* Check if all threads are finished.
+* We set the vcpu pointer when starting a thread
+* and the thread clears it when finished, so we look
+* for any threads that still have a non-NULL vcpu ptr.
+*/
+   for (i = 1; i < threads_per_subcore; ++i)
+   if (paca[cpu + i].kvm_hstate.kvm_vcpu)
+   break;
+   if (i == threads_per_subcore) {
+   HMT_medium();
+   return;
}
-   cpu_relax();
+   HMT_low();
}
HMT_medium();
+   for (i = 1; i < threads_per_subcore; ++i)
+   if (paca[cpu + i].kvm_hstate.kvm_vcpu)
+   pr_err("KVM: CPU %d seems to be stuck\n", cpu + i);
 }
 
 /*
@@ -1942,8 +1952,6 @@ static void kvmppc_run

[PULL 00/21] ppc patch queue 2015-04-21 for 4.1

2015-04-21 Thread Alexander Graf
Hi Paolo / Marcelo,

This is my current patch queue for ppc.  Please pull.

Alex


The following changes since commit b79013b2449c23f1f505bdf39c5a6c330338b244:

  Merge tag 'staging-4.1-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging (2015-04-13 
17:37:33 -0700)

are available in the git repository at:


  git://github.com/agraf/linux-2.6.git tags/signed-kvm-ppc-queue

for you to fetch changes up to 66feed61cdf6ee65fd551d3460b1efba6bee55b8:

  KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 (2015-04-21 
15:21:34 +0200)


Patch queue for ppc - 2015-04-21

This is the latest queue for KVM on PowerPC changes. Highlights this
time around:

  - Book3S HV: Debugging aids
  - Book3S HV: Minor performance improvements
  - Book3S HV: Cleanups


Aneesh Kumar K.V (2):
  KVM: PPC: Book3S HV: Remove RMA-related variables from code
  KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte

David Gibson (1):
  kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM

Michael Ellerman (1):
  KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation.

Paul Mackerras (12):
  KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT
  KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
  KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update
  KVM: PPC: Book3S HV: Minor cleanups
  KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu
  KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken
  KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI
  KVM: PPC: Book3S HV: Use decrementer to wake napping threads
  KVM: PPC: Book3S HV: Use bitmap of active threads rather than count
  KVM: PPC: Book3S HV: Streamline guest entry and exit
  KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C
  KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8

Suresh E. Warrier (2):
  powerpc: Export __spin_yield
  KVM: PPC: Book3S HV: Add guest->host real mode completion counters

Suresh Warrier (3):
  KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock
  KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode
  KVM: PPC: Book3S HV: Add ICP real mode counters

 Documentation/virtual/kvm/api.txt|  17 +
 arch/powerpc/include/asm/archrandom.h|  11 +-
 arch/powerpc/include/asm/kvm_book3s.h|   3 +
 arch/powerpc/include/asm/kvm_book3s_64.h |  18 +
 arch/powerpc/include/asm/kvm_host.h  |  47 ++-
 arch/powerpc/include/asm/kvm_ppc.h   |   2 +
 arch/powerpc/include/asm/time.h  |   3 +
 arch/powerpc/kernel/asm-offsets.c|  20 +-
 arch/powerpc/kernel/time.c   |   6 +
 arch/powerpc/kvm/Kconfig |  14 +
 arch/powerpc/kvm/book3s.c|  76 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 189 +--
 arch/powerpc/kvm/book3s_hv.c | 435 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c | 100 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |  25 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 238 +++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 559 +++
 arch/powerpc/kvm/book3s_pr_papr.c|  28 ++
 arch/powerpc/kvm/book3s_xics.c   | 105 --
 arch/powerpc/kvm/book3s_xics.h   |  13 +-
 arch/powerpc/kvm/powerpc.c   |   3 +
 arch/powerpc/lib/locks.c |   1 +
 arch/powerpc/platforms/powernv/rng.c |  29 ++
 include/uapi/linux/kvm.h |   1 +
 virt/kvm/kvm_main.c  |   1 +
 25 files changed, 1580 insertions(+), 364 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 16/21] KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI

2015-04-21 Thread Alexander Graf
From: Paul Mackerras 

When running a multi-threaded guest and vcpu 0 in a virtual core
is not running in the guest (i.e. it is busy elsewhere in the host),
thread 0 of the physical core will switch the MMU to the guest and
then go to nap mode in the code at kvm_do_nap.  If the guest sends
an IPI to thread 0 using the msgsndp instruction, that will wake
up thread 0 and cause all the threads in the guest to exit to the
host unnecessarily.  To avoid the unnecessary exit, this arranges
for the PECEDP bit to be cleared in this situation.  When napping
due to a H_CEDE from the guest, we still set PECEDP so that the
thread will wake up on an IPI sent using msgsndp.

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 6716db3..12d7e4c 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -191,6 +191,7 @@ kvmppc_primary_no_guest:
li  r3, NAPPING_NOVCPU
stb r3, HSTATE_NAPPING(r13)
 
+   li  r3, 0   /* Don't wake on privileged (OS) doorbell */
b   kvm_do_nap
 
 kvm_novcpu_wakeup:
@@ -2129,10 +2130,13 @@ _GLOBAL(kvmppc_h_cede)  /* r3 = vcpu pointer, 
r11 = msr, r13 = paca */
bl  kvmhv_accumulate_time
 #endif
 
+   lis r3, LPCR_PECEDP@h   /* Do wake on privileged doorbell */
+
/*
 * Take a nap until a decrementer or external or doobell interrupt
-* occurs, with PECE1, PECE0 and PECEDP set in LPCR. Also clear the
-* runlatch bit before napping.
+* occurs, with PECE1 and PECE0 set in LPCR.
+* On POWER8, if we are ceding, also set PECEDP.
+* Also clear the runlatch bit before napping.
 */
 kvm_do_nap:
mfspr   r0, SPRN_CTRLF
@@ -2144,7 +2148,7 @@ kvm_do_nap:
mfspr   r5,SPRN_LPCR
ori r5,r5,LPCR_PECE0 | LPCR_PECE1
 BEGIN_FTR_SECTION
-   oris    r5,r5,LPCR_PECEDP@h
+   rlwimi  r5, r3, 0, LPCR_PECEDP
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtspr   SPRN_LPCR,r5
isync
-- 
1.8.1.4



[PULL 19/21] KVM: PPC: Book3S HV: Streamline guest entry and exit

2015-04-21 Thread Alexander Graf
From: Paul Mackerras 

On entry to the guest, secondary threads now wait for the primary to
switch the MMU after loading up most of their state, rather than before.
This means that the secondary threads get into the guest sooner, in the
common case where the secondary threads get to kvmppc_hv_entry before
the primary thread.

On exit, the first thread out increments the exit count and interrupts
the other threads (to get them out of the guest) before saving most
of its state, rather than after.  That means that the other threads
exit sooner and means that the first thread doesn't spend so much
time waiting for the other threads at the point where the MMU gets
switched back to the host.

This pulls out the code that increments the exit count and interrupts
other threads into a separate function, kvmhv_commence_exit().
This also makes sure that r12 and vcpu->arch.trap are set correctly
in some corner cases.

Statistics from /sys/kernel/debug/kvm/vm*/vcpu*/timings show the
improvement.  Aggregating across vcpus for a guest with 32 vcpus,
8 threads/vcore, running on a POWER8, gives this before the change:

 rm_entry: avg 4537.3ns (222 - 48444, 1068878 samples)
  rm_exit: avg 4787.6ns (152 - 165490, 1010717 samples)
  rm_intr: avg 1673.6ns (12 - 341304, 3818691 samples)

and this after the change:

 rm_entry: avg 3427.7ns (232 - 68150, 1118921 samples)
  rm_exit: avg 4716.0ns (12 - 150720, 1119477 samples)
  rm_intr: avg 1614.8ns (12 - 522436, 3850432 samples)

showing a substantial reduction in the time spent per guest entry in
the real-mode guest entry code, and smaller reductions in the real
mode guest exit and interrupt handling times.  (The test was to start
the guest and boot Fedora 20 big-endian to the login prompt.)
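
For reference, the relative change implied by the averages quoted above can be worked out with a small helper. This is only an arithmetic sketch (reduction_pct() is a hypothetical name); it gives roughly 24.5% for rm_entry, 1.5% for rm_exit and 3.5% for rm_intr.

```c
/* Percentage by which 'after' is lower than 'before'. */
static double reduction_pct(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Averages in nanoseconds, copied from the figures above. */
static const double rm_entry_before = 4537.3, rm_entry_after = 3427.7;
static const double rm_exit_before  = 4787.6, rm_exit_after  = 4716.0;
static const double rm_intr_before  = 1673.6, rm_intr_after  = 1614.8;
```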

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 212 +++-
 1 file changed, 126 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 245f5c9..3f6fd78 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -175,6 +175,19 @@ kvmppc_primary_no_guest:
/* put the HDEC into the DEC, since HDEC interrupts don't wake us */
mfspr   r3, SPRN_HDEC
mtspr   SPRN_DEC, r3
+   /*
+* Make sure the primary has finished the MMU switch.
+* We should never get here on a secondary thread, but
+* check it for robustness' sake.
+*/
+   ld  r5, HSTATE_KVM_VCORE(r13)
+65:lbz r0, VCORE_IN_GUEST(r5)
+   cmpwi   r0, 0
+   beq 65b
+   /* Set LPCR. */
+   ld  r8,VCORE_LPCR(r5)
+   mtspr   SPRN_LPCR,r8
+   isync
/* set our bit in napping_threads */
ld  r5, HSTATE_KVM_VCORE(r13)
lbz r7, HSTATE_PTID(r13)
@@ -206,7 +219,7 @@ kvm_novcpu_wakeup:
 
/* check the wake reason */
bl  kvmppc_check_wake_reason
-   
+
/* see if any other thread is already exiting */
lwz r0, VCORE_ENTRY_EXIT(r5)
cmpwi   r0, 0x100
@@ -244,7 +257,15 @@ kvm_novcpu_wakeup:
b   kvmppc_got_guest
 
 kvm_novcpu_exit:
-   b   hdec_soon
+#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+   ld  r4, HSTATE_KVM_VCPU(r13)
+   cmpdi   r4, 0
+   beq 13f
+   addi    r3, r4, VCPU_TB_RMEXIT
+   bl  kvmhv_accumulate_time
+#endif
+13:bl  kvmhv_commence_exit
+   b   kvmhv_switch_to_host
 
 /*
  * We come in here when wakened from nap mode.
@@ -422,7 +443,7 @@ kvmppc_hv_entry:
/* Primary thread switches to guest partition. */
ld  r9,VCORE_KVM(r5)/* pointer to struct kvm */
cmpwi   r6,0
-   bne 20f
+   bne 10f
ld  r6,KVM_SDR1(r9)
lwz r7,KVM_LPID(r9)
li  r0,LPID_RSVD/* switch to reserved LPID */
@@ -493,26 +514,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
li  r0,1
stb r0,VCORE_IN_GUEST(r5)   /* signal secondaries to continue */
-   b   10f
-
-   /* Secondary threads wait for primary to have done partition switch */
-20:lbz r0,VCORE_IN_GUEST(r5)
-   cmpwi   r0,0
-   beq 20b
-
-   /* Set LPCR. */
-10:ld  r8,VCORE_LPCR(r5)
-   mtspr   SPRN_LPCR,r8
-   isync
-
-   /* Check if HDEC expires soon */
-   mfspr   r3,SPRN_HDEC
-   cmpwi   r3,512  /* 1 microsecond */
-   li  r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-   blt hdec_soon
 
/* Do we have a guest vcpu to run? */
-   cmpdi   r4, 0
+10:cmpdi   r4, 0
beq kvmppc_primary_no_guest
 kvmppc_got_guest:
 
@@ -837,6 +841,30 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
clrrdi  r6,r6,1
mtspr   SPRN_CTRLT,r6
 4:
+   /* Secondary threads wait for primary to have done partition switch */

[PULL 17/21] KVM: PPC: Book3S HV: Use decrementer to wake napping threads

2015-04-21 Thread Alexander Graf
From: Paul Mackerras 

This arranges for threads that are napping due to their vcpu having
ceded or due to not having a vcpu to wake up at the end of the guest's
timeslice without having to be poked with an IPI.  We do that by
arranging for the decrementer to contain a value no greater than the
number of timebase ticks remaining until the end of the timeslice.
In the case of a thread with no vcpu, this number is in the hypervisor
decrementer already.  In the case of a ceded vcpu, we use the smaller
of the HDEC value and the DEC value.

Using the DEC like this when ceded means we need to save and restore
the guest decrementer value around the nap.
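
The decrementer arithmetic around the nap can be sketched in C as follows. This is an illustration under assumptions, not the real-mode assembly: the struct and function names are hypothetical, 'tb' is the timebase as read while still in guest context, and 'tb_offset' stands in for the vcore's timebase offset.

```c
#include <stdint.h>

struct sketch_nap_dec {
	int32_t dec_to_load;	/* value the DEC should hold during the nap */
	uint64_t dec_expires;	/* guest DEC expiry, in host timebase units */
};

/* Before napping: pick the wake-up value and remember the guest expiry. */
static struct sketch_nap_dec sketch_prepare_nap(int32_t dec, int32_t hdec,
						uint64_t tb, uint64_t tb_offset)
{
	struct sketch_nap_dec r;

	/* Wake no later than the end of the timeslice. */
	r.dec_to_load = (dec <= hdec) ? dec : hdec;
	/* Sign-extend dec, add the current timebase, convert to host TB. */
	r.dec_expires = tb + (uint64_t)(int64_t)dec - tb_offset;
	return r;
}

/* After the nap: recompute the guest DEC from the saved expiry time. */
static int32_t sketch_restore_dec(uint64_t dec_expires, uint64_t tb_now,
				  uint64_t tb_offset)
{
	return (int32_t)(dec_expires + tb_offset - tb_now);
}
```

With dec = 1000, hdec = 400, tb = 5000 and tb_offset = 100, the DEC is loaded with 400 for the nap and the saved expiry is 5900; restoring at tb_now = 5300 yields a guest DEC of 700, that is, 300 ticks have elapsed.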

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 43 +++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 12d7e4c..16719af 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -172,6 +172,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
 kvmppc_primary_no_guest:
/* We handle this much like a ceded vcpu */
+   /* put the HDEC into the DEC, since HDEC interrupts don't wake us */
+   mfspr   r3, SPRN_HDEC
+   mtspr   SPRN_DEC, r3
/* set our bit in napping_threads */
ld  r5, HSTATE_KVM_VCORE(r13)
lbz r7, HSTATE_PTID(r13)
@@ -223,6 +226,12 @@ kvm_novcpu_wakeup:
cmpdi   r3, 0
bge kvm_novcpu_exit
 
+   /* See if our timeslice has expired (HDEC is negative) */
+   mfspr   r0, SPRN_HDEC
+   li  r12, BOOK3S_INTERRUPT_HV_DECREMENTER
+   cmpwi   r0, 0
+   blt kvm_novcpu_exit
+
/* Got an IPI but other vcpus aren't yet exiting, must be a latecomer */
ld  r4, HSTATE_KVM_VCPU(r13)
cmpdi   r4, 0
@@ -1493,10 +1502,10 @@ kvmhv_do_exit:  /* r12 = trap, r13 = 
paca */
cmpwi   r3,0x100/* Are we the first here? */
bge 43f
cmpwi   r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-   beq 40f
+   beq 43f
li  r0,0
mtspr   SPRN_HDEC,r0
-40:
+
/*
 * Send an IPI to any napping threads, since an HDEC interrupt
 * doesn't wake CPUs up from nap.
@@ -2124,6 +2133,27 @@ _GLOBAL(kvmppc_h_cede)   /* r3 = vcpu pointer, 
r11 = msr, r13 = paca */
/* save FP state */
bl  kvmppc_save_fp
 
+   /*
+* Set DEC to the smaller of DEC and HDEC, so that we wake
+* no later than the end of our timeslice (HDEC interrupts
+* don't wake us from nap).
+*/
+   mfspr   r3, SPRN_DEC
+   mfspr   r4, SPRN_HDEC
+   mftb    r5
+   cmpw    r3, r4
+   ble 67f
+   mtspr   SPRN_DEC, r4
+67:
+   /* save expiry time of guest decrementer */
+   extsw   r3, r3
+   add r3, r3, r5
+   ld  r4, HSTATE_KVM_VCPU(r13)
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   ld  r6, VCORE_TB_OFFSET(r5)
+   subf    r3, r6, r3  /* convert to host TB value */
+   std r3, VCPU_DEC_EXPIRES(r4)
+
 #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
ld  r4, HSTATE_KVM_VCPU(r13)
addi    r3, r4, VCPU_TB_CEDE
@@ -2181,6 +2211,15 @@ kvm_end_cede:
/* load up FP state */
bl  kvmppc_load_fp
 
+   /* Restore guest decrementer */
+   ld  r3, VCPU_DEC_EXPIRES(r4)
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   ld  r6, VCORE_TB_OFFSET(r5)
+   add r3, r3, r6  /* convert host TB to guest TB value */
+   mftb    r7
+   subf    r3, r7, r3
+   mtspr   SPRN_DEC, r3
+
/* Load NV GPRS */
ld  r14, VCPU_GPR(R14)(r4)
ld  r15, VCPU_GPR(R15)(r4)
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 18/21] KVM: PPC: Book3S HV: Use bitmap of active threads rather than count

2015-04-21 Thread Alexander Graf
From: Paul Mackerras 

Currently, the entry_exit_count field in the kvmppc_vcore struct
contains two 8-bit counts, one of the threads that have started entering
the guest, and one of the threads that have started exiting the guest.
This changes it to an entry_exit_map field which contains two bitmaps
of 8 bits each.  The advantage of doing this is that it gives us a
bitmap of which threads need to be signalled when exiting the guest.
That means that we no longer need to use the trick of setting the
HDEC to 0 to pull the other threads out of the guest, which led in
some cases to a spurious HDEC interrupt on the next guest entry.
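
As a rough userspace illustration of the packing described above (not
the kernel code, which does this in real-mode assembly), the sketch
below uses C11 atomics.  The names try_enter() and start_exit() are
hypothetical; only the layout (entry bits in the low 8 bits, exit bits
in the next 8) comes from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Low 8 bits: threads that have entered; next 8 bits: threads exiting. */
#define ENTRY_MAP(m) ((m) & 0xff)
#define EXIT_MAP(m)  (((m) >> 8) & 0xff)

/*
 * Hypothetical helper: set our entry bit only while no thread has
 * started exiting.  Returns false if the exit map is already non-zero.
 */
static bool try_enter(atomic_int *map, int ptid)
{
	int old = atomic_load(map);

	assert(ptid >= 0 && ptid < 8);
	do {
		if (EXIT_MAP(old) != 0)
			return false;
		/* on failure, 'old' is reloaded and the exit map is rechecked */
	} while (!atomic_compare_exchange_weak(map, &old, old | (1 << ptid)));
	return true;
}

/*
 * Hypothetical helper: set our exit bit and return the bitmap of other
 * threads that have entered but were not yet exiting, i.e. the ones
 * that would need to be signalled.
 */
static int start_exit(atomic_int *map, int ptid)
{
	int old;

	assert(ptid >= 0 && ptid < 8);
	old = atomic_fetch_or(map, 1 << (ptid + 8));
	return ENTRY_MAP(old) & ~EXIT_MAP(old) & ~(1 << ptid);
}
```

Because both maps live in one word, a single compare-and-swap can test
the exit map and set the entry bit together, which is the property the
struct comment in kvm_host.h relies on.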

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_host.h | 15 
 arch/powerpc/kernel/asm-offsets.c   |  2 +-
 arch/powerpc/kvm/book3s_hv.c|  5 ++-
 arch/powerpc/kvm/book3s_hv_builtin.c| 10 +++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 61 +++--
 5 files changed, 44 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 1517faa..d67a838 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -263,15 +263,15 @@ struct kvm_arch {
 
 /*
  * Struct for a virtual core.
- * Note: entry_exit_count combines an entry count in the bottom 8 bits
- * and an exit count in the next 8 bits.  This is so that we can
- * atomically increment the entry count iff the exit count is 0
- * without taking the lock.
+ * Note: entry_exit_map combines a bitmap of threads that have entered
+ * in the bottom 8 bits and a bitmap of threads that have exited in the
+ * next 8 bits.  This is so that we can atomically set the entry bit
+ * iff the exit map is 0 without taking a lock.
  */
 struct kvmppc_vcore {
int n_runnable;
int num_threads;
-   int entry_exit_count;
+   int entry_exit_map;
int napping_threads;
int first_vcpuid;
u16 pcpu;
@@ -296,8 +296,9 @@ struct kvmppc_vcore {
ulong conferring_threads;
 };
 
-#define VCORE_ENTRY_COUNT(vc)  ((vc)->entry_exit_count & 0xff)
-#define VCORE_EXIT_COUNT(vc)   ((vc)->entry_exit_count >> 8)
+#define VCORE_ENTRY_MAP(vc)((vc)->entry_exit_map & 0xff)
+#define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
+#define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 8aa8246..0d07efb 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -562,7 +562,7 @@ int main(void)
DEFINE(VCPU_ACOP, offsetof(struct kvm_vcpu, arch.acop));
DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
-   DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, 
entry_exit_count));
+   DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_map));
DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, 
napping_threads));
DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 7c1335d..ea1600f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1952,7 +1952,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
/*
 * Initialize *vc.
 */
-   vc->entry_exit_count = 0;
+   vc->entry_exit_map = 0;
vc->preempt_tb = TB_NIL;
vc->in_guest = 0;
vc->napping_threads = 0;
@@ -2119,8 +2119,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, 
struct kvm_vcpu *vcpu)
 * this thread straight away and have it join in.
 */
if (!signal_pending(current)) {
-   if (vc->vcore_state == VCORE_RUNNING &&
-   VCORE_EXIT_COUNT(vc) == 0) {
+   if (vc->vcore_state == VCORE_RUNNING && !VCORE_IS_EXITING(vc)) {
kvmppc_create_dtl_entry(vcpu, vc);
kvmppc_start_thread(vcpu);
trace_kvm_guest_enter(vcpu);
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 1954a1c..2754251 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -115,11 +115,11 @@ long int kvmppc_rm_h_confer(struct kvm_vcpu *vcpu, int 
target,
int rv = H_SUCCESS; /* => don't yield */
 
set_bit(vcpu->arch.ptid, &vc->conferring_threads);
-   while ((get_tb() < stop) && (VCORE_EXIT_COUNT(vc) == 0)) {
-   threads_running = VCORE_ENTRY_COUNT(vc);
-   threads_ceded = hweight32(vc->napping

[PULL 03/21] KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation.

2015-04-21 Thread Alexander Graf
From: Michael Ellerman 

Some PowerNV systems include a hardware random-number generator.
This HWRNG is present on POWER7+ and POWER8 chips and is capable of
generating one 64-bit random number every microsecond.  The random
numbers are produced by sampling a set of 64 unstable high-frequency
oscillators and are almost completely entropic.

PAPR defines an H_RANDOM hypercall which guests can use to obtain one
64-bit random sample from the HWRNG.  This adds a real-mode
implementation of the H_RANDOM hypercall.  This hypercall was
implemented in real mode because the latency of reading the HWRNG is
generally small compared to the latency of a guest exit and entry for
all the threads in the same virtual core.

Userspace can detect the presence of the HWRNG and the H_RANDOM
implementation by querying the KVM_CAP_PPC_HWRNG capability.  The
H_RANDOM hypercall implementation will only be invoked when the guest
does an H_RANDOM hypercall if userspace first enables the in-kernel
H_RANDOM implementation using the KVM_CAP_PPC_ENABLE_HCALL capability.

Signed-off-by: Michael Ellerman 
Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 Documentation/virtual/kvm/api.txt   |  17 +
 arch/powerpc/include/asm/archrandom.h   |  11 ++-
 arch/powerpc/include/asm/kvm_ppc.h  |   2 +
 arch/powerpc/kvm/book3s_hv_builtin.c|  15 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 115 
 arch/powerpc/kvm/powerpc.c  |   3 +
 arch/powerpc/platforms/powernv/rng.c|  29 
 include/uapi/linux/kvm.h|   1 +
 8 files changed, 191 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index bc9f6fe..9fa2bf8 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3573,3 +3573,20 @@ struct {
 @ar   - access register number
 
 KVM handlers should exit to userspace with rc = -EREMOTE.
+
+
+8. Other capabilities.
+----------------------
+
+This section lists capabilities that give information about other
+features of the KVM implementation.
+
+8.1 KVM_CAP_PPC_HWRNG
+
+Architectures: ppc
+
+This capability, if KVM_CHECK_EXTENSION indicates that it is
+available, means that the kernel has an implementation of the
+H_RANDOM hypercall backed by a hardware random-number generator.
+If present, the kernel H_RANDOM handler can be enabled for guest use
+with the KVM_CAP_PPC_ENABLE_HCALL capability.
diff --git a/arch/powerpc/include/asm/archrandom.h 
b/arch/powerpc/include/asm/archrandom.h
index bde5311..0cc6eed 100644
--- a/arch/powerpc/include/asm/archrandom.h
+++ b/arch/powerpc/include/asm/archrandom.h
@@ -30,8 +30,6 @@ static inline int arch_has_random(void)
return !!ppc_md.get_random_long;
 }
 
-int powernv_get_random_long(unsigned long *v);
-
 static inline int arch_get_random_seed_long(unsigned long *v)
 {
return 0;
@@ -47,4 +45,13 @@ static inline int arch_has_random_seed(void)
 
 #endif /* CONFIG_ARCH_RANDOM */
 
+#ifdef CONFIG_PPC_POWERNV
+int powernv_hwrng_present(void);
+int powernv_get_random_long(unsigned long *v);
+int powernv_get_random_real_mode(unsigned long *v);
+#else
+static inline int powernv_hwrng_present(void) { return 0; }
+static inline int powernv_get_random_real_mode(unsigned long *v) { return 0; }
+#endif
+
 #endif /* _ASM_POWERPC_ARCHRANDOM_H */
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 46bf652..b8475da 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -302,6 +302,8 @@ static inline bool is_kvmppc_hv_enabled(struct kvm *kvm)
return kvm->arch.kvm_ops == kvmppc_hv_ops;
 }
 
+extern int kvmppc_hwrng_present(void);
+
 /*
  * Cuts out inst bits with ordering according to spec.
  * That means the leftmost bit is zero. All given bits are included.
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 1f083ff..1954a1c 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define KVM_CMA_CHUNK_ORDER18
 
@@ -169,3 +170,17 @@ int kvmppc_hcall_impl_hv_realmode(unsigned long cmd)
return 0;
 }
 EXPORT_SYMBOL_GPL(kvmppc_hcall_impl_hv_realmode);
+
+int kvmppc_hwrng_present(void)
+{
+   return powernv_hwrng_present();
+}
+EXPORT_SYMBOL_GPL(kvmppc_hwrng_present);
+
+long kvmppc_h_random(struct kvm_vcpu *vcpu)
+{
+   if (powernv_get_random_real_mode(&vcpu->arch.gpr[4]))
+   return H_SUCCESS;
+
+   return H_HARDWARE;
+}
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 6cbf163..0814ca1 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1839,6 +1839,121 @@ hcall_real_table:
.long   0   /* 0x12c */
.long   0 

[PULL 06/21] KVM: PPC: Book3S HV: Add guest->host real mode completion counters

2015-04-21 Thread Alexander Graf
From: "Suresh E. Warrier" 

Add counters to track the number of times we switch from guest real mode
to host virtual mode during an interrupt-related hypercall because the
hypercall requires actions that cannot be completed in real mode. This
will help when making optimizations that reduce guest-host transitions.

It is safe to use an ordinary increment rather than an atomic operation
because there is one ICP per virtual CPU and kvmppc_xics_rm_complete()
only works on the ICP for the current VCPU.

The counters are displayed as part of the ICS and ICP state provided by
/sys/kernel/debug/powerpc/kvm* for each VM.
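
For readers who want the shape of this without reading the diff, here
is a small standalone sketch of the counting and totalling.  It is an
illustration only: the RM_* bit values and the helper names
count_rm_actions() and sum_counters() are made up for the example, and
the counters are plain increments on the assumption (stated above) that
each structure is only touched by its own VCPU.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative action bits; the real flags are the XICS_RM_* defines. */
enum {
	RM_KICK_VCPU    = 0x1,
	RM_CHECK_RESEND = 0x2,
	RM_REJECT       = 0x4,
	RM_NOTIFY_EOI   = 0x8,
};

/* One set of counters per ICP, i.e. per virtual CPU. */
struct icp_counters {
	unsigned long n_rm_kick_vcpu;
	unsigned long n_rm_check_resend;
	unsigned long n_rm_reject;
	unsigned long n_rm_notify_eoi;
};

/* Bump one counter for every action bit set in 'action'. */
static void count_rm_actions(struct icp_counters *c, unsigned int action)
{
	if (action & RM_KICK_VCPU)
		c->n_rm_kick_vcpu++;
	if (action & RM_CHECK_RESEND)
		c->n_rm_check_resend++;
	if (action & RM_REJECT)
		c->n_rm_reject++;
	if (action & RM_NOTIFY_EOI)
		c->n_rm_notify_eoi++;
}

/* Add up the per-ICP counters, as the debug output does per VM. */
static struct icp_counters sum_counters(const struct icp_counters *icps,
					size_t n)
{
	struct icp_counters t = {0};
	size_t i;

	assert(icps != NULL || n == 0);
	for (i = 0; i < n; i++) {
		t.n_rm_kick_vcpu    += icps[i].n_rm_kick_vcpu;
		t.n_rm_check_resend += icps[i].n_rm_check_resend;
		t.n_rm_reject       += icps[i].n_rm_reject;
		t.n_rm_notify_eoi   += icps[i].n_rm_notify_eoi;
	}
	return t;
}
```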

Signed-off-by: Suresh Warrier 
Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_xics.c | 31 +++
 arch/powerpc/kvm/book3s_xics.h |  6 ++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index a4a8d9f..60bdbac 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -802,14 +802,22 @@ static noinline int kvmppc_xics_rm_complete(struct 
kvm_vcpu *vcpu, u32 hcall)
XICS_DBG("XICS_RM: H_%x completing, act: %x state: %lx tgt: %p\n",
 hcall, icp->rm_action, icp->rm_dbgstate.raw, icp->rm_dbgtgt);
 
-   if (icp->rm_action & XICS_RM_KICK_VCPU)
+   if (icp->rm_action & XICS_RM_KICK_VCPU) {
+   icp->n_rm_kick_vcpu++;
kvmppc_fast_vcpu_kick(icp->rm_kick_target);
-   if (icp->rm_action & XICS_RM_CHECK_RESEND)
+   }
+   if (icp->rm_action & XICS_RM_CHECK_RESEND) {
+   icp->n_rm_check_resend++;
icp_check_resend(xics, icp->rm_resend_icp);
-   if (icp->rm_action & XICS_RM_REJECT)
+   }
+   if (icp->rm_action & XICS_RM_REJECT) {
+   icp->n_rm_reject++;
icp_deliver_irq(xics, icp, icp->rm_reject);
-   if (icp->rm_action & XICS_RM_NOTIFY_EOI)
+   }
+   if (icp->rm_action & XICS_RM_NOTIFY_EOI) {
+   icp->n_rm_notify_eoi++;
kvm_notify_acked_irq(vcpu->kvm, 0, icp->rm_eoied_irq);
+   }
 
icp->rm_action = 0;
 
@@ -872,10 +880,17 @@ static int xics_debug_show(struct seq_file *m, void 
*private)
struct kvm *kvm = xics->kvm;
struct kvm_vcpu *vcpu;
int icsid, i;
+   unsigned long t_rm_kick_vcpu, t_rm_check_resend;
+   unsigned long t_rm_reject, t_rm_notify_eoi;
 
if (!kvm)
return 0;
 
+   t_rm_kick_vcpu = 0;
+   t_rm_notify_eoi = 0;
+   t_rm_check_resend = 0;
+   t_rm_reject = 0;
+
seq_printf(m, "=\nICP state\n=\n");
 
kvm_for_each_vcpu(i, vcpu, kvm) {
@@ -890,8 +905,16 @@ static int xics_debug_show(struct seq_file *m, void 
*private)
   icp->server_num, state.xisr,
   state.pending_pri, state.cppr, state.mfrr,
   state.out_ee, state.need_resend);
+   t_rm_kick_vcpu += icp->n_rm_kick_vcpu;
+   t_rm_notify_eoi += icp->n_rm_notify_eoi;
+   t_rm_check_resend += icp->n_rm_check_resend;
+   t_rm_reject += icp->n_rm_reject;
}
 
+   seq_puts(m, "ICP Guest Real Mode exit totals: ");
+   seq_printf(m, "\tkick_vcpu=%lu check_resend=%lu reject=%lu 
notify_eoi=%lu\n",
+   t_rm_kick_vcpu, t_rm_check_resend,
+   t_rm_reject, t_rm_notify_eoi);
for (icsid = 0; icsid <= KVMPPC_XICS_MAX_ICS_ID; icsid++) {
struct kvmppc_ics *ics = xics->ics[icsid];
 
diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h
index 73f0f27..de970ec 100644
--- a/arch/powerpc/kvm/book3s_xics.h
+++ b/arch/powerpc/kvm/book3s_xics.h
@@ -78,6 +78,12 @@ struct kvmppc_icp {
u32  rm_reject;
u32  rm_eoied_irq;
 
+   /* Counters for each reason we exited real mode */
+   unsigned long n_rm_kick_vcpu;
+   unsigned long n_rm_check_resend;
+   unsigned long n_rm_reject;
+   unsigned long n_rm_notify_eoi;
+
/* Debug stuff for real mode */
union kvmppc_icp_state rm_dbgstate;
struct kvm_vcpu *rm_dbgtgt;
-- 
1.8.1.4



[PULL 12/21] KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update

2015-04-21 Thread Alexander Graf
From: Paul Mackerras 

Previously, if kvmppc_run_core() was running a VCPU that needed a VPA
update (i.e. one of its 3 virtual processor areas needed to be pinned
in memory so the host real mode code can update it on guest entry and
exit), we would drop the vcore lock and do the update there and then.
Future changes will make it inconvenient to drop the lock, so instead
we now remove it from the list of runnable VCPUs and wake up its
VCPU task.  This will have the effect that the VCPU task will exit
kvmppc_run_vcpu(), go around the do loop in kvmppc_vcpu_run_hv(), and
re-enter kvmppc_run_vcpu(), whereupon it will do the necessary call
to kvmppc_update_vpas() and then rejoin the vcore.

The one complication is that the runner VCPU (whose VCPU task is the
current task) might be one of the ones that gets removed from the
runnable list.  In that case we just return from kvmppc_run_core()
and let the code in kvmppc_run_vcpu() wake up another VCPU task to be
the runner if necessary.

This all means that the VCORE_STARTING state is no longer used, so we
remove it.

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_host.h |  5 ++--
 arch/powerpc/kvm/book3s_hv.c| 56 -
 2 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index d2068bb..2f339ff 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -306,9 +306,8 @@ struct kvmppc_vcore {
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
 #define VCORE_SLEEPING 1
-#define VCORE_STARTING 2
-#define VCORE_RUNNING  3
-#define VCORE_EXITING  4
+#define VCORE_RUNNING  2
+#define VCORE_EXITING  3
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 64a02d4..b38c10e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1863,6 +1863,25 @@ static void kvmppc_start_restoring_l2_cache(const struct 
kvmppc_vcore *vc)
mtspr(SPRN_MPPR, mpp_addr | PPC_MPPR_FETCH_WHOLE_TABLE);
 }
 
+static void prepare_threads(struct kvmppc_vcore *vc)
+{
+   struct kvm_vcpu *vcpu, *vnext;
+
+   list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
+arch.run_list) {
+   if (signal_pending(vcpu->arch.run_task))
+   vcpu->arch.ret = -EINTR;
+   else if (vcpu->arch.vpa.update_pending ||
+vcpu->arch.slb_shadow.update_pending ||
+vcpu->arch.dtl.update_pending)
+   vcpu->arch.ret = RESUME_GUEST;
+   else
+   continue;
+   kvmppc_remove_runnable(vc, vcpu);
+   wake_up(&vcpu->arch.cpu_run);
+   }
+}
+
 /*
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
@@ -1872,46 +1891,31 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
struct kvm_vcpu *vcpu, *vnext;
long ret;
u64 now;
-   int i, need_vpa_update;
+   int i;
int srcu_idx;
-   struct kvm_vcpu *vcpus_to_update[threads_per_core];
 
-   /* don't start if any threads have a signal pending */
-   need_vpa_update = 0;
-   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
-   if (signal_pending(vcpu->arch.run_task))
-   return;
-   if (vcpu->arch.vpa.update_pending ||
-   vcpu->arch.slb_shadow.update_pending ||
-   vcpu->arch.dtl.update_pending)
-   vcpus_to_update[need_vpa_update++] = vcpu;
-   }
+   /*
+* Remove from the list any threads that have a signal pending
+* or need a VPA update done
+*/
+   prepare_threads(vc);
+
+   /* if the runner is no longer runnable, let the caller pick a new one */
+   if (vc->runner->arch.state != KVMPPC_VCPU_RUNNABLE)
+   return;
 
/*
-* Initialize *vc, in particular vc->vcore_state, so we can
-* drop the vcore lock if necessary.
+* Initialize *vc.
 */
vc->n_woken = 0;
vc->nap_count = 0;
vc->entry_exit_count = 0;
vc->preempt_tb = TB_NIL;
-   vc->vcore_state = VCORE_STARTING;
vc->in_guest = 0;
vc->napping_threads = 0;
vc->conferring_threads = 0;
 
/*
-* Updating any of the vpas requires calling kvmppc_pin_guest_page,
-* which can't be called with any spinlocks held.
-*/
-   if (need_vpa_update) {
-   spin_unlock(&vc->lock);
-   for (i = 0; i < need_vpa_update; ++i)
-   kvmppc_update_vpas(vcpus_to_update[i]);
-   sp

[PULL 01/21] powerpc: Export __spin_yield

2015-04-21 Thread Alexander Graf
From: "Suresh E. Warrier" 

Export __spin_yield so that the arch_spin_unlock() function can
be invoked from a module. This will be required for modules where
we want to take a lock that is also acquired in hypervisor
real mode. Because we want to avoid running any lockdep code
(which may not be safe in real mode), this lock needs to be
an arch_spinlock_t instead of a normal spinlock.

Signed-off-by: Suresh Warrier 
Acked-by: Paul Mackerras 
Acked-by: Michael Ellerman 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/lib/locks.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
index 170a034..f7deebd 100644
--- a/arch/powerpc/lib/locks.c
+++ b/arch/powerpc/lib/locks.c
@@ -41,6 +41,7 @@ void __spin_yield(arch_spinlock_t *lock)
plpar_hcall_norets(H_CONFER,
get_hard_smp_processor_id(holder_cpu), yield_count);
 }
+EXPORT_SYMBOL_GPL(__spin_yield);
 
 /*
  * Waiting for a read lock or a write lock on a rwlock...
-- 
1.8.1.4



[PULL 11/21] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-04-21 Thread Alexander Graf
From: Paul Mackerras 

This reads the timebase at various points in the real-mode guest
entry/exit code and uses that to accumulate total, minimum and
maximum time spent in those parts of the code.  Currently these
times are accumulated per vcpu in 5 parts of the code:

* rm_entry - time taken from the start of kvmppc_hv_entry() until
  just before entering the guest.
* rm_intr - time from when we take a hypervisor interrupt in the
  guest until we either re-enter the guest or decide to exit to the
  host.  This includes time spent handling hcalls in real mode.
* rm_exit - time from when we decide to exit the guest until the
  return from kvmppc_hv_entry().
* guest - time spent in the guest
* cede - time spent napping in real mode due to an H_CEDE hcall
  while other threads in the same vcore are active.

These times are exposed in debugfs in a directory per vcpu that
contains a file called "timings".  This file contains one line for
each of the 5 timings above, with the name followed by a colon and
4 numbers, which are the count (number of times the code has been
executed), the total time, the minimum time, and the maximum time,
all in nanoseconds.
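
The bookkeeping described above can be sketched in plain C as follows.
This is a simplified illustration, not the patch: it leaves out the
seqcount synchronization and the timebase-to-nanosecond conversion, and
the names tb_accumulator, accumulate() and format_timing() are
hypothetical.  Values here are in whatever unit the caller passes in.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified accumulator: count, total, minimum and maximum interval. */
struct tb_accumulator {
	uint64_t count;
	uint64_t total;
	uint64_t min;
	uint64_t max;
};

/* Fold one [start, end] interval into the accumulator. */
static void accumulate(struct tb_accumulator *acc, uint64_t start,
		       uint64_t end)
{
	uint64_t delta;

	assert(end >= start);
	delta = end - start;
	acc->total += delta;
	if (acc->count == 0 || delta < acc->min)
		acc->min = delta;
	if (delta > acc->max)
		acc->max = delta;
	acc->count++;
}

/* Emit one line: name, colon, then count, total, min and max. */
static int format_timing(char *buf, size_t len, const char *name,
			 const struct tb_accumulator *acc)
{
	return snprintf(buf, len, "%s: %llu %llu %llu %llu\n", name,
			(unsigned long long)acc->count,
			(unsigned long long)acc->total,
			(unsigned long long)acc->min,
			(unsigned long long)acc->max);
}
```

Feeding it the intervals 30, 10 and 50 for "rm_entry" would give a line
with a count of 3, a total of 90, a minimum of 10 and a maximum of 50.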

The overhead of the extra code amounts to about 30ns for an hcall that
is handled in real mode (e.g. H_SET_DABR), which is about 25%.  Since
production environments may not wish to incur this overhead, the new
code is conditional on a new config symbol,
CONFIG_KVM_BOOK3S_HV_EXIT_TIMING.

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_host.h |  21 +
 arch/powerpc/include/asm/time.h |   3 +
 arch/powerpc/kernel/asm-offsets.c   |  13 +++
 arch/powerpc/kernel/time.c  |   6 ++
 arch/powerpc/kvm/Kconfig|  14 +++
 arch/powerpc/kvm/book3s_hv.c| 150 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 141 +-
 7 files changed, 346 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index f1d0bbc..d2068bb 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -369,6 +369,14 @@ struct kvmppc_slb {
u8 base_page_size;  /* MMU_PAGE_xxx */
 };
 
+/* Struct used to accumulate timing information in HV real mode code */
+struct kvmhv_tb_accumulator {
+   u64 seqcount;   /* used to synchronize access, also count * 2 */
+   u64 tb_total;   /* total time in timebase ticks */
+   u64 tb_min; /* min time */
+   u64 tb_max; /* max time */
+};
+
 # ifdef CONFIG_PPC_FSL_BOOK3E
 #define KVMPPC_BOOKE_IAC_NUM   2
 #define KVMPPC_BOOKE_DAC_NUM   2
@@ -657,6 +665,19 @@ struct kvm_vcpu_arch {
 
u32 emul_inst;
 #endif
+
+#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+   struct kvmhv_tb_accumulator *cur_activity;  /* What we're timing */
+   u64 cur_tb_start;   /* when it started */
+   struct kvmhv_tb_accumulator rm_entry;   /* real-mode entry code */
+   struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */
+   struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */
+   struct kvmhv_tb_accumulator guest_time; /* guest execution */
+   struct kvmhv_tb_accumulator cede_time;  /* time napping inside guest */
+
+   struct dentry *debugfs_dir;
+   struct dentry *debugfs_timings;
+#endif /* CONFIG_KVM_BOOK3S_HV_EXIT_TIMING */
 };
 
 #define VCPU_FPR(vcpu, i)  (vcpu)->arch.fp.fpr[i][TS_FPROFFSET]
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 03cbada..10fc784 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -211,5 +211,8 @@ extern void secondary_cpu_time_init(void);
 
 DECLARE_PER_CPU(u64, decrementers_next_tb);
 
+/* Convert timebase ticks to nanoseconds */
+unsigned long long tb_to_ns(unsigned long long tb_ticks);
+
 #endif /* __KERNEL__ */
 #endif /* __POWERPC_TIME_H */
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 4717859..3fea721 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -459,6 +459,19 @@ int main(void)
DEFINE(VCPU_SPRG2, offsetof(struct kvm_vcpu, arch.shregs.sprg2));
DEFINE(VCPU_SPRG3, offsetof(struct kvm_vcpu, arch.shregs.sprg3));
 #endif
+#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+   DEFINE(VCPU_TB_RMENTRY, offsetof(struct kvm_vcpu, arch.rm_entry));
+   DEFINE(VCPU_TB_RMINTR, offsetof(struct kvm_vcpu, arch.rm_intr));
+   DEFINE(VCPU_TB_RMEXIT, offsetof(struct kvm_vcpu, arch.rm_exit));
+   DEFINE(VCPU_TB_GUEST, offsetof(struct kvm_vcpu, arch.guest_time));
+   DEFINE(VCPU_TB_CEDE, offsetof(struct kvm_vcpu, arch.cede_time));
+   DEFINE(VCPU_CUR_ACTIVITY, offsetof(struct kvm_vcpu, arch.cur_activity));
+   DEFINE(VCPU_ACTIVITY_START, offset

[PULL 21/21] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8

2015-04-21 Thread Alexander Graf
From: Paul Mackerras 

This uses msgsnd where possible for signalling other threads within
the same core on POWER8 systems, rather than IPIs through the XICS
interrupt controller.  This includes waking secondary threads to run
the guest, the interrupts generated by the virtual XICS, and the
interrupts to bring the other threads out of the guest when exiting.
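
The "where possible" condition boils down to a same-core test.  A
minimal userspace sketch of that test follows, assuming CPUs are
numbered contiguously within a core and that the number of threads per
core is a power of two; the function names are illustrative stand-ins
for the kernel's cpu_first_thread_sibling() and cpu_thread_in_core(),
and the threads_per_core parameter is passed explicitly here only to
keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>

/* First hardware thread of the core that 'cpu' belongs to. */
static int first_thread_sibling(int cpu, int threads_per_core)
{
	/* the mask trick below needs a power-of-two thread count */
	assert(threads_per_core > 0 &&
	       (threads_per_core & (threads_per_core - 1)) == 0);
	return cpu & ~(threads_per_core - 1);
}

/* Index of 'cpu' within its core; this is what goes into the message. */
static int thread_in_core(int cpu, int threads_per_core)
{
	return cpu & (threads_per_core - 1);
}

/* msgsnd only reaches threads in the sender's own core. */
static bool can_use_msgsnd(int me, int target, int threads_per_core)
{
	return first_thread_sibling(me, threads_per_core) ==
	       first_thread_sibling(target, threads_per_core);
}
```

With 8 threads per core, CPUs 8 and 13 share a core (thread index 5 for
CPU 13), while CPUs 7 and 8 do not, so the latter pair would fall back
to the XICS path.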

Aggregated statistics from debugfs across vcpus for a guest with 32
vcpus, 8 threads/vcore, running on a POWER8, show this before the
change:

 rm_entry: 3387.6ns (228 - 86600, 1008969 samples)
  rm_exit: 4561.5ns (12 - 3477452, 1009402 samples)
  rm_intr: 1660.0ns (12 - 553050, 3600051 samples)

and this after the change:

 rm_entry: 3060.1ns (212 - 65138, 953873 samples)
  rm_exit: 4244.1ns (12 - 9693408, 954331 samples)
  rm_intr: 1342.3ns (12 - 1104718, 3405326 samples)

for a test of booting Fedora 20 big-endian to the login prompt.

The time taken for a H_PROD hcall (which is handled in the host
kernel) went down from about 35 microseconds to about 16 microseconds
with this change.

The noinline added to kvmppc_run_core turned out to be necessary for
good performance, at least with gcc 4.9.2 as packaged with Fedora 21
and a little-endian POWER8 host.

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kernel/asm-offsets.c   |  3 ++
 arch/powerpc/kvm/book3s_hv.c| 51 ++---
 arch/powerpc/kvm/book3s_hv_builtin.c| 16 +--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 22 --
 4 files changed, 70 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 0d07efb..0034b6b 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifdef CONFIG_PPC64
 #include 
 #include 
@@ -759,5 +760,7 @@ int main(void)
offsetof(struct paca_struct, subcore_sibling_mask));
 #endif
 
+   DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
+
return 0;
 }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ea1600f..48d3c5d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -51,6 +51,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -84,9 +85,35 @@ static DECLARE_BITMAP(default_enabled_hcalls, 
MAX_HCALL_OPCODE/4 + 1);
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
+static bool kvmppc_ipi_thread(int cpu)
+{
+   /* On POWER8 for IPIs to threads in the same core, use msgsnd */
+   if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
+   preempt_disable();
+   if (cpu_first_thread_sibling(cpu) ==
+   cpu_first_thread_sibling(smp_processor_id())) {
+   unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
+   msg |= cpu_thread_in_core(cpu);
+   smp_mb();
+   __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+   preempt_enable();
+   return true;
+   }
+   preempt_enable();
+   }
+
+#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
+   if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) {
+   xics_wake_cpu(cpu);
+   return true;
+   }
+#endif
+
+   return false;
+}
+
 static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 {
-   int me;
int cpu = vcpu->cpu;
wait_queue_head_t *wqp;
 
@@ -96,20 +123,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
++vcpu->stat.halt_wakeup;
}
 
-   me = get_cpu();
+   if (kvmppc_ipi_thread(cpu + vcpu->arch.ptid))
+   return;
 
/* CPU points to the first thread of the core */
-   if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
-#ifdef CONFIG_PPC_ICP_NATIVE
-   int real_cpu = cpu + vcpu->arch.ptid;
-   if (paca[real_cpu].kvm_hstate.xics_phys)
-   xics_wake_cpu(real_cpu);
-   else
-#endif
-   if (cpu_online(cpu))
-   smp_send_reschedule(cpu);
-   }
-   put_cpu();
+   if (cpu >= 0 && cpu < nr_cpu_ids && cpu_online(cpu))
+   smp_send_reschedule(cpu);
 }
 
 /*
@@ -1781,10 +1800,8 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
/* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
smp_wmb();
tpaca->kvm_hstate.kvm_vcpu = vcpu;
-#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
if (cpu != smp_processor_id())
-   xics_wake_cpu(cpu);
-#endif
+   kvmppc_ipi_thread(cp

[PULL 10/21] KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT

2015-04-21 Thread Alexander Graf
From: Paul Mackerras 

This creates a debugfs directory for each HV guest (assuming debugfs
is enabled in the kernel config), and within that directory, a file
by which the contents of the guest's HPT (hashed page table) can be
read.  The directory is named vm<pid>, where <pid> is the PID of the
process that created the guest.  The file is named "htab".  This is
intended to help in debugging problems in the host's management
of guest memory.

The contents of the file consist of a series of lines like this:

  3f48 4000d032bf003505 000bd7ff1196 0003b5c71196

The first field is the index of the entry in the HPT.  The second and
third fields are the two doublewords of the HPT entry, so the third
field contains the real page number that is mapped by the entry if the
entry's valid bit is set.  The fourth field is the guest's view of the
second doubleword of the entry, so it contains the guest physical
address.  (The format of the second through fourth fields is described
in the Power ISA and also in arch/powerpc/include/asm/mmu-hash64.h.)
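
A tool reading the "htab" file could split each line into its four
hexadecimal fields along these lines.  This is only a sketch of a
possible consumer: the struct and function names are invented for the
example, and it assumes nothing beyond the whitespace-separated hex
format shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* One parsed line of the htab file (field names are illustrative). */
struct htab_line {
	unsigned long long index; /* index of the entry in the HPT */
	unsigned long long v;     /* first doubleword of the HPTE */
	unsigned long long hr;    /* second doubleword, host view */
	unsigned long long gr;    /* second doubleword, guest view */
};

/* Parse four hex fields; returns false if the line is malformed. */
static bool parse_htab_line(const char *s, struct htab_line *out)
{
	assert(s != NULL && out != NULL);
	return sscanf(s, "%llx %llx %llx %llx",
		      &out->index, &out->v, &out->hr, &out->gr) == 4;
}
```

Decoding the individual bits of each doubleword is left to the
definitions in mmu-hash64.h.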

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   2 +
 arch/powerpc/include/asm/kvm_host.h  |   2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 136 +++
 arch/powerpc/kvm/book3s_hv.c |  12 +++
 virt/kvm/kvm_main.c  |   1 +
 5 files changed, 153 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0789a0f..869c53f 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -436,6 +436,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct 
kvm *kvm)
return rcu_dereference_raw_notrace(kvm->memslots);
 }
 
+extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 015773f..f1d0bbc 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -238,6 +238,8 @@ struct kvm_arch {
atomic_t hpte_mod_interest;
cpumask_t need_tlb_flush;
int hpt_cma_alloc;
+   struct dentry *debugfs_dir;
+   struct dentry *htab_dentry;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
struct mutex hpt_mutex;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 6c6825a..d6fe308 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1490,6 +1491,141 @@ int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct 
kvm_get_htab_fd *ghf)
return ret;
 }
 
+struct debugfs_htab_state {
+   struct kvm  *kvm;
+   struct mutexmutex;
+   unsigned long   hpt_index;
+   int chars_left;
+   int buf_index;
+   charbuf[64];
+};
+
+static int debugfs_htab_open(struct inode *inode, struct file *file)
+{
+   struct kvm *kvm = inode->i_private;
+   struct debugfs_htab_state *p;
+
+   p = kzalloc(sizeof(*p), GFP_KERNEL);
+   if (!p)
+   return -ENOMEM;
+
+   kvm_get_kvm(kvm);
+   p->kvm = kvm;
+   mutex_init(&p->mutex);
+   file->private_data = p;
+
+   return nonseekable_open(inode, file);
+}
+
+static int debugfs_htab_release(struct inode *inode, struct file *file)
+{
+   struct debugfs_htab_state *p = file->private_data;
+
+   kvm_put_kvm(p->kvm);
+   kfree(p);
+   return 0;
+}
+
+static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
+size_t len, loff_t *ppos)
+{
+   struct debugfs_htab_state *p = file->private_data;
+   ssize_t ret, r;
+   unsigned long i, n;
+   unsigned long v, hr, gr;
+   struct kvm *kvm;
+   __be64 *hptp;
+
+   ret = mutex_lock_interruptible(&p->mutex);
+   if (ret)
+   return ret;
+
+   if (p->chars_left) {
+   n = p->chars_left;
+   if (n > len)
+   n = len;
+   r = copy_to_user(buf, p->buf + p->buf_index, n);
+   n -= r;
+   p->chars_left -= n;
+   p->buf_index += n;
+   buf += n;
+   len -= n;
+   ret = n;
+   if (r) {
+   if (!n)
+   ret = -EFAULT;
+   goto out;
+   }
+   }
+
+   kvm = p->kvm;
+   i = p->hpt_index;
+   hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
+   for (; len != 0 && i < kvm->arch.hpt_npte; ++i, hptp += 2) {
+   if (!(be64_t

[PULL 05/21] KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte

2015-04-21 Thread Alexander Graf
From: "Aneesh Kumar K.V" 

This adds helper routines for locking and unlocking HPTEs, and uses
them in the rest of the code.  We don't change any locking rules in
this patch.
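
A minimal userspace sketch of the two unlock flavours described above,
written with C11 atomics rather than the kernel's own primitives. The
names entry_try_lock(), entry_unlock(), entry_unlock_relaxed() and the
position of ENTRY_LOCK_BIT are made up for illustration; they are not
the helpers added by this patch, only an analogy for "unlock with a
release barrier" versus "unlock without a barrier".

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock bit inside a 64-bit table entry word. */
#define ENTRY_LOCK_BIT (UINT64_C(1) << 62)

/* Try to set the lock bit; returns true if we were the one to set it. */
static inline bool entry_try_lock(_Atomic uint64_t *word)
{
    uint64_t old = atomic_fetch_or_explicit(word, ENTRY_LOCK_BIT,
                                            memory_order_acquire);
    return (old & ENTRY_LOCK_BIT) == 0;
}

/* Store a new value with the lock bit cleared, with release ordering,
 * so earlier writes are visible before the entry appears unlocked. */
static inline void entry_unlock(_Atomic uint64_t *word, uint64_t value)
{
    atomic_store_explicit(word, value & ~ENTRY_LOCK_BIT,
                          memory_order_release);
}

/* Same store without ordering; the caller is assumed to provide
 * whatever barrier it needs by other means. */
static inline void entry_unlock_relaxed(_Atomic uint64_t *word, uint64_t value)
{
    atomic_store_explicit(word, value & ~ENTRY_LOCK_BIT,
                          memory_order_relaxed);
}
```

Both unlock variants take the value to be written, so a caller that
has already computed an updated entry can publish it and drop the lock
in a single store.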

Signed-off-by: Aneesh Kumar K.V 
Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 14 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 25 ++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 25 +
 3 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2d81e20..0789a0f 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -85,6 +85,20 @@ static inline long try_lock_hpte(__be64 *hpte, unsigned long bits)
return old == 0;
 }
 
+static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+   hpte_v &= ~HPTE_V_HVLOCK;
+   asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
+   hpte[0] = cpu_to_be64(hpte_v);
+}
+
+/* Without barrier */
+static inline void __unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+   hpte_v &= ~HPTE_V_HVLOCK;
+   hpte[0] = cpu_to_be64(hpte_v);
+}
+
 static inline int __hpte_actual_psize(unsigned int lp, int psize)
 {
int i, shift;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index dbf1271..6c6825a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -338,9 +338,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
gr = kvm->arch.revmap[index].guest_rpte;
 
-   /* Unlock the HPTE */
-   asm volatile("lwsync" : : : "memory");
-   hptep[0] = cpu_to_be64(v);
+   unlock_hpte(hptep, v);
preempt_enable();
 
gpte->eaddr = eaddr;
@@ -469,8 +467,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
hpte[0] = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
hpte[1] = be64_to_cpu(hptep[1]);
hpte[2] = r = rev->guest_rpte;
-   asm volatile("lwsync" : : : "memory");
-   hptep[0] = cpu_to_be64(hpte[0]);
+   unlock_hpte(hptep, hpte[0]);
preempt_enable();
 
if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
@@ -621,7 +618,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
hptep[1] = cpu_to_be64(r);
eieio();
-   hptep[0] = cpu_to_be64(hpte[0]);
+   __unlock_hpte(hptep, hpte[0]);
asm volatile("ptesync" : : : "memory");
preempt_enable();
if (page && hpte_is_writable(r))
@@ -642,7 +639,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
return ret;
 
  out_unlock:
-   hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
preempt_enable();
goto out_put;
 }
@@ -771,7 +768,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
}
}
unlock_rmap(rmapp);
-   hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
}
return 0;
 }
@@ -857,7 +854,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
}
ret = 1;
}
-   hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
} while ((i = j) != head);
 
unlock_rmap(rmapp);
@@ -974,8 +971,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 
/* Now check and modify the HPTE */
if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
-   /* unlock and continue */
-   hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
continue;
}
 
@@ -996,9 +992,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
npages_dirty = n;
eieio();
}
-   v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK);
+   v &= ~HPTE_V_ABSENT;
v |= HPTE_V_VALID;
-   hptep[0] = cpu_to_be64(v);
+   __unlock_hpte(hptep, v);
} while ((i = j) != head);
 
unlock_rmap(rmapp);
@@ -1218,8 +1214,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
r &= ~HPTE_GR_MODIFIED;
revp->guest_rpte = r;
}
-  

[PULL 08/21] KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode

2015-04-21 Thread Alexander Graf
From: Suresh Warrier 

Interrupt-based hypercalls return H_TOO_HARD to inform KVM that it needs
to switch to the host to complete the rest of the hypercall function in
virtual mode. This patch ports the virtual mode ICS/ICP reject and resend
functions to be runnable in hypervisor real mode, thus avoiding the need
to switch to the host to execute these functions in virtual mode. However,
the hypercalls continue to return H_TOO_HARD for vcpu_wakeup and notify
events - these events cannot be done in real mode and they will still need
a switch to host virtual mode.

There are sufficient differences between the real mode code and the
virtual mode code for the ICS/ICP resend and reject functions that
for now the code has been duplicated instead of sharing common code.
In the future, we can look at creating common functions.

Signed-off-by: Suresh Warrier 
Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 225 ---
 1 file changed, 211 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 7c22997..73bbe92 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -23,12 +23,39 @@
 
 #define DEBUG_PASSUP
 
+static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
+   u32 new_irq);
+
 static inline void rm_writeb(unsigned long paddr, u8 val)
 {
__asm__ __volatile__("sync; stbcix %0,0,%1"
: : "r" (val), "r" (paddr) : "memory");
 }
 
+/* -- ICS routines -- */
+static void ics_rm_check_resend(struct kvmppc_xics *xics,
+   struct kvmppc_ics *ics, struct kvmppc_icp *icp)
+{
+   int i;
+
+   arch_spin_lock(&ics->lock);
+
+   for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
+   struct ics_irq_state *state = &ics->irq_state[i];
+
+   if (!state->resend)
+   continue;
+
+   arch_spin_unlock(&ics->lock);
+   icp_rm_deliver_irq(xics, icp, state->number);
+   arch_spin_lock(&ics->lock);
+   }
+
+   arch_spin_unlock(&ics->lock);
+}
+
+/* -- ICP routines -- */
+
 static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
struct kvm_vcpu *this_vcpu)
 {
@@ -116,6 +143,178 @@ static inline int check_too_hard(struct kvmppc_xics *xics,
return (xics->real_mode_dbg || icp->rm_action) ? H_TOO_HARD : H_SUCCESS;
 }
 
+static void icp_rm_check_resend(struct kvmppc_xics *xics,
+struct kvmppc_icp *icp)
+{
+   u32 icsid;
+
+   /* Order this load with the test for need_resend in the caller */
+   smp_rmb();
+   for_each_set_bit(icsid, icp->resend_map, xics->max_icsid + 1) {
+   struct kvmppc_ics *ics = xics->ics[icsid];
+
+   if (!test_and_clear_bit(icsid, icp->resend_map))
+   continue;
+   if (!ics)
+   continue;
+   ics_rm_check_resend(xics, ics, icp);
+   }
+}
+
+static bool icp_rm_try_to_deliver(struct kvmppc_icp *icp, u32 irq, u8 priority,
+  u32 *reject)
+{
+   union kvmppc_icp_state old_state, new_state;
+   bool success;
+
+   do {
+   old_state = new_state = READ_ONCE(icp->state);
+
+   *reject = 0;
+
+   /* See if we can deliver */
+   success = new_state.cppr > priority &&
+   new_state.mfrr > priority &&
+   new_state.pending_pri > priority;
+
+   /*
+* If we can, check for a rejection and perform the
+* delivery
+*/
+   if (success) {
+   *reject = new_state.xisr;
+   new_state.xisr = irq;
+   new_state.pending_pri = priority;
+   } else {
+   /*
+* If we failed to deliver we set need_resend
+* so a subsequent CPPR state change causes us
+* to try a new delivery.
+*/
+   new_state.need_resend = true;
+   }
+
+   } while (!icp_rm_try_update(icp, old_state, new_state));
+
+   return success;
+}
+
+static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
+   u32 new_irq)
+{
+   struct ics_irq_state *state;
+   struct kvmppc_ics *ics;
+   u32 reject;
+   u16 src;
+
+   /*
+* This is used both for initial delivery of an interrupt and
+* for subsequent rejection.
+*
+* Rejection can be racy vs. resends. We have evaluated the
+

Re: [PATCHv4] kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM

2015-04-21 Thread Alexander Graf

On 04/21/2015 02:41 AM, David Gibson wrote:

On POWER, storage caching is usually configured via the MMU - attributes
such as cache-inhibited are stored in the TLB and the hashed page table.

This makes correctly performing cache inhibited IO accesses awkward when
the MMU is turned off (real mode).  Some CPU models provide special
registers to control the cache attributes of real mode loads and stores but
this is not at all consistent.  This is a problem in particular for SLOF,
the firmware used on KVM guests, which runs entirely in real mode, but
which needs to do IO to load the kernel.

To simplify this qemu implements two special hypercalls, H_LOGICAL_CI_LOAD
and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to
a logical address (aka guest physical address).  SLOF uses these for IO.

However, because these are implemented within qemu, not the host kernel,
these bypass any IO devices emulated within KVM itself.  The simplest way
to see this problem is to attempt to boot a KVM guest from a virtio-blk
device with iothread / dataplane enabled.  The iothread code relies on an
in-kernel implementation of the virtio queue notification, which is not
triggered by the IO hcalls, and so the guest will stall in SLOF unable to
load the guest OS.

This patch addresses this by providing in-kernel implementations of the
2 hypercalls, which correctly scan the KVM IO bus.  Any access to an
address not handled by the KVM IO bus will cause a VM exit, hitting the
qemu implementation as before.

Note that a userspace change is also required, in order to enable these
new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL.
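
As a rough illustration of the dispatch described above (try the
in-kernel IO bus first, fall back to userspace otherwise), here is a
small self-contained model in plain C. Everything in it is
hypothetical: struct toy_io_dev, toy_ci_store() and the two return
codes are stand-ins for illustration, not the KVM IO bus API or the
code in this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* One emulated device occupying [base, base + len) in guest physical space. */
struct toy_io_dev {
    uint64_t base;
    uint64_t len;
    uint64_t last_written;   /* records the most recent store, for the demo */
};

enum toy_hcall_rc {
    TOY_HANDLED_IN_KERNEL,   /* a registered device claimed the access */
    TOY_EXIT_TO_USERSPACE    /* nobody claimed it; let userspace handle it */
};

/* Model of a cache-inhibited store hypercall: scan the device list and
 * hand the value to the first device whose range contains addr. */
static inline enum toy_hcall_rc toy_ci_store(struct toy_io_dev *devs,
                                             size_t ndevs,
                                             uint64_t addr, uint64_t val)
{
    for (size_t i = 0; i < ndevs; i++) {
        if (addr >= devs[i].base && addr - devs[i].base < devs[i].len) {
            devs[i].last_written = val;
            return TOY_HANDLED_IN_KERNEL;
        }
    }
    return TOY_EXIT_TO_USERSPACE;
}
```

The fall-through case is what keeps the existing userspace
implementation working: an address that no in-kernel device claims
behaves exactly as it did before.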

Signed-off-by: David Gibson 
---
  arch/powerpc/include/asm/kvm_book3s.h |  3 ++
  arch/powerpc/kvm/book3s.c | 76 +++
  arch/powerpc/kvm/book3s_hv.c  | 12 ++
  arch/powerpc/kvm/book3s_pr_papr.c | 28 +
  4 files changed, 119 insertions(+)

Changes in v4:
  * Rebase onto 4.0+, correct for changed signature of kvm_io_bus_{read,write}

Alex, I saw from some build system notifications that you seemed to
hit some troubles compiling the last version of this patch. This
should fix it - hope it's not too late to get into 4.1.


Oh, I already fixed it up in my tree, no worries.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/12] Remaining improvements for HV KVM

2015-04-15 Thread Alexander Graf


On 09.04.15 10:49, Paolo Bonzini wrote:
> 
> 
> On 09/04/2015 00:57, Alexander Graf wrote:
>>>
>>> The last patch in this series needs a definition of PPC_MSGCLR that is
>>> added by the patch "powerpc/powernv: Fixes for hypervisor doorbell
>>> handling", which has now gone upstream into Linus' tree as commit
>>> 755563bc79c7 via the linuxppc-dev mailing list.  Alex, how do you want
>>> to handle that?  You could pull in the master branch of the kvm tree,
>>> which includes 755563bc79c7, or you could cherry-pick 755563bc79c7 and
>>> let the subsequent merge fix it up.
>>
>> I've just cherry-picked it for now since it still lives in my queue, so
>> it will get thrown out automatically once I rebase on next if it's
>> included in there.
>>
>> Paolo / Marcelo, could you please try to somehow get the commit above
>> into the next branch somehow? I guess the easiest would be to merge
>> linus/master into kvm/next.
>>
>> Thanks, applied all to kvm-ppc-queue.
> 
> I plan to send the x86/MIPS/s390/ARM merge very early to Linus, maybe
> even tomorrow.  So you can just rebase on top of 4.0-rc6 and send your
> pull request relative to Linus's tree instead of kvm/next.
> 
> Does that work for you?

Phew, that really complicates things on my side. I usually do

  kvm-ppc-queue -> kvm-ppc-next -> kvm/next

which means that my queue already contains your next patches. I could of
course do a rebase --onto and remove anything that is in the kvm tree,
but then we'd end up conflicting on documentation changes.

Since you already did send out the first pull request, just let me know
when you pulled linus' tree back into kvm/next (or kvm/master) so that I
can fast-forward merge this in my kvm-ppc-next branch and then rebase my
queue on top, merge it into the next branch and send you a pull request ;)


Alex


Re: [PATCH v2 00/12] Remaining improvements for HV KVM

2015-04-15 Thread Alexander Graf


On 14.04.15 13:56, Paul Mackerras wrote:
> On Thu, Apr 09, 2015 at 12:57:58AM +0200, Alexander Graf wrote:
>> On 03/28/2015 04:21 AM, Paul Mackerras wrote:
>>> This is the rest of my current patch queue for HV KVM on PPC.  This
>>> series is based on Alex Graf's kvm-ppc-queue branch.  The only change
>>> from the previous version of this series is that patch 2 has been
>>> updated to take account of the timebase offset.
>>>
>>> The last patch in this series needs a definition of PPC_MSGCLR that is
>>> added by the patch "powerpc/powernv: Fixes for hypervisor doorbell
>>> handling", which has now gone upstream into Linus' tree as commit
>>> 755563bc79c7 via the linuxppc-dev mailing list.  Alex, how do you want
>>> to handle that?  You could pull in the master branch of the kvm tree,
>>> which includes 755563bc79c7, or you could cherry-pick 755563bc79c7 and
>>> let the subsequent merge fix it up.
>>
>> I've just cherry-picked it for now since it still lives in my queue, so it
>> will get thrown out automatically once I rebase on next if it's included in
>> there.
>>
>> Paolo / Marcelo, could you please try to somehow get the commit above into
>> the next branch somehow? I guess the easiest would be to merge linus/master
>> into kvm/next.
>>
>> Thanks, applied all to kvm-ppc-queue.
> 
> Did you forget to push it out or something?  Your kvm-ppc-queue branch
> is still at 4.0-rc1 as far as I can see.

Oops, not sure how that happened. Does it show up correctly for you now?


Alex


Re: [PATCH v2 00/12] Remaining improvements for HV KVM

2015-04-08 Thread Alexander Graf

On 03/28/2015 04:21 AM, Paul Mackerras wrote:

This is the rest of my current patch queue for HV KVM on PPC.  This
series is based on Alex Graf's kvm-ppc-queue branch.  The only change
from the previous version of this series is that patch 2 has been
updated to take account of the timebase offset.

The last patch in this series needs a definition of PPC_MSGCLR that is
added by the patch "powerpc/powernv: Fixes for hypervisor doorbell
handling", which has now gone upstream into Linus' tree as commit
755563bc79c7 via the linuxppc-dev mailing list.  Alex, how do you want
to handle that?  You could pull in the master branch of the kvm tree,
which includes 755563bc79c7, or you could cherry-pick 755563bc79c7 and
let the subsequent merge fix it up.


I've just cherry-picked it for now since it still lives in my queue, so 
it will get thrown out automatically once I rebase on next if it's 
included in there.


Paolo / Marcelo, could you please try to get the commit above 
into the next branch somehow? I guess the easiest would be to merge 
linus/master into kvm/next.


Thanks, applied all to kvm-ppc-queue.


Alex



[PULL 3/3] KVM: PPC: Book3S HV: Fix instruction emulation

2015-03-25 Thread Alexander Graf
From: Paul Mackerras 

Commit 4a157d61b48c ("KVM: PPC: Book3S HV: Fix endianness of
instruction obtained from HEIR register") had the side effect that
we no longer reset vcpu->arch.last_inst to -1 on guest exit in
the cases where the instruction is not fetched from the guest.
This means that if instruction emulation turns out to be required
in those cases, the host will emulate the wrong instruction, since
vcpu->arch.last_inst will contain the last instruction that was
emulated.

This fixes it by making sure that vcpu->arch.last_inst is reset
to -1 in those cases.
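
The stale-value problem can be shown with a tiny model in plain C.
This is only a sketch under assumptions: struct toy_vcpu,
toy_guest_exit() and TOY_INST_FETCH_FAILED are invented names, with
the sentinel standing in for the 32-bit -1 mentioned above, and the
split between last_inst and emul_inst follows the comment in the
hunk below rather than the full assembly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for "-1" stored in a 32-bit instruction slot. */
#define TOY_INST_FETCH_FAILED UINT32_MAX

struct toy_vcpu {
    uint32_t last_inst;   /* instruction image used by the emulator */
    uint32_t emul_inst;   /* value captured from the emulation-assist register */
};

/* Model of the exit path with the fix applied: last_inst is always
 * reset, so a later emulation request knows it has to fetch the
 * instruction instead of reusing whatever was emulated last time. */
static inline void toy_guest_exit(struct toy_vcpu *v, bool is_emul_assist,
                                  uint32_t heir)
{
    v->last_inst = TOY_INST_FETCH_FAILED;
    v->emul_inst = is_emul_assist ? heir : TOY_INST_FETCH_FAILED;
}

/* The emulator must fetch from the guest when the slot holds the sentinel. */
static inline bool toy_must_fetch(const struct toy_vcpu *v)
{
    return v->last_inst == TOY_INST_FETCH_FAILED;
}
```

Without the reset in toy_guest_exit(), toy_must_fetch() would return
false after any earlier emulation, which is the wrong-instruction case
the commit message describes.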

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index bb94e6f..6cbf163 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1005,6 +1005,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
/* Save HEIR (HV emulation assist reg) in emul_inst
   if this is an HEI (HV emulation interrupt, e40) */
li  r3,KVM_INST_FETCH_FAILED
+   stw r3,VCPU_LAST_INST(r9)
cmpwi   r12,BOOK3S_INTERRUPT_H_EMUL_ASSIST
bne 11f
mfspr   r3,SPRN_HEIR
-- 
1.8.1.4



[PULL 1/3] KVM: PPC: Book3S HV: Fix spinlock/mutex ordering issue in kvmppc_set_lpcr()

2015-03-25 Thread Alexander Graf
From: Paul Mackerras 

Currently, kvmppc_set_lpcr() has a spinlock around the whole function,
and inside that does mutex_lock(&kvm->lock).  It is not permitted to
take a mutex while holding a spinlock, because the mutex_lock might
call schedule().  In addition, this causes lockdep to warn about a
lock ordering issue:

==
[ INFO: possible circular locking dependency detected ]
3.18.0-kvm-04645-gdfea862-dirty #131 Not tainted
---
qemu-system-ppc/8179 is trying to acquire lock:
 (&kvm->lock){+.+.+.}, at: [] .kvmppc_set_lpcr+0xf4/0x1c0 
[kvm_hv]

but task is already holding lock:
 (&(&vcore->lock)->rlock){+.+...}, at: [] 
.kvmppc_set_lpcr+0x40/0x1c0 [kvm_hv]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&(&vcore->lock)->rlock){+.+...}:
   [] .mutex_lock_nested+0x80/0x570
   [] .kvmppc_vcpu_run_hv+0xc4/0xe40 [kvm_hv]
   [] .kvmppc_vcpu_run+0x2c/0x40 [kvm]
   [] .kvm_arch_vcpu_ioctl_run+0x54/0x160 [kvm]
   [] .kvm_vcpu_ioctl+0x4a8/0x7b0 [kvm]
   [] .do_vfs_ioctl+0x444/0x770
   [] .SyS_ioctl+0xc4/0xe0
   [] syscall_exit+0x0/0x98

-> #0 (&kvm->lock){+.+.+.}:
   [] .lock_acquire+0xcc/0x1a0
   [] .mutex_lock_nested+0x80/0x570
   [] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv]
   [] .kvmppc_set_one_reg_hv+0x4dc/0x990 [kvm_hv]
   [] .kvmppc_set_one_reg+0x44/0x330 [kvm]
   [] .kvm_vcpu_ioctl_set_one_reg+0x5c/0x150 [kvm]
   [] .kvm_arch_vcpu_ioctl+0x214/0x2c0 [kvm]
   [] .kvm_vcpu_ioctl+0xe0/0x7b0 [kvm]
   [] .do_vfs_ioctl+0x444/0x770
   [] .SyS_ioctl+0xc4/0xe0
   [] syscall_exit+0x0/0x98

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&vcore->lock)->rlock);
                               lock(&kvm->lock);
                               lock(&(&vcore->lock)->rlock);
  lock(&kvm->lock);

 *** DEADLOCK ***

2 locks held by qemu-system-ppc/8179:
 #0:  (&vcpu->mutex){+.+.+.}, at: [] .vcpu_load+0x28/0x90 
[kvm]
 #1:  (&(&vcore->lock)->rlock){+.+...}, at: [] 
.kvmppc_set_lpcr+0x40/0x1c0 [kvm_hv]

stack backtrace:
CPU: 4 PID: 8179 Comm: qemu-system-ppc Not tainted 
3.18.0-kvm-04645-gdfea862-dirty #131
Call Trace:
[c01a66c0f310] [c0b486ac] .dump_stack+0x88/0xb4 (unreliable)
[c01a66c0f390] [c00f8bec] .print_circular_bug+0x27c/0x3d0
[c01a66c0f440] [c00fe9e8] .__lock_acquire+0x2028/0x2190
[c01a66c0f5d0] [c00ff28c] .lock_acquire+0xcc/0x1a0
[c01a66c0f6a0] [c0b3c120] .mutex_lock_nested+0x80/0x570
[c01a66c0f7c0] [decc1f54] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv]
[c01a66c0f860] [decc510c] .kvmppc_set_one_reg_hv+0x4dc/0x990 
[kvm_hv]
[c01a66c0f8d0] [deb9f234] .kvmppc_set_one_reg+0x44/0x330 [kvm]
[c01a66c0f960] [deb9c9dc] .kvm_vcpu_ioctl_set_one_reg+0x5c/0x150 
[kvm]
[c01a66c0f9f0] [deb9ced4] .kvm_arch_vcpu_ioctl+0x214/0x2c0 [kvm]
[c01a66c0faf0] [deb940b0] .kvm_vcpu_ioctl+0xe0/0x7b0 [kvm]
[c01a66c0fcb0] [c026cbb4] .do_vfs_ioctl+0x444/0x770
[c01a66c0fd90] [c026cfa4] .SyS_ioctl+0xc4/0xe0
[c01a66c0fe30] [c0009264] syscall_exit+0x0/0x98

This fixes it by moving the mutex_lock()/mutex_unlock() pair outside
the spin-locked region.
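
The rule being restored here can be modelled without any real
locking. The following is a toy checker, not lockdep and not kernel
code: toy_acquire(), toy_release() and the two *_order_ok() helpers
are hypothetical names, and the only thing the model tracks is "a
sleeping lock must not be taken while a spinning lock is held".

```c
#include <stdbool.h>

enum toy_lock_kind {
    TOY_SLEEPING_LOCK,   /* models a mutex: acquiring it may schedule */
    TOY_SPINNING_LOCK    /* models a spinlock: holder must not sleep */
};

struct toy_task {
    int spinning_locks_held;
};

/* Returns false when the acquisition would break the rule. */
static inline bool toy_acquire(struct toy_task *t, enum toy_lock_kind kind)
{
    if (kind == TOY_SLEEPING_LOCK && t->spinning_locks_held > 0)
        return false;
    if (kind == TOY_SPINNING_LOCK)
        t->spinning_locks_held++;
    return true;
}

static inline void toy_release(struct toy_task *t, enum toy_lock_kind kind)
{
    if (kind == TOY_SPINNING_LOCK && t->spinning_locks_held > 0)
        t->spinning_locks_held--;
}

/* Order before the fix: spinning lock first, sleeping lock inside it. */
static inline bool toy_old_order_ok(void)
{
    struct toy_task t = { 0 };
    bool ok = toy_acquire(&t, TOY_SPINNING_LOCK);
    ok = ok && toy_acquire(&t, TOY_SLEEPING_LOCK);
    toy_release(&t, TOY_SPINNING_LOCK);
    return ok;
}

/* Order after the fix: sleeping lock outermost, spinning lock inside it. */
static inline bool toy_new_order_ok(void)
{
    struct toy_task t = { 0 };
    bool ok = toy_acquire(&t, TOY_SLEEPING_LOCK);
    ok = ok && toy_acquire(&t, TOY_SPINNING_LOCK);
    toy_release(&t, TOY_SPINNING_LOCK);
    toy_release(&t, TOY_SLEEPING_LOCK);
    return ok;
}
```

In this model toy_old_order_ok() reports a violation and
toy_new_order_ok() does not, matching the reordering in the diff
below.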

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_hv.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index de4018a..b273193 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -942,20 +942,20 @@ static int kvm_arch_vcpu_ioctl_set_sregs_hv(struct kvm_vcpu *vcpu,
 static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
bool preserve_top32)
 {
+   struct kvm *kvm = vcpu->kvm;
struct kvmppc_vcore *vc = vcpu->arch.vcore;
u64 mask;
 
+   mutex_lock(&kvm->lock);
spin_lock(&vc->lock);
/*
 * If ILE (interrupt little-endian) has changed, update the
 * MSR_LE bit in the intr_msr for each vcpu in this vcore.
 */
if ((new_lpcr & LPCR_ILE) != (vc->lpcr & LPCR_ILE)) {
-   struct kvm *kvm = vcpu->kvm;
struct kvm_vcpu *vcpu;
int i;
 
-   mutex_lock(&kvm->lock);
kvm_for_each_vcpu(i, vcpu, kvm) {
if (vcpu->arch.vcore != vc)
continue;
@@ -964,7 +964,6 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
else
vcpu->arch.in

[PULL 2/3] KVM: PPC: Book3S HV: Endian fix for accessing VPA yield count

2015-03-25 Thread Alexander Graf
From: Paul Mackerras 

The VPA (virtual processor area) is defined by PAPR and is therefore
big-endian, so we need a be32_to_cpu when reading it in
kvmppc_get_yield_count().  Without this, H_CONFER always fails on a
little-endian host, causing SMP guests to waste time spinning on
spinlocks.
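
To make the failure mode concrete, here is a small portable C sketch
of reading a big-endian 32-bit field. load_be32() and load_native32()
are illustrative helpers, not kernel functions; load_be32() plays the
role that be32_to_cpu() plays in the fix, and load_native32() shows
what a plain load sees.

```c
#include <stdint.h>
#include <string.h>

/* Decode four bytes stored most-significant byte first; the result is
 * the same on little-endian and big-endian hosts. */
static inline uint32_t load_be32(const void *p)
{
    const unsigned char *b = p;

    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Read the same four bytes in host byte order, as a plain field
 * access would. On a little-endian host this byte-swaps the value. */
static inline uint32_t load_native32(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}
```

For a counter stored as the bytes 00 00 00 05, load_be32() yields 5
everywhere, while load_native32() yields 5 only on a big-endian host;
comparing such a mis-read counter against an expected value is how a
check like the one in H_CONFER can fail every time.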

Signed-off-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_hv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index b273193..de74756 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -636,7 +636,7 @@ static int kvmppc_get_yield_count(struct kvm_vcpu *vcpu)
spin_lock(&vcpu->arch.vpa_update_lock);
lppaca = (struct lppaca *)vcpu->arch.vpa.pinned_addr;
if (lppaca)
-   yield_count = lppaca->yield_count;
+   yield_count = be32_to_cpu(lppaca->yield_count);
spin_unlock(&vcpu->arch.vpa_update_lock);
return yield_count;
 }
-- 
1.8.1.4



[PULL 0/3] 4.0 patch queue 2015-03-25

2015-03-25 Thread Alexander Graf
Hi Paolo,

This is my current patch queue for 4.0.  Please pull.

Alex


The following changes since commit f710a12d73dfa1c3a5d2417f2482b970f03bb850:

  Merge tag 'kvm-arm-fixes-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm (2015-03-16 20:08:56 -0300)

are available in the git repository at:


  git://github.com/agraf/linux-2.6.git tags/signed-for-4.0

for you to fetch changes up to 2bf27601c7b50b6ced72f27304109dc52eb52919:

  KVM: PPC: Book3S HV: Fix instruction emulation (2015-03-20 11:42:33 +0100)


Patch queue for 4.0 - 2015-03-25

A few bug fixes for Book3S HV KVM:

  - Fix spinlock ordering
  - Fix idle guests on LE hosts
  - Fix instruction emulation


Paul Mackerras (3):
  KVM: PPC: Book3S HV: Fix spinlock/mutex ordering issue in kvmppc_set_lpcr()
  KVM: PPC: Book3S HV: Endian fix for accessing VPA yield count
  KVM: PPC: Book3S HV: Fix instruction emulation

 arch/powerpc/kvm/book3s_hv.c| 8 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 1 +
 2 files changed, 5 insertions(+), 4 deletions(-)


Re: [kvm-ppc:kvm-ppc-queue 7/9] ERROR: ".__spin_yield" [arch/powerpc/kvm/kvm.ko] undefined!

2015-03-23 Thread Alexander Graf


On 23.03.15 04:03, Michael Ellerman wrote:
> On Mon, 2015-03-23 at 14:00 +1100, Paul Mackerras wrote:
>> On Fri, Mar 20, 2015 at 08:07:53PM +0800, kbuild test robot wrote:
>>> tree:   git://github.com/agraf/linux-2.6.git kvm-ppc-queue
>>> head:   9b1daf3cfba1801768aa41b1b6ad0b653844241f
>>> commit: aba777f5ce0accb4c6a277e671de0330752954e8 [7/9] KVM: PPC: Book3S HV: 
>>> Convert ICS mutex lock to spin lock
>>> config: powerpc-defconfig (attached as .config)
>>> reproduce:
>>>   wget 
>>> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
>>>  -O ~/bin/make.cross
>>>   chmod +x ~/bin/make.cross
>>>   git checkout aba777f5ce0accb4c6a277e671de0330752954e8
>>>   # save the attached .config to linux build tree
>>>   make.cross ARCH=powerpc 
>>>
>>> All error/warnings:
>>>
> ERROR: ".__spin_yield" [arch/powerpc/kvm/kvm.ko] undefined!
>>
>> Yes, this is the patch that depends on the "powerpc: Export
>> __spin_yield" patch that Suresh posted to linuxppc-...@ozlabs.org and
>> I acked.
>>
>> I think the best thing at this stage is probably for Alex to take that
>> patch through his tree, assuming Michael is OK with that.
> 
> Fine by me.
> 
> Acked-by: Michael Ellerman 

Awesome, thanks, applied to kvm-ppc-queue.


Alex


Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object

2015-03-23 Thread Alexander Graf


On 23.03.15 08:50, Bharata B Rao wrote:
> On Sat, Mar 21, 2015 at 8:28 PM, Alexander Graf  wrote:
>>
>>
>> On 20.03.15 16:51, Bharata B Rao wrote:
>>> On Fri, Mar 20, 2015 at 12:34:18PM +0100, Alexander Graf wrote:
>>>>
>>>>
>>>> On 20.03.15 12:26, Paul Mackerras wrote:
>>>>> On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote:
>>>>>>
>>>>>>
>>>>>> On 20.03.15 10:39, Paul Mackerras wrote:
>>>>>>> From: Bharata B Rao 
>>>>>>>
>>>>>>> Since KVM isn't equipped to handle closure of vcpu fd from 
>>>>>>> userspace(QEMU)
>>>>>>> correctly, certain work arounds have to be employed to allow reuse of
>>>>>>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
>>>>>>> proposed workaround is to park the vcpu fd in userspace during cpu 
>>>>>>> unplug
>>>>>>> and reuse it later during next hotplug.
>>>>>>>
>>>>>>> More details can be found here:
>>>>>>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
>>>>>>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
>>>>>>>
>>>>>>> In order to support this workaround with PowerPC KVM, don't create or
>>>>>>> initialize ICP if the vCPU is found to be already associated with an 
>>>>>>> ICP.
>>>>>>>
>>>>>>> Signed-off-by: Bharata B Rao 
>>>>>>> Signed-off-by: Paul Mackerras 
>>>>>>
>>>>>> This probably makes some sense, but please make sure that user space has
>>>>>> some way to figure out whether hotplug works at all.
>>>>>
>>>>> Bharata is working on the qemu side of all this, so I assume he has
>>>>> that covered.
>>>>
>>>> Well, so far the kernel doesn't expose anything he can query, so I
>>>> suppose he just blindly assumes that older host kernels will randomly
>>>> break and nobody cares. I'd rather prefer to see a CAP exposed that qemu
>>>> can check on.
>>>
>>> I see that you have already taken this into your tree. I have an updated
>>> patch to expose a CAP. If the below patch looks ok, then let me know how
>>> you would prefer to take this patch in.
>>>
>>> Regards,
>>> Bharata.
>>>
>>> KVM: PPC: BOOK3S: Allow reuse of vCPU object
>>>
>>> From: Bharata B Rao 
>>>
>>> Since KVM isn't equipped to handle closure of vcpu fd from userspace(QEMU)
>>> correctly, certain work arounds have to be employed to allow reuse of
>>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
>>> proposed workaround is to park the vcpu fd in userspace during cpu unplug
>>> and reuse it later during next hotplug.
>>>
>>> More details can be found here:
>>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
>>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
>>>
>>> In order to support this workaround with PowerPC KVM, don't create or
>>> initialize ICP if the vCPU is found to be already associated with an ICP.
>>> User space (QEMU) can reuse the vCPU after checking for the availability
>>> of KVM_CAP_SPAPR_REUSE_VCPU capability.
>>>
>>> Signed-off-by: Bharata B Rao 
>>> ---
>>>  arch/powerpc/kvm/book3s_xics.c |9 +++--
>>>  arch/powerpc/kvm/powerpc.c |   12 
>>>  include/uapi/linux/kvm.h   |1 +
>>>  3 files changed, 20 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
>>> index a4a8d9f..ead3a35 100644
>>> --- a/arch/powerpc/kvm/book3s_xics.c
>>> +++ b/arch/powerpc/kvm/book3s_xics.c
>>> @@ -1313,8 +1313,13 @@ int kvmppc_xics_connect_vcpu(struct kvm_device *dev, 
>>> struct kvm_vcpu *vcpu,
>>>   return -EPERM;
>>>   if (xics->kvm != vcpu->kvm)
>>>   return -EPERM;
>>> - if (vcpu->arch.irq_type)
>>> - return -EBUSY;
>>> +
>>> + /*
>>> +  * If irq_type is already set, don't reinitialize but
>>> +  * return success allowing this vcpu to 

Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object

2015-03-21 Thread Alexander Graf


On 20.03.15 16:51, Bharata B Rao wrote:
> On Fri, Mar 20, 2015 at 12:34:18PM +0100, Alexander Graf wrote:
>>
>>
>> On 20.03.15 12:26, Paul Mackerras wrote:
>>> On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote:
>>>>
>>>>
>>>> On 20.03.15 10:39, Paul Mackerras wrote:
>>>>> From: Bharata B Rao 
>>>>>
>>>>> Since KVM isn't equipped to handle closure of vcpu fd from userspace(QEMU)
>>>>> correctly, certain work arounds have to be employed to allow reuse of
>>>>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
>>>>> proposed workaround is to park the vcpu fd in userspace during cpu unplug
>>>>> and reuse it later during next hotplug.
>>>>>
>>>>> More details can be found here:
>>>>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
>>>>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
>>>>>
>>>>> In order to support this workaround with PowerPC KVM, don't create or
>>>>> initialize ICP if the vCPU is found to be already associated with an ICP.
>>>>>
>>>>> Signed-off-by: Bharata B Rao 
>>>>> Signed-off-by: Paul Mackerras 
>>>>
>>>> This probably makes some sense, but please make sure that user space has
>>>> some way to figure out whether hotplug works at all.
>>>
>>> Bharata is working on the qemu side of all this, so I assume he has
>>> that covered.
>>
>> Well, so far the kernel doesn't expose anything he can query, so I
>> suppose he just blindly assumes that older host kernels will randomly
>> break and nobody cares. I'd much rather see a CAP exposed that qemu
>> can check on.
> 
> I see that you have already taken this into your tree. I have an updated
> patch to expose a CAP. If the below patch looks ok, then let me know how
> you would prefer to take this patch in.
> 
> Regards,
> Bharata.
> 
> KVM: PPC: BOOK3S: Allow reuse of vCPU object
> 
> From: Bharata B Rao 
> 
> Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
> correctly, certain workarounds have to be employed to allow reuse of
> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
> proposed workaround is to park the vcpu fd in userspace during cpu unplug
> and reuse it later during next hotplug.
> 
> More details can be found here:
> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
> 
> In order to support this workaround with PowerPC KVM, don't create or
> initialize ICP if the vCPU is found to be already associated with an ICP.
> User space (QEMU) can reuse the vCPU after checking for the availability
> of KVM_CAP_SPAPR_REUSE_VCPU capability.
> 
> Signed-off-by: Bharata B Rao 
> ---
>  arch/powerpc/kvm/book3s_xics.c |    9 +++++++--
>  arch/powerpc/kvm/powerpc.c     |   12 ++++++++++++
>  include/uapi/linux/kvm.h       |    1 +
>  3 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
> index a4a8d9f..ead3a35 100644
> --- a/arch/powerpc/kvm/book3s_xics.c
> +++ b/arch/powerpc/kvm/book3s_xics.c
> @@ -1313,8 +1313,13 @@ int kvmppc_xics_connect_vcpu(struct kvm_device *dev, 
> struct kvm_vcpu *vcpu,
>   return -EPERM;
>   if (xics->kvm != vcpu->kvm)
>   return -EPERM;
> - if (vcpu->arch.irq_type)
> - return -EBUSY;
> +
> + /*
> +  * If irq_type is already set, don't reinitialize but
> +  * return success allowing this vcpu to be reused.
> +  */
> + if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
> + return 0;
>  
>   r = kvmppc_xics_create_icp(vcpu, xcpu);
>   if (!r)
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 27c0fac..5b7007c 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -564,6 +564,18 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   r = 1;
>   break;
>  #endif
> + case KVM_CAP_SPAPR_REUSE_VCPU:
> + /*
> +  * Kernel currently doesn't support closing of vCPU fd from
> +  * user space (QEMU) correctly. Hence the option available
> +  * is to park the vCPU fd in user space whenever a guest
> +  * CPU is hot removed and reuse the 

Re: [PATCH v4 2/4] kvm/ppc/mpic: drop unused IRQ_testbit

2015-03-21 Thread Alexander Graf


On 21.03.15 07:56, Arseny Solokha wrote:
> Drop unused static procedure which doesn't have callers within its
> translation unit. It had been already removed independently in QEMU[1]
> from the OpenPIC implementation borrowed by the kernel.
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2014-06/msg01812.html
> 
> v4: Fixed the comment regarding the origination of OpenPIC codebase
> and CC'ed KVM mailing lists, as suggested by Alexander Graf.
> 
> v3: In patch 4/4, do not remove fsl_mpic_primary_get_version() from
> arch/powerpc/sysdev/mpic.c because the patch by Jia Hongtao
> ("powerpc/85xx: workaround for chips with MSI hardware errata") makes
> use of it.
> 
> v2: Added a brief explanation to each patch description of why removed
> functions are unused, as suggested by Michael Ellerman.
> 
> Signed-off-by: Arseny Solokha 

Thanks, applied to kvm-ppc-queue (for 4.1).


Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/23] Bug fixes and improvements for HV KVM

2015-03-20 Thread Alexander Graf


On 20.03.15 10:39, Paul Mackerras wrote:
> This is my current patch queue for HV KVM on PPC.  This series is
> based on the "queue" branch of the KVM tree, i.e. roughly v4.0-rc3
> plus a set of recent KVM changes which don't intersect with the
> changes in this series.  On top of that, in my testing I have some
> patches which are not KVM-related but are needed to boot and run a
> recent upstream kernel successfully:
> 
> tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop
> tick/hotplug: Handover time related duties before cpu offline
> powerpc/powernv: Check image loaded or not before calling flash
> powerpc/powernv: Fixes for hypervisor doorbell handling
> powerpc/powernv: Fix return value from power7_nap() et al.
> powerpc: Export __spin_yield
> 
> These patches have been posted by their authors and are on their way
> upstream via various trees.  They are not included in this series.
> 
> The first three patches are bug fixes that should go into v4.0 if
> possible.  The remainder are intended for the 4.1 merge window.
> 
> The patch "powerpc: Export __spin_yield" is a prerequisite for patch
> 9/23 of this series ("KVM: PPC: Book3S HV: Convert ICS mutex lock to
> spin lock").  It is on its way upstream through the linuxppc-dev
> mailing list.
> 
> The patch "powerpc/powernv: Fixes for hypervisor doorbell handling" is
> needed for correct operation with patch 20/23, "KVM: PPC: Book3S HV:
> Use msgsnd for signalling threads".  It is also on its way upstream
> through the linuxppc-dev list.  I am expecting both of these
> prerequisite patches to go into 4.0.
> 
> Finally, the last patch in this series converts some of the assembly
> code in book3s_hv_rmhandlers.S into C.  I intend to continue this
> trend.

Thanks, applied patches 4-11 to kvm-ppc-queue.


Alex


Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-03-20 Thread Alexander Graf


On 20.03.15 12:25, Paul Mackerras wrote:
> On Fri, Mar 20, 2015 at 12:15:15PM +0100, Alexander Graf wrote:
>>
>>
>> On 20.03.15 10:39, Paul Mackerras wrote:
>>> This reads the timebase at various points in the real-mode guest
>>> entry/exit code and uses that to accumulate total, minimum and
>>> maximum time spent in those parts of the code.  Currently these
>>> times are accumulated per vcpu in 5 parts of the code:
>>>
>>> * rm_entry - time taken from the start of kvmppc_hv_entry() until
>>>   just before entering the guest.
>>> * rm_intr - time from when we take a hypervisor interrupt in the
>>>   guest until we either re-enter the guest or decide to exit to the
>>>   host.  This includes time spent handling hcalls in real mode.
>>> * rm_exit - time from when we decide to exit the guest until the
>>>   return from kvmppc_hv_entry().
>>> * guest - time spent in the guest
>>> * cede - time spent napping in real mode due to an H_CEDE hcall
>>>   while other threads in the same vcore are active.
>>>
>>> These times are exposed in debugfs in a directory per vcpu that
>>> contains a file called "timings".  This file contains one line for
>>> each of the 5 timings above, with the name followed by a colon and
>>> 4 numbers, which are the count (number of times the code has been
>>> executed), the total time, the minimum time, and the maximum time,
>>> all in nanoseconds.
>>>
>>> Signed-off-by: Paul Mackerras 
>>
>> Have you measured the additional overhead this brings?
> 
> I haven't - in fact I did this patch so I could measure the overhead
> or improvement from other changes I did, but it doesn't measure its
> own overhead, of course.  I guess I need a workload that does a
> defined number of guest entries and exits and measure how fast it runs
> with and without the patch (maybe something like H_SET_MODE in a
> loop).  I'll figure something out and post the results.  

Yeah, just measure the number of exits you can handle for a simple
hcall. If there is measurable overhead, it's probably a good idea to
move the statistics gathering into #ifdef paths for DEBUGFS or maybe
even a separate EXIT_TIMING config option as we have it for booke.
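
A rough sketch of both ideas, in plain C rather than kernel code: the
statistics update sits behind a build-time switch so it costs nothing when
disabled, and the overhead is expressed as the relative drop in exits handled
per second. DEMO_EXIT_TIMING, the struct and both function names are made up
for illustration; they are not the real Kconfig symbol or kernel helpers.

```c
#include <stdint.h>

/* Hypothetical build switch standing in for a Kconfig option
 * (an EXIT_TIMING-style setting); off unless defined to 1. */
#ifndef DEMO_EXIT_TIMING
#define DEMO_EXIT_TIMING 0
#endif

struct demo_exit_stats {
	uint64_t count;		/* number of exits accounted */
	uint64_t total_ticks;	/* sum of their durations */
};

/* With the switch off, this compiles to nothing on the hot path. */
static inline void demo_account_exit(struct demo_exit_stats *s, uint64_t ticks)
{
#if DEMO_EXIT_TIMING
	s->count++;
	s->total_ticks += ticks;
#else
	(void)s;
	(void)ticks;
#endif
}

/* Relative slowdown in percent, given exits handled per second
 * without and with the statistics patch applied. */
static double demo_overhead_percent(uint64_t exits_base, uint64_t exits_patched)
{
	return ((double)exits_base - (double)exits_patched) * 100.0
		/ (double)exits_base;
}
```

The two inputs to demo_overhead_percent() would come from running the same
hcall loop in a guest on both kernels.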


Alex


Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object

2015-03-20 Thread Alexander Graf


On 20.03.15 12:26, Paul Mackerras wrote:
> On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote:
>>
>>
>> On 20.03.15 10:39, Paul Mackerras wrote:
>>> From: Bharata B Rao 
>>>
>>> Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
>>> correctly, certain workarounds have to be employed to allow reuse of
>>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
>>> proposed workaround is to park the vcpu fd in userspace during cpu unplug
>>> and reuse it later during next hotplug.
>>>
>>> More details can be found here:
>>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
>>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
>>>
>>> In order to support this workaround with PowerPC KVM, don't create or
>>> initialize ICP if the vCPU is found to be already associated with an ICP.
>>>
>>> Signed-off-by: Bharata B Rao 
>>> Signed-off-by: Paul Mackerras 
>>
>> This probably makes some sense, but please make sure that user space has
>> some way to figure out whether hotplug works at all.
> 
> Bharata is working on the qemu side of all this, so I assume he has
> that covered.

Well, so far the kernel doesn't expose anything he can query, so I
suppose he just blindly assumes that older host kernels will randomly
break and nobody cares. I'd much rather see a CAP exposed that qemu
can check on.
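
A minimal sketch of what the userspace side of such a check might look like.
KVM_CHECK_EXTENSION is the existing ioctl for probing capabilities (0 means
unsupported, a positive value means supported); everything else here is an
assumption: the capability number, the probe callback that stands in for
ioctl(fd, KVM_CHECK_EXTENSION, cap), the policy names, and the two demo probes
that mimic an older and a newer host kernel.

```c
/* Hypothetical value: no capability for vCPU reuse exists in the
 * uapi header at this point in the thread. */
#define DEMO_CAP_SPAPR_REUSE_VCPU 9999

/* Stand-in for ioctl(fd, KVM_CHECK_EXTENSION, cap). */
typedef int (*demo_check_extension_fn)(int fd, long cap);

enum demo_unplug_policy {
	DEMO_UNPLUG_REFUSE,	/* host cannot reuse a parked vCPU */
	DEMO_UNPLUG_PARK_VCPU,	/* park the vCPU fd, reuse on next hotplug */
};

static enum demo_unplug_policy
demo_choose_unplug_policy(demo_check_extension_fn probe, int fd)
{
	/* Errors (< 0) and "unknown capability" (0) both mean: don't try. */
	if (probe(fd, DEMO_CAP_SPAPR_REUSE_VCPU) > 0)
		return DEMO_UNPLUG_PARK_VCPU;
	return DEMO_UNPLUG_REFUSE;
}

/* Demo probe: a host kernel that has never heard of the capability. */
static int demo_probe_old_kernel(int fd, long cap)
{
	(void)fd;
	(void)cap;
	return 0;
}

/* Demo probe: a host kernel that advertises the capability. */
static int demo_probe_new_kernel(int fd, long cap)
{
	(void)fd;
	return cap == DEMO_CAP_SPAPR_REUSE_VCPU ? 1 : 0;
}
```

The point is only that the decision is made from something the kernel reports,
so an older host fails cleanly instead of breaking at unplug time.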

> 
>> Also Paul, for patches that you pick up from others, I'd prefer that they
>> send the patches to the ML themselves first and that you then pick them
>> up from there. That way we give everyone the same treatment.
> 
> Fair enough.  In fact Bharata did post the patch but he sent it to
> linuxppc-...@ozlabs.org not the KVM lists.

Please make sure you only take patches into your queue that made it to
at least kvm@vger, preferably kvm-ppc@vger as well. If you see related
patches on other mailing lists, just ask the respective people to resend
with proper ML exposure.


Alex


Re: [PATCH 20/23] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8

2015-03-20 Thread Alexander Graf


On 20.03.15 10:39, Paul Mackerras wrote:
> This uses msgsnd where possible for signalling other threads within
> the same core on POWER8 systems, rather than IPIs through the XICS
> interrupt controller.  This includes waking secondary threads to run
> the guest, the interrupts generated by the virtual XICS, and the
> interrupts to bring the other threads out of the guest when exiting.
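
The same-core test that kvmppc_ipi_thread() in this patch uses to pick msgsnd
over an XICS IPI can be looked at in isolation. The sketch below is plain C
with made-up names; it assumes eight hardware threads per core, which is what
the "& 7" and "& ~7" masks in the patch imply.

```c
#include <stdbool.h>

/* Assumption: 8 threads per core, matching the masks in the patch. */
#define DEMO_THREADS_PER_CORE 8

/* True if both CPU numbers fall into the same group of 8, i.e. the
 * same core; only then can a doorbell (msgsnd) reach the target. */
static bool demo_same_core(int cpu_a, int cpu_b)
{
	int mask = ~(DEMO_THREADS_PER_CORE - 1);

	return (cpu_a & mask) == (cpu_b & mask);
}

/* Thread index within the core; the patch ORs this value into the
 * doorbell message to select the target thread. */
static int demo_thread_in_core(int cpu)
{
	return cpu & (DEMO_THREADS_PER_CORE - 1);
}
```

For example, CPUs 9 and 14 share a core (both lie in 8..15), while CPUs 7 and
8 do not, so the latter pair would fall back to the XICS path.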
> 
> Signed-off-by: Paul Mackerras 
> ---
>  arch/powerpc/kernel/asm-offsets.c   |  4 +++
>  arch/powerpc/kvm/book3s_hv.c| 48 
> ++---
>  arch/powerpc/kvm/book3s_hv_rm_xics.c| 11 
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 41 
>  4 files changed, 83 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/asm-offsets.c 
> b/arch/powerpc/kernel/asm-offsets.c
> index fa7b57d..0ce2aa6 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -37,6 +37,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #ifdef CONFIG_PPC64
>  #include 
>  #include 
> @@ -568,6 +569,7 @@ int main(void)
>   DEFINE(VCORE_LPCR, offsetof(struct kvmppc_vcore, lpcr));
>   DEFINE(VCORE_PCR, offsetof(struct kvmppc_vcore, pcr));
>   DEFINE(VCORE_DPDES, offsetof(struct kvmppc_vcore, dpdes));
> + DEFINE(VCORE_PCPU, offsetof(struct kvmppc_vcore, pcpu));
>   DEFINE(VCPU_SLB_E, offsetof(struct kvmppc_slb, orige));
>   DEFINE(VCPU_SLB_V, offsetof(struct kvmppc_slb, origv));
>   DEFINE(VCPU_SLB_SIZE, sizeof(struct kvmppc_slb));
> @@ -757,5 +759,7 @@ int main(void)
>   offsetof(struct paca_struct, subcore_sibling_mask));
>  #endif
>  
> + DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
> +
>   return 0;
>  }
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 03a8bb4..2c34bae 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -51,6 +51,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -84,9 +85,34 @@ static DECLARE_BITMAP(default_enabled_hcalls, 
> MAX_HCALL_OPCODE/4 + 1);
>  static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
>  static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
>  
> +static bool kvmppc_ipi_thread(int cpu)
> +{
> + /* On POWER8 for IPIs to threads in the same core, use msgsnd */
> + if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
> + preempt_disable();
> + if ((cpu & ~7) == (smp_processor_id() & ~7)) {
> + unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
> + msg |= cpu & 7;
> + smp_mb();
> + __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
> + preempt_enable();
> + return true;
> + }
> + preempt_enable();
> + }
> +
> +#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
> + if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) {
> + xics_wake_cpu(cpu);
> + return true;
> + }
> +#endif
> +
> + return false;
> +}
> +
>  static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
>  {
> - int me;
>   int cpu = vcpu->cpu;
>   wait_queue_head_t *wqp;
>  
> @@ -96,20 +122,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu 
> *vcpu)
>   ++vcpu->stat.halt_wakeup;
>   }
>  
> - me = get_cpu();
> + if (kvmppc_ipi_thread(cpu + vcpu->arch.ptid))
> + return;
>  
>   /* CPU points to the first thread of the core */
> - if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
> -#ifdef CONFIG_PPC_ICP_NATIVE
> - int real_cpu = cpu + vcpu->arch.ptid;
> - if (paca[real_cpu].kvm_hstate.xics_phys)
> - xics_wake_cpu(real_cpu);
> - else
> -#endif
> - if (cpu_online(cpu))
> - smp_send_reschedule(cpu);
> - }
> - put_cpu();
> + if (cpu >= 0 && cpu < nr_cpu_ids && cpu_online(cpu))
> + smp_send_reschedule(cpu);
>  }
>  
>  /*
> @@ -1754,10 +1772,8 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
>   /* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
>   smp_wmb();
>   tpaca->kvm_hstate.kvm_vcpu = vcpu;
> -#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
>   if (cpu != smp_processor_id())
> - xics_wake_cpu(cpu);
> -#endif
> + kvmppc_ipi_thread(cpu);
>  }
>  
>  static void kvmppc_wait_for_nap(void)
> diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
> b/arch/powerpc/kvm/book3s_hv_rm_xics.c
> index 6dded8c..457a8b1 100644
> --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
> +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "book3s_xics.h"
>  
> @@ -83,6 +84,16 @@ static void icp_rm_set_vcpu_irq(struct kvm_vc

Re: [PATCH 12/23] KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT

2015-03-20 Thread Alexander Graf


On 20.03.15 10:39, Paul Mackerras wrote:
> This creates a debugfs directory for each HV guest (assuming debugfs
> is enabled in the kernel config), and within that directory, a file
> by which the contents of the guest's HPT (hashed page table) can be
> read.  The directory is named vm<pid>, where <pid> is the PID of the
> process that created the guest.  The file is named "htab".  This is
> intended to help in debugging problems in the host's management
> of guest memory.
> 
> The contents of the file consist of a series of lines like this:
> 
>   3f48 4000d032bf003505 000bd7ff1196 0003b5c71196
> 
> The first field is the index of the entry in the HPT, the second and
> third are the HPT entry, so the third field contains the real page
> number that is mapped by the entry if the entry's valid bit is set.
> The fourth field is the guest's view of the second doubleword of the
> entry, so it contains the guest physical address.  (The formats of the
> second through fourth fields are described in the Power ISA and also
> in arch/powerpc/include/asm/mmu-hash64.h.)
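
As an illustration of the line format described above, a small userspace
parser might look like this. It is a sketch, not part of the patch: the struct
and function names are invented, and the field names v, hr and gr merely
borrow the local variable names used in debugfs_htab_read().

```c
#include <stdio.h>

/* One line of the "htab" file: four hexadecimal fields. */
struct demo_htab_line {
	unsigned long long index;	/* index of the entry in the HPT */
	unsigned long long v;		/* first doubleword of the HPTE */
	unsigned long long hr;		/* second doubleword, host view */
	unsigned long long gr;		/* second doubleword, guest view */
};

/* Returns 0 on success, -1 if the line does not hold four hex fields. */
static int demo_parse_htab_line(const char *line, struct demo_htab_line *out)
{
	int n = sscanf(line, "%llx %llx %llx %llx",
		       &out->index, &out->v, &out->hr, &out->gr);

	return n == 4 ? 0 : -1;
}
```

Fed the sample line from the description, this yields index 0x3f48 and the
three doublewords as 64-bit values that can then be decoded according to the
HPTE layout.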
> 
> Signed-off-by: Paul Mackerras 
> ---
>  arch/powerpc/include/asm/kvm_book3s_64.h |   2 +
>  arch/powerpc/include/asm/kvm_host.h  |   2 +
>  arch/powerpc/kvm/book3s_64_mmu_hv.c  | 136 
> +++
>  arch/powerpc/kvm/book3s_hv.c |  12 +++
>  virt/kvm/kvm_main.c  |   1 +
>  5 files changed, 153 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
> b/arch/powerpc/include/asm/kvm_book3s_64.h
> index 0789a0f..869c53f 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
> @@ -436,6 +436,8 @@ static inline struct kvm_memslots 
> *kvm_memslots_raw(struct kvm *kvm)
>   return rcu_dereference_raw_notrace(kvm->memslots);
>  }
>  
> +extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
> +
>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
>  
>  #endif /* __ASM_KVM_BOOK3S_64_H__ */
> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> b/arch/powerpc/include/asm/kvm_host.h
> index 015773f..f1d0bbc 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -238,6 +238,8 @@ struct kvm_arch {
>   atomic_t hpte_mod_interest;
>   cpumask_t need_tlb_flush;
>   int hpt_cma_alloc;
> + struct dentry *debugfs_dir;
> + struct dentry *htab_dentry;
>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
>  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>   struct mutex hpt_mutex;
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 6c6825a..d6fe308 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -1490,6 +1491,141 @@ int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct 
> kvm_get_htab_fd *ghf)
>   return ret;
>  }
>  
> +struct debugfs_htab_state {
> + struct kvm  *kvm;
> + struct mutexmutex;
> + unsigned long   hpt_index;
> + int chars_left;
> + int buf_index;
> + charbuf[64];
> +};
> +
> +static int debugfs_htab_open(struct inode *inode, struct file *file)
> +{
> + struct kvm *kvm = inode->i_private;
> + struct debugfs_htab_state *p;
> +
> + p = kzalloc(sizeof(*p), GFP_KERNEL);
> + if (!p)
> + return -ENOMEM;
> +
> + kvm_get_kvm(kvm);
> + p->kvm = kvm;
> + mutex_init(&p->mutex);
> + file->private_data = p;
> +
> + return nonseekable_open(inode, file);
> +}
> +
> +static int debugfs_htab_release(struct inode *inode, struct file *file)
> +{
> + struct debugfs_htab_state *p = file->private_data;
> +
> + kvm_put_kvm(p->kvm);
> + kfree(p);
> + return 0;
> +}
> +
> +static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
> +  size_t len, loff_t *ppos)
> +{
> + struct debugfs_htab_state *p = file->private_data;
> + ssize_t ret, r;
> + unsigned long i, n;
> + unsigned long v, hr, gr;
> + struct kvm *kvm;
> + __be64 *hptp;
> +
> + ret = mutex_lock_interruptible(&p->mutex);
> + if (ret)
> + return ret;
> +
> + if (p->chars_left) {
> + n = p->chars_left;
> + if (n > len)
> + n = len;
> + r = copy_to_user(buf, p->buf + p->buf_index, n);
> + n -= r;
> + p->chars_left -= n;
> + p->buf_index += n;
> + buf += n;
> + len -= n;
> + ret = n;
> + if (r) {
> + if (!n)
> + ret = -EFAULT;
> + goto out;
> + }
> + }
> +
> + kvm = p->kvm;
> + i = p->hpt_index;
> + hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
> + for (; len != 0 && i < kvm->arch.

Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-03-20 Thread Alexander Graf


On 20.03.15 10:39, Paul Mackerras wrote:
> This reads the timebase at various points in the real-mode guest
> entry/exit code and uses that to accumulate total, minimum and
> maximum time spent in those parts of the code.  Currently these
> times are accumulated per vcpu in 5 parts of the code:
> 
> * rm_entry - time taken from the start of kvmppc_hv_entry() until
>   just before entering the guest.
> * rm_intr - time from when we take a hypervisor interrupt in the
>   guest until we either re-enter the guest or decide to exit to the
>   host.  This includes time spent handling hcalls in real mode.
> * rm_exit - time from when we decide to exit the guest until the
>   return from kvmppc_hv_entry().
> * guest - time spent in the guest
> * cede - time spent napping in real mode due to an H_CEDE hcall
>   while other threads in the same vcore are active.
> 
> These times are exposed in debugfs in a directory per vcpu that
> contains a file called "timings".  This file contains one line for
> each of the 5 timings above, with the name followed by a colon and
> 4 numbers, which are the count (number of times the code has been
> executed), the total time, the minimum time, and the maximum time,
> all in nanoseconds.
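
To make the bookkeeping concrete, here is a simplified sketch of a
count/total/min/max accumulator and of one "name: count total min max" line.
It is plain C with invented names, it leaves out the seqcount synchronisation
and the timebase-to-nanosecond conversion that the patch does, and the exact
spacing of the output line is an assumption.

```c
#include <stdint.h>
#include <stdio.h>

struct demo_accumulator {
	uint64_t count;	/* number of intervals recorded */
	uint64_t total;	/* sum of all interval lengths */
	uint64_t min;	/* shortest interval seen */
	uint64_t max;	/* longest interval seen */
};

/* Fold one interval [start, end) into the running statistics. */
static void demo_accumulate(struct demo_accumulator *acc,
			    uint64_t start, uint64_t end)
{
	uint64_t delta = end - start;

	if (acc->count == 0 || delta < acc->min)
		acc->min = delta;
	if (delta > acc->max)
		acc->max = delta;
	acc->total += delta;
	acc->count++;
}

/* Render one line in the layout described for the "timings" file. */
static int demo_format_line(char *buf, size_t len, const char *name,
			    const struct demo_accumulator *acc)
{
	return snprintf(buf, len, "%s: %llu %llu %llu %llu\n", name,
			(unsigned long long)acc->count,
			(unsigned long long)acc->total,
			(unsigned long long)acc->min,
			(unsigned long long)acc->max);
}
```

Three intervals of 5, 2 and 9 units would be reported as count 3, total 16,
minimum 2 and maximum 9.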
> 
> Signed-off-by: Paul Mackerras 

Have you measured the additional overhead this brings?

> ---
>  arch/powerpc/include/asm/kvm_host.h |  19 +
>  arch/powerpc/include/asm/time.h |   3 +
>  arch/powerpc/kernel/asm-offsets.c   |  11 +++
>  arch/powerpc/kernel/time.c  |   6 ++
>  arch/powerpc/kvm/book3s_hv.c| 135 
> 
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 105 -
>  6 files changed, 276 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> b/arch/powerpc/include/asm/kvm_host.h
> index f1d0bbc..286c0ce 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -369,6 +369,14 @@ struct kvmppc_slb {
>   u8 base_page_size;  /* MMU_PAGE_xxx */
>  };
>  
> +/* Struct used to accumulate timing information in HV real mode code */
> +struct kvmhv_tb_accumulator {
> + u64 seqcount;   /* used to synchronize access, also count * 2 */
> + u64 tb_total;   /* total time in timebase ticks */
> + u64 tb_min; /* min time */
> + u64 tb_max; /* max time */
> +};
> +
>  # ifdef CONFIG_PPC_FSL_BOOK3E
>  #define KVMPPC_BOOKE_IAC_NUM 2
>  #define KVMPPC_BOOKE_DAC_NUM 2
> @@ -656,6 +664,17 @@ struct kvm_vcpu_arch {
>   u64 busy_preempt;
>  
>   u32 emul_inst;
> +
> + struct kvmhv_tb_accumulator *cur_activity;  /* What we're timing */
> + u64 cur_tb_start;   /* when it started */
> + struct kvmhv_tb_accumulator rm_entry;   /* real-mode entry code */
> + struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */
> + struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */
> + struct kvmhv_tb_accumulator guest_time; /* guest execution */
> + struct kvmhv_tb_accumulator cede_time;  /* time napping inside guest */
> +
> + struct dentry *debugfs_dir;
> + struct dentry *debugfs_timings;
>  #endif
>  };
>  
> diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
> index 03cbada..10fc784 100644
> --- a/arch/powerpc/include/asm/time.h
> +++ b/arch/powerpc/include/asm/time.h
> @@ -211,5 +211,8 @@ extern void secondary_cpu_time_init(void);
>  
>  DECLARE_PER_CPU(u64, decrementers_next_tb);
>  
> +/* Convert timebase ticks to nanoseconds */
> +unsigned long long tb_to_ns(unsigned long long tb_ticks);
> +
>  #endif /* __KERNEL__ */
>  #endif /* __POWERPC_TIME_H */
> diff --git a/arch/powerpc/kernel/asm-offsets.c 
> b/arch/powerpc/kernel/asm-offsets.c
> index 4717859..ec9f59c 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -458,6 +458,17 @@ int main(void)
>   DEFINE(VCPU_SPRG1, offsetof(struct kvm_vcpu, arch.shregs.sprg1));
>   DEFINE(VCPU_SPRG2, offsetof(struct kvm_vcpu, arch.shregs.sprg2));
>   DEFINE(VCPU_SPRG3, offsetof(struct kvm_vcpu, arch.shregs.sprg3));
> + DEFINE(VCPU_TB_RMENTRY, offsetof(struct kvm_vcpu, arch.rm_entry));
> + DEFINE(VCPU_TB_RMINTR, offsetof(struct kvm_vcpu, arch.rm_intr));
> + DEFINE(VCPU_TB_RMEXIT, offsetof(struct kvm_vcpu, arch.rm_exit));
> + DEFINE(VCPU_TB_GUEST, offsetof(struct kvm_vcpu, arch.guest_time));
> + DEFINE(VCPU_TB_CEDE, offsetof(struct kvm_vcpu, arch.cede_time));
> + DEFINE(VCPU_CUR_ACTIVITY, offsetof(struct kvm_vcpu, arch.cur_activity));
> + DEFINE(VCPU_ACTIVITY_START, offsetof(struct kvm_vcpu, 
> arch.cur_tb_start));
> + DEFINE(TAS_SEQCOUNT, offsetof(struct kvmhv_tb_accumulator, seqcount));
> + DEFINE(TAS_TOTAL, offsetof(struct kvmhv_tb_accumulator, tb_total));
> + DEFINE(TAS_MIN, offsetof(struct kvmhv_tb_accumulator, tb_min));
> + DEFINE(TAS_MAX

Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object

2015-03-20 Thread Alexander Graf


On 20.03.15 10:39, Paul Mackerras wrote:
> From: Bharata B Rao 
> 
> Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
> correctly, certain workarounds have to be employed to allow reuse of
> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
> proposed workaround is to park the vcpu fd in userspace during cpu unplug
> and reuse it later during next hotplug.
> 
> More details can be found here:
> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
> 
> In order to support this workaround with PowerPC KVM, don't create or
> initialize ICP if the vCPU is found to be already associated with an ICP.
> 
> Signed-off-by: Bharata B Rao 
> Signed-off-by: Paul Mackerras 

This probably makes some sense, but please make sure that user space has
some way to figure out whether hotplug works at all.

Also Paul, for patches that you pick up from others, I'd prefer that they
send the patches to the ML themselves first and that you then pick them
up from there. That way we give everyone the same treatment.


Alex


Re: [PATCH 00/23] Bug fixes and improvements for HV KVM

2015-03-20 Thread Alexander Graf


On 20.03.15 10:39, Paul Mackerras wrote:
> This is my current patch queue for HV KVM on PPC.  This series is
> based on the "queue" branch of the KVM tree, i.e. roughly v4.0-rc3
> plus a set of recent KVM changes which don't intersect with the
> changes in this series.  On top of that, in my testing I have some
> patches which are not KVM-related but are needed to boot and run a
> recent upstream kernel successfully:
> 
> tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop
> tick/hotplug: Handover time related duties before cpu offline
> powerpc/powernv: Check image loaded or not before calling flash
> powerpc/powernv: Fixes for hypervisor doorbell handling
> powerpc/powernv: Fix return value from power7_nap() et al.
> powerpc: Export __spin_yield
> 
> These patches have been posted by their authors and are on their way
> upstream via various trees.  They are not included in this series.
> 
> The first three patches are bug fixes that should go into v4.0 if
> possible.

Thanks, applied the first 3 to my for-4.0 branch which is going through
autotest now. If everything runs fine, I'll send it to Paolo for
upstream merge.


Alex


Re: [PATCHv3] kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM

2015-03-16 Thread Alexander Graf


On 16.03.15 21:41, David Gibson wrote:
> On Thu, Feb 05, 2015 at 01:57:11AM +0100, Alexander Graf wrote:
>>
>>
>> On 05.02.15 01:53, David Gibson wrote:
>>> On POWER, storage caching is usually configured via the MMU - attributes
>>> such as cache-inhibited are stored in the TLB and the hashed page table.
>>>
>>> This makes correctly performing cache inhibited IO accesses awkward when
>>> the MMU is turned off (real mode).  Some CPU models provide special
>>> registers to control the cache attributes of real mode load and stores but
>>> this is not at all consistent.  This is a problem in particular for SLOF,
>>> the firmware used on KVM guests, which runs entirely in real mode, but
>>> which needs to do IO to load the kernel.
>>>
>>> To simplify this qemu implements two special hypercalls, H_LOGICAL_CI_LOAD
>>> and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to
>>> a logical address (aka guest physical address).  SLOF uses these for IO.
>>>
>>> However, because these are implemented within qemu, not the host kernel,
>>> these bypass any IO devices emulated within KVM itself.  The simplest way
>>> to see this problem is to attempt to boot a KVM guest from a virtio-blk
>>> device with iothread / dataplane enabled.  The iothread code relies on an
>>> in kernel implementation of the virtio queue notification, which is not
>>> triggered by the IO hcalls, and so the guest will stall in SLOF unable to
>>> load the guest OS.
>>>
>>> This patch addresses this by providing in-kernel implementations of the
>>> 2 hypercalls, which correctly scan the KVM IO bus.  Any access to an
>>> address not handled by the KVM IO bus will cause a VM exit, hitting the
>>> qemu implementation as before.
>>>
>>> Note that a userspace change is also required, in order to enable these
>>> new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL.
>>>
>>> Signed-off-by: David Gibson 
>>
>> Thanks, applied to kvm-ppc-queue.
> 
> Any news on when this might go up to mainline?

I'm aiming for 4.1.


Alex


Re: [PATCH] Revert "target-ppc: Create versionless CPU class per family if KVM"

2015-03-03 Thread Alexander Graf


On 03.03.15 01:42, Alexey Kardashevskiy wrote:
> On 03/03/2015 12:51 AM, Alexander Graf wrote:
>>
>>
>> On 02.03.15 14:42, Andreas Färber wrote:
>>> Am 02.03.2015 um 14:37 schrieb Alexander Graf:
>>>> On 01.03.15 01:31, Andreas Färber wrote:
>>>>> This reverts commit 5b79b1cadd3e565b6d1a5ba59764bd47af58b271 to avoid
>>>>> double-registration of types:
>>>>>
>>>>>Registering `POWER5+-powerpc64-cpu' which already exists
>>>>>
>>>>> Taking the textual description of a CPU type as part of a new type
>>>>> name is plain wrong, and so is unconditionally registering a new
>>>>> type here.
>>>>>
>>>>> Cc: Alexey Kardashevskiy 
>>>>> Cc: qemu-sta...@nongnu.org
>>>>> Signed-off-by: Andreas Färber 
>>>>
>>>> Doesn't this break p8 support?
>>>
>>> Maybe, but p5 support has been in longer and this is definitely a regression
>>> and really really wrong. If you know a way to fix it without handing it
>>> back to the IBM guys for more thought, feel free to give it a shot.
>>
>> I honestly don't fully remember what this was about. Wasn't this our
>> special KVM class that we use to create a compatible cpu type on the fly?
>>
>> Alexey, please take a look at it.
> 
> 
> I sent a note yesterday :-/ Here it is again:
> 
> With this revert, running qemu with HV KVM and -cpu POWER7 fails on a real
> POWER7 machine, as my machine has pvr 003f 0201 and POWER7 is an alias of
> POWER7_v2.3 (pvr 003f 0203); this is what I tried to fix in the first
> place. QEMU looks at classes first and, if nothing is found, at aliases,
> so this worked.
> 
> I would rename "POWER5+" to "POWER5+_0.0" and make "POWER5+" an alias
> for POWER5+_v2.1 (or POWER5+_0.0).

Care to send a patch?


Alex


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-03-03 Thread Alexander Graf

On 02/19/2015 11:54 AM, Ard Biesheuvel wrote:

> This is a 0th order approximation of how we could potentially force the guest
> to avoid uncached mappings, at least from the moment the MMU is on. (Before
> that, all of memory is implicitly classified as Device-nGnRnE)
>
> The idea (patch #2) is to trap writes to MAIR_EL1, and replace uncached mappings
> with cached ones. This way, there is no need to mangle any guest page tables.
>
> The downside is that, to do this correctly, we need to always trap writes to
> the VM sysreg group, which includes registers that the guest may write to very
> often. To reduce the associated performance hit, patch #1 introduces a fast path
> for EL2 to perform trivial sysreg writes on behalf of the guest, without the
> need for a full world switch to the host and back.
>
> The main purpose of these patches is to quantify the performance hit, and
> verify whether the MAIR_EL1 handling works correctly.


I gave this a quick spin on a VM running with QEMU.

  * VGA output is still distorted, I get random junk black lines in the
    output in between
  * When I add -device nec-usb-xhci -device usb-kbd the VM doesn't even
    boot up


With TCG, both bits work fine.


Alex



> Ard Biesheuvel (3):
>   arm64: KVM: handle some sysreg writes in EL2
>   arm64: KVM: mangle MAIR register to prevent uncached guest mappings
>   arm64: KVM: keep trapping of VM sysreg writes enabled
>
>  arch/arm/kvm/mmu.c               |   2 +-
>  arch/arm64/include/asm/kvm_arm.h |   2 +-
>  arch/arm64/kvm/hyp.S             | 101 +++
>  arch/arm64/kvm/sys_regs.c        |  63
>  4 files changed, 156 insertions(+), 12 deletions(-)





Re: [PATCH] Revert "target-ppc: Create versionless CPU class per family if KVM"

2015-03-02 Thread Alexander Graf


On 02.03.15 14:42, Andreas Färber wrote:
> On 02.03.2015 at 14:37, Alexander Graf wrote:
>> On 01.03.15 01:31, Andreas Färber wrote:
>>> This reverts commit 5b79b1cadd3e565b6d1a5ba59764bd47af58b271 to avoid
>>> double-registration of types:
>>>
>>>   Registering `POWER5+-powerpc64-cpu' which already exists
>>>
>>> Taking the textual description of a CPU type as part of a new type name
>>> is plain wrong, and so is unconditionally registering a new type here.
>>>
>>> Cc: Alexey Kardashevskiy 
>>> Cc: qemu-sta...@nongnu.org
>>> Signed-off-by: Andreas Färber 
>>
>> Doesn't this break p8 support?
> 
> Maybe, but p5 support was in longer and this is definitely a regression
> and really really wrong. If you know a way to fix it without handing it
> back to the IBM guys for more thought, feel free to give it a shot.

I honestly don't fully remember what this was about. Wasn't this our
special KVM class that we use to create a compatible cpu type on the fly?

Alexey, please take a look at it.


Alex

> 
> Andreas
> 


Re: [Qemu-devel] [RFC PATCH v2 04/15] cpu-model/s390: Introduce S390 CPU models

2015-02-20 Thread Alexander Graf


On 20.02.15 20:43, Michael Mueller wrote:
> On Fri, 20 Feb 2015 18:50:20 +0100
> Alexander Graf  wrote:
> 
>>
>>
>>
>>> On 20.02.2015 at 18:37, Michael Mueller wrote:
>>>
>>> On Fri, 20 Feb 2015 17:57:52 +0100
>>> Alexander Graf  wrote:
>>>
>>>> Because all CPUs we have in our list only expose 128 bits?
>>>
>>> Here is a STFLE result on an EC12 GA2, already more than 128 bits... Is that
>>> model on the list?
>>
>> If that model has 3 elements, yes, the array should span 3.
>>
>> I hope it's in the list. Every model we care about should be, no?
>>
> 
> On my list? Yes!
> 
>>>
>>> [mimu@p57lp59 s390xfac]$ ./s390xfac -b
>>> fac[0] = 0xfbfbfcfff840
>>> fac[1] = 0xffde
>>> fac[2] = 0x1800
>>>>
>>>>> I want to have this independent from a future machine of the z/Arch. The
>>>>> kernel stores the full facility set, KVM does and there is no good reason
>>>>> for QEMU not to do. If other accelerators decide to just implement 64 or
>>>>> 128 bits of facilities that's ok...
>>>>
>>>> So you want to support CPUs that are not part of the list?
>>>
>>> The architecture at least defines more than 2 or 3. Do you want me to limit
>>> it to an arbitrary size? Only in QEMU or also in the KVM interface?
>>
>> Only internally in QEMU. The kvm interface should definitely be as big as
>> the spec allows!
>
> Right, now we're on the same page again. That can be taken into consideration.
> ... Although it's just an optimization. :-)

Yeah. You could also consider using the QEMU built-in bitmap type and
functions and just convert from there. That would give you native
support for bit values > 64.
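
A rough sketch of what "convert from there" could mean: STFLE numbers
facility bits from the most significant bit of word 0, while an
unsigned-long bitmap numbers bits from the least significant bit of each
long. The helper names and the FAC_WORDS value below are assumptions for
illustration only; real code would use QEMU's own bitmap helpers rather
than the open-coded version here.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

#define FAC_WORDS        3u   /* assumption: three STFLE doublewords */
#define FAC_BITS         (FAC_WORDS * 64u)
#define BITS_PER_ULONG   (sizeof(unsigned long) * CHAR_BIT)
#define FAC_BITMAP_LONGS ((FAC_BITS + BITS_PER_ULONG - 1) / BITS_PER_ULONG)

/*
 * Hypothetical helper: copy an STFLE-style facility list (facility 0 is the
 * MSB of word 0) into an LSB-first bitmap indexed by facility number.
 */
static void fac_list_to_bitmap(const uint64_t *fac_list, unsigned long *bitmap)
{
    unsigned int nr;

    memset(bitmap, 0, FAC_BITMAP_LONGS * sizeof(unsigned long));
    for (nr = 0; nr < FAC_BITS; nr++) {
        if ((fac_list[nr / 64] >> (63 - (nr % 64))) & 1) {
            bitmap[nr / BITS_PER_ULONG] |= 1UL << (nr % BITS_PER_ULONG);
        }
    }
}

/* Hypothetical helper: test facility 'nr' in the converted bitmap. */
static int fac_bitmap_test(const unsigned long *bitmap, unsigned int nr)
{
    return (int)((bitmap[nr / BITS_PER_ULONG] >> (nr % BITS_PER_ULONG)) & 1);
}
```

After the conversion, facility numbers above 64 index the bitmap directly,
with no per-word endianness arithmetic at the call sites.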


Alex


Re: [Qemu-devel] [RFC PATCH v2 10/15] cpu-model/s390: Add cpu class initialization routines

2015-02-20 Thread Alexander Graf



> On 20.02.2015 at 19:59, Michael Mueller wrote:
> 
> On Fri, 20 Feb 2015 10:11:55 -0800
> Richard Henderson  wrote:
> 
>>> +static inline uint64_t big_endian_bit(unsigned long nr)
>>> +{
>>> +return 1ul << (BITS_PER_LONG - (nr % BITS_PER_LONG));
>>> +};  
>> 
>> This is buggy.  NR=0 should map to 63, not 64.
> 
> I'm sure I was asked to replace my constants 64 and 63 with those defines and
> in the end I messed it up... :-(
> 
>> 
>>> +return !!(*ptr & big_endian_bit(nr));  
>> 
>> Personally I dislike !! as an idiom.  Given that big_endian_bit isn't used
>> anywhere else, can we integrate it and change this to
>> 
>> static inline int test_facility(unsigned long nr, uint64_t *fac_list)
>> {
>>  unsigned long word = nr / BITS_PER_LONG;
>>  unsigned long be_bit = 63 - (nr % BITS_PER_LONG);
>>  return (fac_list[word] >> be_bit) & 1;
>> }
> 
> Yes, I just use it in this context. I will integrate your version.
> 
> BTW I changed the whole facility defining code to be generated by an external
> helper at compile time. That is simpler and safer to change. I will send it
> with v3. See the attachment for an example of the generated header file.

Please make sure to use ULL with constants and uint64_t on variables. Long is 
almost always wrong in QEMU.
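
As a sketch, Richard's helper from above with fixed-width types and no
dependency on BITS_PER_LONG could look like this. The const qualifier and
the literal 64 are choices made for this example, not part of the patch.

```c
#include <stdint.h>

/*
 * Test facility bit 'nr' in an STFLE-style list, where facility 0 is the
 * most significant bit of fac_list[0]. Each word is always 64 bits wide,
 * independent of the host's 'long'.
 */
static inline int test_facility(unsigned int nr, const uint64_t *fac_list)
{
    unsigned int word = nr / 64;
    unsigned int be_bit = 63 - (nr % 64);

    return (int)((fac_list[word] >> be_bit) & 1);
}
```

On a 32-bit host BITS_PER_LONG is 32, so mixing it with uint64_t words would
index the wrong word and bit; the fixed 64 avoids that.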

Alex

> 
> Thanks,
> Michael
> 
> 


Re: [Qemu-devel] [RFC PATCH v2 04/15] cpu-model/s390: Introduce S390 CPU models

2015-02-20 Thread Alexander Graf



> On 20.02.2015 at 18:37, Michael Mueller wrote:
> 
> On Fri, 20 Feb 2015 17:57:52 +0100
> Alexander Graf  wrote:
> 
>> Because all CPUs we have in our list only expose 128 bits?
> 
> Here is a STFLE result on an EC12 GA2, already more than 128 bits... Is that
> model on the list?

If that model has 3 elements, yes, the array should span 3.

I hope it's in the list. Every model we care about should be, no?

> 
> [mimu@p57lp59 s390xfac]$ ./s390xfac -b
> fac[0] = 0xfbfbfcfff840
> fac[1] = 0xffde
> fac[2] = 0x1800
>> 
>>> I want to have this independent from a future machine of the z/Arch. The
>>> kernel stores the full facility set, KVM does and there is no good reason
>>> for QEMU not to do. If other accelerators decide to just implement 64 or
>>> 128 bits of facilities that's ok...
>> 
>> So you want to support CPUs that are not part of the list?
> 
> The architecture at least defines more than 2 or 3. Do you want me to limit
> it to an arbitrary size? Only in QEMU or also in the KVM interface?

Only internally in QEMU. The kvm interface should definitely be as big as the 
spec allows!
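
To illustrate the split: the KVM-facing buffer stays at full size (256
doublewords, as in the fac_list[256] arrays of the proposed
kvm_s390_vm_cpu_processor struct from patch 09/15), while the QEMU-internal
copy only keeps as many words as the known models need. Everything named
below (fac_list_import(), QEMU_FAC_WORDS and its value of 3) is a
hypothetical sketch, not code from the series.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KVM_FAC_WORDS  256  /* size of fac_list[] in the proposed kvm struct */
#define QEMU_FAC_WORDS 3    /* assumption: enough for the models in the list */

/*
 * Copy the leading words of a full-size KVM facility list into the smaller
 * internal array. Returns 0 on success, or -1 if the host reports facility
 * bits beyond what the internal array can represent.
 */
static int fac_list_import(uint64_t *dst, const uint64_t *src)
{
    size_t i;

    for (i = QEMU_FAC_WORDS; i < KVM_FAC_WORDS; i++) {
        if (src[i] != 0) {
            return -1;
        }
    }
    memcpy(dst, src, QEMU_FAC_WORDS * sizeof(uint64_t));
    return 0;
}
```

The explicit failure path means an unknown future machine is noticed rather
than silently truncated.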

Alex

> 
> Thanks
> Michael
> 


Re: [Qemu-devel] [RFC PATCH v2 13/15] cpu-model/s390: Add processor property routines

2015-02-20 Thread Alexander Graf


On 20.02.15 16:32, Michael Mueller wrote:
> On Fri, 20 Feb 2015 15:03:30 +0100
> Alexander Graf  wrote:
> 
>>>
>>> - s390_get_processor_props()
>>> - s390_set_processor_props()
>>>
>>> They can be used to request or retrieve processor related information from
>>> an accelerator. That information comprises the cpu identifier, the IBC
>>> value and the facility lists.
>>>
>>> Signed-off-by: Michael Mueller   
>>
>> Hrm, I still seem to miss the point of this interface. What do you need
>> it for?
> 
> These functions make the internal s390 cpu model API independent from a
> specific accelerator:
> 
> int s390_set_processor_props(S390ProcessorProps *prop)
> {
>     if (kvm_enabled()) {
>         return kvm_s390_set_processor_props(prop);
>     }
>     return -ENOSYS;
> }
> 
> It's called by:
> 
> s390_select_cpu_model(const char *model)
> 
> which is itself called by:
> 
> S390CPU *cpu_s390x_init(const char *cpu_model)
> {
>     S390CPU *cpu;
> 
>     cpu = S390_CPU(object_new(s390_select_cpu_model(cpu_model)));
> 
>     object_property_set_bool(OBJECT(cpu), true, "realized", NULL);
> 
>     return cpu;
> }
> 
> So above s390_set/get_processor_props() the code is accelerator independent.

Any particular reason you can't do it like PPC?


Alex


Re: [Qemu-devel] [RFC PATCH v2 09/15] cpu-model/s390: Add KVM VM attribute interface routines

2015-02-20 Thread Alexander Graf


On 20.02.15 16:18, Michael Mueller wrote:
> On Fri, 20 Feb 2015 14:59:20 +0100
> Alexander Graf  wrote:
> 
>>> +typedef struct S390ProcessorProps {
>>> +    uint64_t cpuid;
>>> +    uint16_t ibc;
>>> +    uint8_t  pad[6];
>>> +    uint64_t fac_list[S390_ARCH_FAC_LIST_SIZE_UINT64];
>>> +} S390ProcessorProps;
>>> +
>>> +typedef struct S390MachineProps {
>>> +    uint64_t cpuid;
>>> +    uint32_t ibc_range;
>>> +    uint8_t  pad[4];
>>> +    uint64_t fac_list_mask[S390_ARCH_FAC_LIST_SIZE_UINT64];
>>> +    uint64_t fac_list[S390_ARCH_FAC_LIST_SIZE_UINT64];
>>> +} S390MachineProps;
>>
>> What are those structs there for? To convert between a kvm facing
>> interface to an internal interface?
> 
> Yes, that's their current use, but if the interface structs: 
> 
> +struct kvm_s390_vm_cpu_processor {
> +   __u64 cpuid;
> +   __u16 ibc;
> +   __u8  pad[6];
> +   __u64 fac_list[256];
> +};
> +
> +/* kvm S390 machine related attributes are r/o */
> +#define KVM_S390_VM_CPU_MACHINE 1
> +struct kvm_s390_vm_cpu_machine {
> +   __u64 cpuid;
> +   __u32 ibc_range;
> +   __u8  pad[4];
> +   __u64 fac_mask[256];
> +   __u64 fac_list[256];
> +};
> 
> are visible here, I'll reuse them... But stop, that will not work in the 
> --disable-kvm case... I need them!

I meant it the other way around - do KVM specific patching of the cpu
types from kvm.c.

But please give a nutshell explanation on what exactly you're patching
at all here.


Alex


Re: [Qemu-devel] [RFC PATCH v2 04/15] cpu-model/s390: Introduce S390 CPU models

2015-02-20 Thread Alexander Graf


On 20.02.15 16:49, Michael Mueller wrote:
> On Fri, 20 Feb 2015 16:22:20 +0100
> Alexander Graf  wrote:
> 
>>>>
>>>> Just make this uint64_t fac_list[2]. That way we don't have to track any
>>>> messy allocations.  
>>>
>>> It will be something like "uint64_t fac_list[S390_CPU_FAC_LIST_SIZE_UINT64]"
>>> and in total 2KB not just 16 bytes but I will change it.
>>
>> Why? Do we actually need that many? This is a qemu internal struct.
> 
> How do you know that 2 is a good size?

Because all CPUs we have in our list only expose 128 bits?

> I want to have this independent from a future machine of the z/Arch. The
> kernel stores the full facility set, KVM does and there is no good reason
> for QEMU not to do. If other accelerators decide to just implement 64 or
> 128 bits of facilities that's ok...

So you want to support CPUs that are not part of the list?


Alex


Re: [Qemu-devel] [RFC PATCH v2 04/15] cpu-model/s390: Introduce S390 CPU models

2015-02-20 Thread Alexander Graf



> On 20.02.2015 at 16:00, Michael Mueller wrote:
> 
> On Fri, 20 Feb 2015 14:54:23 +0100
> Alexander Graf  wrote:
> 
>>> 
>>> +/* machine related properties */
>>> +typedef struct S390CPUMachineProps {
>>> +    uint16_t class;  /* machine class */
>>> +    uint16_t ga;     /* availability number of machine */
>>> +    uint16_t order;  /* order of availability */
>>> +} S390CPUMachineProps;
>>> +
>>> +/* processor related properties */
>>> +typedef struct S390CPUProcessorProps {
>>> +    uint16_t gen;        /* S390 CMOS generation */
>>> +    uint16_t ver;        /* version of processor */
>>> +    uint32_t id;         /* processor identification */
>>> +    uint16_t type;       /* machine type */
>>> +    uint16_t ibc;        /* IBC value */
>>> +    uint64_t *fac_list;  /* list of facilities */
>> 
>> Just make this uint64_t fac_list[2]. That way we don't have to track any
>> messy allocations.
> 
> It will be something like "uint64_t fac_list[S390_CPU_FAC_LIST_SIZE_UINT64]"
> and in total 2KB not just 16 bytes but I will change it.

Why? Do we actually need that many? This is a qemu internal struct.

Alex


Re: [RFC PATCH v2 13/15] cpu-model/s390: Add processor property routines

2015-02-20 Thread Alexander Graf


On 17.02.15 15:24, Michael Mueller wrote:
> This patch implements the functions:
> 
> - s390_get_processor_props()
> - s390_set_processor_props()
> 
> They can be used to request or retrieve processor related information from an
> accelerator. That information comprises the cpu identifier, the IBC value and
> the facility lists.
> 
> Signed-off-by: Michael Mueller 

Hrm, I still seem to miss the point of this interface. What do you need
it for?


Alex


Re: [RFC PATCH v2 09/15] cpu-model/s390: Add KVM VM attribute interface routines

2015-02-20 Thread Alexander Graf


On 17.02.15 15:24, Michael Mueller wrote:
> The patch implements routines to set and retrieve processor configuration
> data and to retrieve machine configuration data. The machine related data
> is used together with the cpu model facility lists to determine the list of
> supported cpu models of this host. The above mentioned routines have QEMU
> trace point instrumentation.
> 
> Signed-off-by: Michael Mueller 
> ---
>  target-s390x/cpu-models.h |  39 ++
>  target-s390x/kvm.c        | 102 ++
>  trace-events              |   3 ++
>  3 files changed, 144 insertions(+)
> 
> diff --git a/target-s390x/cpu-models.h b/target-s390x/cpu-models.h
> index 623a7b2..76b3456 100644
> --- a/target-s390x/cpu-models.h
> +++ b/target-s390x/cpu-models.h
> @@ -45,6 +45,45 @@ typedef struct S390CPUAlias {
>      char *model;
>  } S390CPUAlias;
>  
> +typedef struct S390ProcessorProps {
> +    uint64_t cpuid;
> +    uint16_t ibc;
> +    uint8_t  pad[6];
> +    uint64_t fac_list[S390_ARCH_FAC_LIST_SIZE_UINT64];
> +} S390ProcessorProps;
> +
> +typedef struct S390MachineProps {
> +    uint64_t cpuid;
> +    uint32_t ibc_range;
> +    uint8_t  pad[4];
> +    uint64_t fac_list_mask[S390_ARCH_FAC_LIST_SIZE_UINT64];
> +    uint64_t fac_list[S390_ARCH_FAC_LIST_SIZE_UINT64];
> +} S390MachineProps;

What are those structs there for? To convert between a kvm facing
interface to an internal interface?

I don't think they're necessary. The internal layout is visible from the
KVM code. Just either spawn the class straight from the kvm file or if
you consider that ugly, pass the values of that struct that you need as
function parameters to a function in cpu-models.c.


Alex


Re: [RFC PATCH v2 04/15] cpu-model/s390: Introduce S390 CPU models

2015-02-20 Thread Alexander Graf


On 17.02.15 15:24, Michael Mueller wrote:
> This patch implements the static part of the s390 cpu class definitions.
> It defines s390 cpu models by means of virtual cpu ids (enum) which contain
> information on the cpu generation, the machine class, the GA number and
> the machine type. The cpu id is used to instantiate a cpu class per cpu
> model.
> 
> In addition the patch introduces the QMP enumeration AccelId. It is used
> to index certain cpu model properties per accelerator.
> 
> Furthermore it extends the existing S390CPUClass by model related properties.
> 
> Signed-off-by: Michael Mueller 
> Reviewed-by: Thomas Huth 
> ---
>  qapi-schema.json           | 11 +++
>  target-s390x/Makefile.objs |  1 +
>  target-s390x/cpu-models.c  | 79 ++
>  target-s390x/cpu-models.h  | 71 +
>  target-s390x/cpu-qom.h     | 22 +
>  target-s390x/cpu.c         |  2 ++
>  6 files changed, 186 insertions(+)
>  create mode 100644 target-s390x/cpu-models.c
>  create mode 100644 target-s390x/cpu-models.h
> 
> diff --git a/qapi-schema.json b/qapi-schema.json
> index e16f8eb..4d237c8 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -2473,6 +2473,17 @@
>  ##
>  { 'command': 'query-machines', 'returns': ['MachineInfo'] }
>  
> +
> +##
> +# @AccelId
> +#
> +# Defines accelerator ids
> +#
> +# Since: 2.3.0
> +##
> +{ 'enum': 'AccelId',
> +  'data': ['qtest', 'tcg', 'kvm', 'xen'  ] }
> +
>  ##
>  # @CpuDefinitionInfo:
>  #
> diff --git a/target-s390x/Makefile.objs b/target-s390x/Makefile.objs
> index 2c57494..9f55140 100644
> --- a/target-s390x/Makefile.objs
> +++ b/target-s390x/Makefile.objs
> @@ -1,5 +1,6 @@
>  obj-y += translate.o helper.o cpu.o interrupt.o
>  obj-y += int_helper.o fpu_helper.o cc_helper.o mem_helper.o misc_helper.o
>  obj-y += gdbstub.o
> +obj-y += cpu-models.o
>  obj-$(CONFIG_SOFTMMU) += machine.o ioinst.o arch_dump.o
>  obj-$(CONFIG_KVM) += kvm.o
> diff --git a/target-s390x/cpu-models.c b/target-s390x/cpu-models.c
> new file mode 100644
> index 0000000..4841553
> --- /dev/null
> +++ b/target-s390x/cpu-models.c
> @@ -0,0 +1,79 @@
> +/*
> + * CPU models for s390
> + *
> + * Copyright 2014,2015 IBM Corp.
> + *
> + * Author(s): Michael Mueller 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> + * your option) any later version. See the COPYING file in the top-level
> + * directory.
> + */
> +
> +#include "qemu-common.h"
> +#include "cpu-models.h"
> +
> +#define S390_PROC_DEF(_name, _cpu_id, _desc)                        \
> +    static void                                                     \
> +    glue(_cpu_id, _cpu_class_init)                                  \
> +    (ObjectClass *oc, void *data)                                   \
> +    {                                                               \
> +        DeviceClass *dc = DEVICE_CLASS(oc);                         \
> +        S390CPUClass *cc = S390_CPU_CLASS(oc);                      \
> +                                                                    \
> +        cc->is_active[ACCEL_ID_KVM] = true;                         \
> +        cc->mach        = g_malloc0(sizeof(S390CPUMachineProps));   \
> +        cc->mach->ga    = cpu_ga(_cpu_id);                          \
> +        cc->mach->class = cpu_class(_cpu_id);                       \
> +        cc->mach->order = cpu_order(_cpu_id);                       \
> +        cc->proc        = g_malloc0(sizeof(S390CPUProcessorProps)); \
> +        cc->proc->gen   = cpu_generation(_cpu_id);                  \
> +        cc->proc->ver   = S390_DEF_VERSION;                         \
> +        cc->proc->id    = S390_DEF_ID;                              \
> +        cc->proc->type  = cpu_type(_cpu_id);                        \
> +        cc->proc->ibc   = S390_DEF_IBC;                             \
> +        dc->desc        = _desc;                                    \
> +    }                                                               \
> +    static const TypeInfo                                           \
> +    glue(_cpu_id, _cpu_type_info) = {                               \
> +        .name       = _name "-" TYPE_S390_CPU,                      \
> +        .parent     = TYPE_S390_CPU,                                \
> +        .class_init = glue(_cpu_id, _cpu_class_init),               \
> +    };                                                              \
> +    static void                                                     \
> +    glue(_cpu_id, _cpu_register_types)(void)                        \
> +    {                                                               \
> +        type_register_static(                                       \
> +            &glue(_cpu_id, _cpu_type_info));

Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-02-19 Thread Alexander Graf


On 19.02.15 15:56, Ard Biesheuvel wrote:
> On 19 February 2015 at 14:50, Alexander Graf  wrote:
>>
>>
>> On 19.02.15 11:54, Ard Biesheuvel wrote:
>>> This is a 0th order approximation of how we could potentially force the guest
>>> to avoid uncached mappings, at least from the moment the MMU is on. (Before
>>> that, all of memory is implicitly classified as Device-nGnRnE)
>>>
>>> The idea (patch #2) is to trap writes to MAIR_EL1, and replace uncached mappings
>>> with cached ones. This way, there is no need to mangle any guest page tables.
>>
>> Would you mind giving a brief explanation of what this does? What
>> happens to actually assigned devices that need to be mapped as uncached?
>> What happens to DMA from such devices when the guest assumes that it's
>> accessing RAM uncached and then triggers DMA?
>>
> 
> On ARM, stage 2 mappings that are more strict will supersede stage 1
> mappings, so the idea is to use cached mappings exclusively for stage
> 1 so that the host is fully in control of the actual memory attributes
> by setting the attributes at stage 2. This also makes sense because
> the host will ultimately know better whether some range that the guest
> thinks is a device is actually a device or just emulated (no stage 2
> mapping), backed by host memory (such as the NOR flash read case) or
> backed by a passthrough device.

Ok, so that means if the guest maps RAM as uncached, it will actually
end up as cached memory. Now if the guest triggers a DMA request to a
passed through device to that RAM, it will conflict with the cache.

I don't know whether it's a big deal, but it's the scenario that came up
with the approach above before when I talked to people about it.
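
As a rough illustration of the kind of MAIR_EL1 rewrite the cover letter
describes (this is not the code from patch #2): MAIR_EL1 holds eight 8-bit
attribute fields, and the sketch below swaps every Normal Non-cacheable
field (0x44) for Normal Write-Back (0xff). Which encodings the actual patch
replaces is not stated in this thread, so the choice of 0x44 -> 0xff, the
decision to leave Device encodings alone, and the function name are all
assumptions.

```c
#include <stdint.h>

#define MAIR_ATTR_NORMAL_NC 0x44u  /* Normal, Inner/Outer Non-cacheable */
#define MAIR_ATTR_NORMAL_WB 0xffu  /* Normal, Inner/Outer Write-Back */

/*
 * Hypothetical helper: return 'mair' with every Normal Non-cacheable
 * attribute field replaced by Normal Write-Back. All other fields,
 * including Device encodings, are passed through unchanged.
 */
static uint64_t mair_force_cached(uint64_t mair)
{
    unsigned int i;

    for (i = 0; i < 8; i++) {
        unsigned int shift = i * 8;
        uint64_t attr = (mair >> shift) & 0xff;

        if (attr == MAIR_ATTR_NORMAL_NC) {
            mair &= ~((uint64_t)0xff << shift);
            mair |= (uint64_t)MAIR_ATTR_NORMAL_WB << shift;
        }
    }
    return mair;
}
```

With a rewrite like this the guest's page tables keep pointing at the same
attribute index, but the index now means "cached", which is exactly the
case where a guest-initiated DMA to supposedly uncached RAM could see stale
data.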


Alex


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-02-19 Thread Alexander Graf


On 19.02.15 11:54, Ard Biesheuvel wrote:
> This is a 0th order approximation of how we could potentially force the guest
> to avoid uncached mappings, at least from the moment the MMU is on. (Before
> that, all of memory is implicitly classified as Device-nGnRnE)
> 
> The idea (patch #2) is to trap writes to MAIR_EL1, and replace uncached mappings
> with cached ones. This way, there is no need to mangle any guest page tables.

Would you mind giving a brief explanation of what this does? What
happens to actually assigned devices that need to be mapped as uncached?
What happens to DMA from such devices when the guest assumes that it's
accessing RAM uncached and then triggers DMA?


Alex

> 
> The downside is that, to do this correctly, we need to always trap writes to
> the VM sysreg group, which includes registers that the guest may write to very
> often. To reduce the associated performance hit, patch #1 introduces a fast path
> for EL2 to perform trivial sysreg writes on behalf of the guest, without the
> need for a full world switch to the host and back.
> 
> The main purpose of these patches is to quantify the performance hit, and
> verify whether the MAIR_EL1 handling works correctly. 
> 
> Ard Biesheuvel (3):
>   arm64: KVM: handle some sysreg writes in EL2
>   arm64: KVM: mangle MAIR register to prevent uncached guest mappings
>   arm64: KVM: keep trapping of VM sysreg writes enabled
> 
>  arch/arm/kvm/mmu.c               |   2 +-
>  arch/arm64/include/asm/kvm_arm.h |   2 +-
>  arch/arm64/kvm/hyp.S             | 101 +++
>  arch/arm64/kvm/sys_regs.c        |  63
>  4 files changed, 156 insertions(+), 12 deletions(-)
> 


Re: H_CLEAR_REF and H_CLEAR_MOD

2015-02-18 Thread Alexander Graf

> On 18.02.2015 at 07:12, Nathan Whitehorn wrote:
> 
> It seems like KVM doesn't implement the H_CLEAR_REF and H_CLEAR_MOD 
> hypervisor calls, which are absolutely critical for memory management in the 
> FreeBSD kernel (and are marked "mandatory" in the PAPR manual). It seems some 
> patches have been contributed already in 
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2011-December/095013.html, so 
> it would be fantastic if these could end up upstream.

Paul, I guess we never included this because there was no user. If FreeBSD
does use it though, I think it makes a lot of sense to resend it for inclusion.

> 
> I'm going to try to get some kind of workaround in the meantime so we can at 
> least run on existing kernels.

Please don't add hacks in FreeBSD only because kvm is missing a feature. Let's 
just get this done properly :).


Alex



VFIO iommu page size masking

2015-02-12 Thread Alexander Graf

Hi Alex,

While trying to get VFIO-PCI working on AArch64 (with 64k page size), I
stumbled over the following piece of code:

> static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
> {
> 	struct vfio_domain *domain;
> 	unsigned long bitmap = PAGE_MASK;
> 
> 	mutex_lock(&iommu->lock);
> 	list_for_each_entry(domain, &iommu->domain_list, next)
> 		bitmap &= domain->domain->ops->pgsize_bitmap;
> 	mutex_unlock(&iommu->lock);
> 
> 	return bitmap;
> }

The SMMU page mask is

[3.054302] arm-smmu e0a0.smmu:  Supported page sizes: 0x40201000

but after this function, we end up supporting only 2MB pages and above.
The reason for that is simple: You restrict the bitmap to PAGE_MASK and
above.
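
To make the arithmetic concrete, here is a small user-space sketch (not
kernel code) of the AND performed in vfio_pgsize_bitmap() for a single
domain. The EXAMPLE_* names and pgsize_bitmap() are made up for the
illustration, and the 64k page shift is taken from the setup described
above.

```c
#include <stdint.h>

/* Assumption: 64k host pages, as on the AArch64 setup described above. */
#define EXAMPLE_PAGE_SHIFT 16
#define EXAMPLE_PAGE_MASK  (~((UINT64_C(1) << EXAMPLE_PAGE_SHIFT) - 1))

/* Page sizes from the SMMU log line: 4k (bit 12), 2M (bit 21), 1G (bit 30). */
#define SMMU_PGSIZE_BITMAP UINT64_C(0x40201000)

/* Same AND as in vfio_pgsize_bitmap(), for one domain and a given start. */
static uint64_t pgsize_bitmap(uint64_t start, uint64_t domain_bitmap)
{
    return start & domain_bitmap;
}
```

Starting from EXAMPLE_PAGE_MASK, the result is 0x40200000: the 4k bit sits
below bit 16 and is masked out, leaving only 2M and 1G. Starting from an
all-ones value instead would keep all three sizes.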

Now the big question is why you're doing that. I don't see why it would
be a problem if the IOMMU maps a page in smaller chunks.

So I tried to patch the code above with s/PAGE_MASK/1UL/ and everything
seems to run fine. But maybe we're now lacking some sanity checks?


Alex

