Re: [v5][PATCH] KVM: PPC: Book3E HV: call RECONCILE_IRQ_STATE to sync the software state

2013-11-18 Thread tiejun.chen

On 10/23/2013 09:26 AM, Tiejun Chen wrote:

We enter with interrupts disabled in hardware, but we need to
call RECONCILE_IRQ_STATE anyway to ensure that the software state
is kept in sync instead of calling hard_irq_disable() directly.

Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
---
v5:

Fix one typo in the comment.


Alex,

I already addressed Scott's comment, any further feedback?

Tiejun



v4:

Fix one typo in the patch description.

v3:

Base on the latest tree, now we can use RECONCILE_IRQ_STATE instead of 
SOFT_DISABLE_INTS.

v2:

Move SOFT_DISABLE_INTS[1] earlier to avoid clobbering the arguments we want to 
pass to kvmppc_handle_exit.

  arch/powerpc/kvm/booke.c  |   11 ---
  arch/powerpc/kvm/bookehv_interrupts.S |   11 +++
  2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 15d0149..0d211ff 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -899,17 +899,6 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
int s;
int idx;

-#ifdef CONFIG_PPC64
-   WARN_ON(local_paca-irq_happened != 0);
-#endif
-
-   /*
-* We enter with interrupts disabled in hardware, but
-* we need to call hard_irq_disable anyway to ensure that
-* the software state is kept in sync.
-*/
-   hard_irq_disable();
-
/* update before a new last_exit_type is rewritten */
kvmppc_update_timing_stats(vcpu);

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
b/arch/powerpc/kvm/bookehv_interrupts.S
index e8ed7d6..191c32b 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -33,6 +33,8 @@

  #ifdef CONFIG_64BIT
  #include asm/exception-64e.h
+#include asm/hw_irq.h
+#include asm/irqflags.h
  #else
  #include ../kernel/head_booke.h /* for THREAD_NORMSAVE() */
  #endif
@@ -465,6 +467,15 @@ _GLOBAL(kvmppc_resume_host)
mtspr   SPRN_EPCR, r3
isync

+#ifdef CONFIG_64BIT
+   /*
+* We enter with interrupts disabled in hardware, but
+* we need to call RECONCILE_IRQ_STATE anyway to ensure
+* that the software state is kept in sync.
+*/
+   RECONCILE_IRQ_STATE(r3,r5)
+#endif
+
/* Switch to kernel stack and jump to handler. */
PPC_LL  r3, HOST_RUN(r1)
mr  r5, r14 /* intno */



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v3] KVM: PPC: Book3E HV: call RECONCILE_IRQ_STATE to sync the software state

2013-10-21 Thread tiejun.chen

On 10/21/2013 05:49 PM, Tiejun Chen wrote:

We enter with interrupts disabled in hardware, but we need to
call SOFT_DISABLE_INTS anyway to ensure that the software state


OOPS! Here I should change this with RECONCILE_IRQ_STATE :( Please ignore this 
to see next version directly.


Tiejun


is kept in sync instead of calling hard_irq_disable() directly.

Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
---
v3:

Base on the latest tree, now we can use RECONCILE_IRQ_STATE instead of 
SOFT_DISABLE_INTS.

v2:

Move SOFT_DISABLE_INTS[1] earlier to avoid clobbering the arguments we want to 
pass to kvmppc_handle_exit.

  arch/powerpc/kvm/booke.c  |   11 ---
  arch/powerpc/kvm/bookehv_interrupts.S |   11 +++
  2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 15d0149..0d211ff 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -899,17 +899,6 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
int s;
int idx;

-#ifdef CONFIG_PPC64
-   WARN_ON(local_paca-irq_happened != 0);
-#endif
-
-   /*
-* We enter with interrupts disabled in hardware, but
-* we need to call hard_irq_disable anyway to ensure that
-* the software state is kept in sync.
-*/
-   hard_irq_disable();
-
/* update before a new last_exit_type is rewritten */
kvmppc_update_timing_stats(vcpu);

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
b/arch/powerpc/kvm/bookehv_interrupts.S
index e8ed7d6..4e867b1 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -33,6 +33,8 @@

  #ifdef CONFIG_64BIT
  #include asm/exception-64e.h
+#include asm/hw_irq.h
+#include asm/irqflags.h
  #else
  #include ../kernel/head_booke.h /* for THREAD_NORMSAVE() */
  #endif
@@ -465,6 +467,15 @@ _GLOBAL(kvmppc_resume_host)
mtspr   SPRN_EPCR, r3
isync

+#ifdef CONFIG_64BIT
+   /*
+* We enter with interrupts disabled in hardware, but
+* we need to call SOFT_DISABLE_INTS anyway to ensure
+* that the software state is kept in sync.
+*/
+   RECONCILE_IRQ_STATE(r3,r5)
+#endif
+
/* Switch to kernel stack and jump to handler. */
PPC_LL  r3, HOST_RUN(r1)
mr  r5, r14 /* intno */



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v2][PATCH 1/1] KVM: PPC: Book3E HV: call SOFT_DISABLE_INTS to sync the software state

2013-08-14 Thread tiejun.chen

On 08/15/2013 05:01 AM, Benjamin Herrenschmidt wrote:

On Wed, 2013-08-14 at 14:34 -0500, Scott Wood wrote:

On Wed, 2013-08-14 at 13:56 +0200, Alexander Graf wrote:

On 07.08.2013, at 04:05, Tiejun Chen wrote:


We enter with interrupts disabled in hardware, but we need to
call SOFT_DISABLE_INTS anyway to ensure that the software state
is kept in sync instead of calling hard_irq_disable() directly.

Signed-off-by: Tiejun Chen tiejun.c...@windriver.com


Scott, Ben, could you please ack?


ACK, but it needs to be updated due to this patch that renames
SOFT_DISABLE_INTS:

http://patchwork.ozlabs.org/patch/259337/


Note that the latter was applied to powerpc -next already.



Thanks Ben and Scott.

Alex,

Should I resend an updated patch after you sync powerpc -next?

Tiejun

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Revert kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()

2013-08-06 Thread tiejun.chen

On 08/07/2013 04:50 AM, Scott Wood wrote:

On Tue, 2013-08-06 at 17:31 +0800, Tiejun Chen wrote:

We should revert this commit to rework.

Signed-off-by: Tiejun Chen tiejun.c...@windriver.com


This breaks bisect.


Hmm...

Maybe I can squash these two patches into one.

Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux pte

2013-08-02 Thread tiejun.chen

On 08/01/2013 07:12 PM, Bharat Bhushan wrote:

KVM uses same WIM tlb attributes as the corresponding qemu pte.
For this we now search the linux pte for the requested page and
get these cache caching/coherency attributes from pte.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
v1-v2
  - Use Linux pte for wimge rather than RAM/no-RAM mechanism

  arch/powerpc/include/asm/kvm_host.h |2 +-
  arch/powerpc/kvm/booke.c|2 +-
  arch/powerpc/kvm/e500.h |8 +---
  arch/powerpc/kvm/e500_mmu_host.c|   31 ++-
  4 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 3328353..583d405 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -535,6 +535,7 @@ struct kvm_vcpu_arch {
  #endif
gpa_t paddr_accessed;
gva_t vaddr_accessed;
+   pgd_t *pgdir;

u8 io_gpr; /* GPR used as IO source/target */
u8 mmio_is_bigendian;
@@ -592,7 +593,6 @@ struct kvm_vcpu_arch {
struct list_head run_list;
struct task_struct *run_task;
struct kvm_run *kvm_run;
-   pgd_t *pgdir;

spinlock_t vpa_update_lock;
struct kvmppc_vpa vpa;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 17722d8..eb2 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -697,7 +697,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
  #endif

kvmppc_fix_ee_before_entry();
-
+   vcpu-arch.pgdir = current-mm-pgd;
ret = __kvmppc_vcpu_run(kvm_run, vcpu);

/* No need for kvm_guest_exit. It's done in handle_exit.
diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index 4fd9650..fc4b2f6 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -31,11 +31,13 @@ enum vcpu_ftr {
  #define E500_TLB_NUM   2

  /* entry is mapped somewhere in host TLB */
-#define E500_TLB_VALID (1  0)
+#define E500_TLB_VALID (1  31)
  /* TLB1 entry is mapped by host TLB1, tracked by bitmaps */
-#define E500_TLB_BITMAP(1  1)
+#define E500_TLB_BITMAP(1  30)
  /* TLB1 entry is mapped by host TLB0 */
-#define E500_TLB_TLB0  (1  2)
+#define E500_TLB_TLB0  (1  29)
+/* Lower 5 bits have WIMGE value */
+#define E500_TLB_WIMGE_MASK(0x1f)

  struct tlbe_ref {
pfn_t pfn;  /* valid only for TLB0, except briefly */
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..9b10b0b 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,15 +64,6 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int 
usermode)
return mas3;
  }

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
-{
-#ifdef CONFIG_SMP
-   return (mas2  MAS2_ATTRIB_MASK) | MAS2_M;
-#else
-   return mas2  MAS2_ATTRIB_MASK;
-#endif
-}
-
  /*
   * writing shadow tlb entry to host TLB
   */
@@ -248,10 +239,12 @@ static inline int tlbe_is_writable(struct 
kvm_book3e_206_tlb_entry *tlbe)

  static inline void kvmppc_e500_ref_setup(struct tlbe_ref *ref,
 struct kvm_book3e_206_tlb_entry *gtlbe,
-pfn_t pfn)
+pfn_t pfn, int wimg)
  {
ref-pfn = pfn;
ref-flags |= E500_TLB_VALID;
+   /* Use guest supplied MAS2_G and MAS2_E */
+   ref-flags |= (gtlbe-mas2  MAS2_ATTRIB_MASK) | wimg;

if (tlbe_is_writable(gtlbe))
kvm_set_pfn_dirty(pfn);
@@ -312,8 +305,7 @@ static void kvmppc_e500_setup_stlbe(

/* Force IPROT=0 for all guest mappings. */
stlbe-mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
-   stlbe-mas2 = (gvaddr  MAS2_EPN) |
- e500_shadow_mas2_attrib(gtlbe-mas2, pr);
+   stlbe-mas2 = (gvaddr  MAS2_EPN) | (ref-flags  E500_TLB_WIMGE_MASK);
stlbe-mas7_3 = ((u64)pfn  PAGE_SHIFT) |
e500_shadow_mas3_attrib(gtlbe-mas7_3, pr);

@@ -332,6 +324,8 @@ static inline int kvmppc_e500_shadow_map(struct 
kvmppc_vcpu_e500 *vcpu_e500,
unsigned long hva;
int pfnmap = 0;
int tsize = BOOK3E_PAGESZ_4K;
+   pte_t pte;
+   int wimg = 0;

/*
 * Translate guest physical to true physical, acquiring
@@ -437,6 +431,8 @@ static inline int kvmppc_e500_shadow_map(struct 
kvmppc_vcpu_e500 *vcpu_e500,

if (likely(!pfnmap)) {
unsigned long tsize_pages = 1  (tsize + 10 - PAGE_SHIFT);
+   pgd_t *pgdir;
+
pfn = gfn_to_pfn_memslot(slot, gfn);
if (is_error_noslot_pfn(pfn)) {
printk(KERN_ERR Couldn't get real page for gfn %lx!\n,
@@ -447,9 +443,18 @@ static inline int kvmppc_e500_shadow_map(struct 
kvmppc_vcpu_e500 

Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-02 Thread tiejun.chen

On 08/01/2013 07:12 PM, Bharat Bhushan wrote:

KVM need to lookup linux pte for getting TLB attributes (WIMGE).
This is similar to how book3s does.
This will be used in follow-up patches.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
v1-v2
  - This is a new change in this version

  arch/powerpc/include/asm/kvm_booke.h |   73 ++
  1 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_booke.h 
b/arch/powerpc/include/asm/kvm_booke.h
index d3c1eb3..903624d 100644
--- a/arch/powerpc/include/asm/kvm_booke.h
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -102,4 +102,77 @@ static inline ulong kvmppc_get_msr(struct kvm_vcpu *vcpu)
  {
return vcpu-arch.shared-msr;
  }
+
+/*
+ * Lock and read a linux PTE.  If it's present and writable, atomically
+ * set dirty and referenced bits and return the PTE, otherwise return 0.
+ */
+static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing)
+{
+   pte_t pte;
+
+#ifdef PTE_ATOMIC_UPDATES
+   pte_t tmp;
+/* wait until _PAGE_BUSY is clear then set it atomically */
+#ifdef CONFIG_PPC64
+   __asm__ __volatile__ (
+   1:ldarx   %0,0,%3\n
+ andi.   %1,%0,%4\n
+ bne-1b\n
+ ori %1,%0,%4\n
+ stdcx.  %1,0,%3\n
+ bne-1b
+   : =r (pte), =r (tmp), =m (*p)
+   : r (p), i (_PAGE_BUSY)
+   : cc);
+#else
+__asm__ __volatile__ (
+1: lwarx   %0,0,%3\n
+   andi.   %1,%0,%4\n
+   bne-1b\n
+   ori %1,%0,%4\n
+   stwcx.  %1,0,%3\n
+   bne-1b
+: =r (pte), =r (tmp), =m (*p)
+: r (p), i (_PAGE_BUSY)
+: cc);
+#endif
+#else
+   pte = pte_val(*p);
+#endif
+
+   if (pte_present(pte)) {
+   pte = pte_mkyoung(pte);
+   if (writing  pte_write(pte))
+   pte = pte_mkdirty(pte);
+   }
+
+   *p = pte;   /* clears _PAGE_BUSY */
+
+   return pte;
+}
+
+static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
+ int writing, unsigned long *pte_sizep)


Looks this function is as same as book3s, so why not improve that as common :)

Tiejun


+{
+   pte_t *ptep;
+   unsigned long ps = *pte_sizep;
+   unsigned int shift;
+
+   ptep = find_linux_pte_or_hugepte(pgdir, hva, shift);
+   if (!ptep)
+   return __pte(0);
+   if (shift)
+   *pte_sizep = 1ul  shift;
+   else
+   *pte_sizep = PAGE_SIZE;
+
+   if (ps  *pte_sizep)
+   return __pte(0);
+   if (!pte_present(*ptep))
+   return __pte(0);
+
+   return kvmppc_read_update_linux_pte(ptep, writing);
+}
+
  #endif /* __ASM_KVM_BOOKE_H__ */



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/6 v2] kvm: powerpc: allow guest control G attribute in mas2

2013-08-02 Thread tiejun.chen

On 08/01/2013 07:12 PM, Bharat Bhushan wrote:

G bit in MAS2 indicates whether the page is Guarded.
There is no reason to stop guest setting  E, so allow him.


Could we merge patch 2 and 3 into only one.

Tiejun



Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
v1-v2
  - no change

  arch/powerpc/kvm/e500.h |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index 277cb18..4fd9650 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -117,7 +117,7 @@ static inline struct kvmppc_vcpu_e500 *to_e500(struct 
kvm_vcpu *vcpu)
  #define E500_TLB_USER_PERM_MASK (MAS3_UX|MAS3_UR|MAS3_UW)
  #define E500_TLB_SUPER_PERM_MASK (MAS3_SX|MAS3_SR|MAS3_SW)
  #define MAS2_ATTRIB_MASK \
- (MAS2_X0 | MAS2_X1 | MAS2_E)
+ (MAS2_X0 | MAS2_X1 | MAS2_E | MAS2_G)
  #define MAS3_ATTRIB_MASK \
  (MAS3_U0 | MAS3_U1 | MAS3_U2 | MAS3_U3 \
   | E500_TLB_USER_PERM_MASK | E500_TLB_SUPER_PERM_MASK)



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-23 Thread tiejun.chen

On 07/18/2013 06:27 PM, Alexander Graf wrote:


On 18.07.2013, at 12:19, “tiejun.chen” wrote:


On 07/18/2013 06:12 PM, Alexander Graf wrote:


On 18.07.2013, at 12:08, “tiejun.chen” wrote:


On 07/18/2013 05:48 PM, Alexander Graf wrote:


On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:





-Original Message-
From: Bhushan Bharat-R65777
Sent: Thursday, July 18, 2013 1:53 PM
To: '�tiejun.chen�'
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott-
B07421
Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
managed pages




-Original Message-
From: �tiejun.chen� [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 1:52 PM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: kvm-ppc-ow...@vger.kernel.org
[mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of �tiejun.chen�
Sent: Thursday, July 18, 2013 1:01 PM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de;
Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: �tiejun.chen� [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 11:56 AM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de;
Wood
Scott- B07421; Bhushan Bharat-R65777
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
for kernel managed pages

On 07/18/2013 02:04 PM, Bharat Bhushan wrote:

If there is a struct page for the requested mapping then it's
normal DDR and the mapping sets M bit (coherent, cacheable)
else this is treated as I/O and we set  I + G  (cache
inhibited,
guarded)

This helps setting proper TLB mapping for direct assigned device

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
arch/powerpc/kvm/e500_mmu_host.c |   17 -
1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c
b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..089c227 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,20 @@ static inline u32
e500_shadow_mas3_attrib(u32 mas3, int

usermode)

return mas3;
}

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
{
+   u32 mas2_attr;
+
+   mas2_attr = mas2  MAS2_ATTRIB_MASK;
+
+   if (!pfn_valid(pfn)) {


Why not directly use kvm_is_mmio_pfn()?


What I understand from this function (someone can correct me) is
that it

returns false when the page is managed by kernel and is not
marked as RESERVED (for some reason). For us it does not matter
whether the page is reserved or not, if it is kernel visible page then it

is DDR.




I think you are setting I|G by addressing all mmio pages, right? If
so,

  KVM: direct mmio pfn check

  Userspace may specify memory slots that are backed by mmio
pages rather than
  normal RAM.  In some cases it is not enough to identify these
mmio

pages

  by pfn_valid().  This patch adds checking the PageReserved as well.


Do you know what are those some cases and how checking
PageReserved helps in

those cases?

No, myself didn't see these actual cases in qemu,too. But this should
be chronically persistent as I understand ;-)


Then I will wait till someone educate me :)


The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want 
to call this for all tlbwe operation unless it is necessary.


It certainly does more than we need and potentially slows down the fast path (RAM 
mapping). The only thing it does on top of if (pfn_valid()) is to check for 
pages that are declared reserved on the host. This happens in 2 cases:

   1) Non cache coherent DMA
   2) Memory hot remove

The non coherent DMA case would be interesting, as with the mechanism as it is 
in place in Linux today, we could potentially break normal guest operation if 
we don't take it into account. However, it's Kconfig guarded by:

 depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
 default n if PPC_47x
 default y

so we never hit it with any core we care about ;).

Memory hot remove does not exist on e500 FWIW, so we don't have to worry about 
that one either.


Thanks for this good information :)

So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside 
kvm_is_mmio_pfn() to make sure that check is only valid when that is really 
needed? This can decrease those unnecessary performance loss.

If I'm wrong please correct me :)


You're perfectly right, but this is generic KVM code. So it gets run across all

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread tiejun.chen

On 07/18/2013 02:04 PM, Bharat Bhushan wrote:

If there is a struct page for the requested mapping then it's
normal DDR and the mapping sets M bit (coherent, cacheable)
else this is treated as I/O and we set  I + G  (cache inhibited, guarded)

This helps setting proper TLB mapping for direct assigned device

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
  arch/powerpc/kvm/e500_mmu_host.c |   17 -
  1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..089c227 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int 
usermode)
return mas3;
  }

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
  {
+   u32 mas2_attr;
+
+   mas2_attr = mas2  MAS2_ATTRIB_MASK;
+
+   if (!pfn_valid(pfn)) {


Why not directly use kvm_is_mmio_pfn()?

Tiejun


+   mas2_attr |= MAS2_I | MAS2_G;
+   } else {
  #ifdef CONFIG_SMP
-   return (mas2  MAS2_ATTRIB_MASK) | MAS2_M;
-#else
-   return mas2  MAS2_ATTRIB_MASK;
+   mas2_attr |= MAS2_M;
  #endif
+   }
+   return mas2_attr;
  }

  /*
@@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
/* Force IPROT=0 for all guest mappings. */
stlbe-mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
stlbe-mas2 = (gvaddr  MAS2_EPN) |
- e500_shadow_mas2_attrib(gtlbe-mas2, pr);
+ e500_shadow_mas2_attrib(gtlbe-mas2, pfn);
stlbe-mas7_3 = ((u64)pfn  PAGE_SHIFT) |
e500_shadow_mas3_attrib(gtlbe-mas7_3, pr);




--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread tiejun.chen

On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: “tiejun.chen” [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 11:56 AM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott-
B07421; Bhushan Bharat-R65777
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
managed pages

On 07/18/2013 02:04 PM, Bharat Bhushan wrote:

If there is a struct page for the requested mapping then it's normal
DDR and the mapping sets M bit (coherent, cacheable) else this is
treated as I/O and we set  I + G  (cache inhibited, guarded)

This helps setting proper TLB mapping for direct assigned device

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
   arch/powerpc/kvm/e500_mmu_host.c |   17 -
   1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c
b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..089c227 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int

usermode)

return mas3;
   }

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
   {
+   u32 mas2_attr;
+
+   mas2_attr = mas2  MAS2_ATTRIB_MASK;
+
+   if (!pfn_valid(pfn)) {


Why not directly use kvm_is_mmio_pfn()?


What I understand from this function (someone can correct me) is that it returns 
false when the page is managed by kernel and is not marked as RESERVED (for 
some reason). For us it does not matter whether the page is reserved or not, if it is 
kernel visible page then it is DDR.



I think you are setting I|G by addressing all mmio pages, right? If so,

KVM: direct mmio pfn check

Userspace may specify memory slots that are backed by mmio pages rather than
normal RAM.  In some cases it is not enough to identify these mmio pages
by pfn_valid().  This patch adds checking the PageReserved as well.

Tiejun


-Bharat



Tiejun


+   mas2_attr |= MAS2_I | MAS2_G;
+   } else {
   #ifdef CONFIG_SMP
-   return (mas2  MAS2_ATTRIB_MASK) | MAS2_M;
-#else
-   return mas2  MAS2_ATTRIB_MASK;
+   mas2_attr |= MAS2_M;
   #endif
+   }
+   return mas2_attr;
   }

   /*
@@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
/* Force IPROT=0 for all guest mappings. */
stlbe-mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
stlbe-mas2 = (gvaddr  MAS2_EPN) |
- e500_shadow_mas2_attrib(gtlbe-mas2, pr);
+ e500_shadow_mas2_attrib(gtlbe-mas2, pfn);
stlbe-mas7_3 = ((u64)pfn  PAGE_SHIFT) |
e500_shadow_mas3_attrib(gtlbe-mas7_3, pr);








--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread tiejun.chen

On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On
Behalf Of “tiejun.chen”
Sent: Thursday, July 18, 2013 1:01 PM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
managed pages

On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: “tiejun.chen” [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 11:56 AM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood
Scott- B07421; Bhushan Bharat-R65777
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 02:04 PM, Bharat Bhushan wrote:

If there is a struct page for the requested mapping then it's normal
DDR and the mapping sets M bit (coherent, cacheable) else this is
treated as I/O and we set  I + G  (cache inhibited, guarded)

This helps setting proper TLB mapping for direct assigned device

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
arch/powerpc/kvm/e500_mmu_host.c |   17 -
1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c
b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..089c227 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32
mas3, int

usermode)

return mas3;
}

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
{
+   u32 mas2_attr;
+
+   mas2_attr = mas2  MAS2_ATTRIB_MASK;
+
+   if (!pfn_valid(pfn)) {


Why not directly use kvm_is_mmio_pfn()?


What I understand from this function (someone can correct me) is that it

returns false when the page is managed by kernel and is not marked as RESERVED
(for some reason). For us it does not matter whether the page is reserved or
not, if it is kernel visible page then it is DDR.




I think you are setting I|G by addressing all mmio pages, right? If so,

  KVM: direct mmio pfn check

  Userspace may specify memory slots that are backed by mmio pages rather
than
  normal RAM.  In some cases it is not enough to identify these mmio pages
  by pfn_valid().  This patch adds checking the PageReserved as well.


Do you know what are those some cases and how checking PageReserved helps in 
those cases?


No, myself didn't see these actual cases in qemu,too. But this should be 
chronically persistent as I understand ;-)


Tiejun



-Bharat



Tiejun


-Bharat



Tiejun


+   mas2_attr |= MAS2_I | MAS2_G;
+   } else {
#ifdef CONFIG_SMP
-   return (mas2  MAS2_ATTRIB_MASK) | MAS2_M;
-#else
-   return mas2  MAS2_ATTRIB_MASK;
+   mas2_attr |= MAS2_M;
#endif
+   }
+   return mas2_attr;
}

/*
@@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
/* Force IPROT=0 for all guest mappings. */
stlbe-mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
stlbe-mas2 = (gvaddr  MAS2_EPN) |
- e500_shadow_mas2_attrib(gtlbe-mas2, pr);
+ e500_shadow_mas2_attrib(gtlbe-mas2, pfn);
stlbe-mas7_3 = ((u64)pfn  PAGE_SHIFT) |
e500_shadow_mas3_attrib(gtlbe-mas7_3, pr);








--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body
of a message to majord...@vger.kernel.org More majordomo info at
http://vger.kernel.org/majordomo-info.html




--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread tiejun.chen

On 07/18/2013 02:04 PM, Bharat Bhushan wrote:

If there is a struct page for the requested mapping then it's
normal DDR and the mapping sets M bit (coherent, cacheable)
else this is treated as I/O and we set  I + G  (cache inhibited, guarded)

This helps setting proper TLB mapping for direct assigned device

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
  arch/powerpc/kvm/e500_mmu_host.c |   17 -
  1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..089c227 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int 
usermode)
return mas3;
  }

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
  {
+   u32 mas2_attr;
+
+   mas2_attr = mas2  MAS2_ATTRIB_MASK;
+
+   if (!pfn_valid(pfn)) {
+   mas2_attr |= MAS2_I | MAS2_G;
+   } else {
  #ifdef CONFIG_SMP
-   return (mas2  MAS2_ATTRIB_MASK) | MAS2_M;
-#else
-   return mas2  MAS2_ATTRIB_MASK;
+   mas2_attr |= MAS2_M;
  #endif
+   }


Additionally, in UP case this little chunk of code is equivalent to

if (1) {
mas2_attr |= MAS2_I | MAS2_G;
} else {
}

So you'd better wrapper MAS2_m in advance like,

#ifdef CONFIG_SMP
#define M_IF_SMPMAS2_M
#else
#define M_IF_SMP0
#endif

Then
if (1)
mas2_attr |= MAS2_I | MAS2_G;
else
mas2_attr |= M_IF_SMP;

Tiejun


+   return mas2_attr;
  }

  /*
@@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
/* Force IPROT=0 for all guest mappings. */
stlbe-mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
stlbe-mas2 = (gvaddr  MAS2_EPN) |
- e500_shadow_mas2_attrib(gtlbe-mas2, pr);
+ e500_shadow_mas2_attrib(gtlbe-mas2, pfn);
stlbe-mas7_3 = ((u64)pfn  PAGE_SHIFT) |
e500_shadow_mas3_attrib(gtlbe-mas7_3, pr);




--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread tiejun.chen

On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: Bhushan Bharat-R65777
Sent: Thursday, July 18, 2013 1:53 PM
To: '“tiejun.chen”'
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott-
B07421
Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
managed pages




-Original Message-
From: “tiejun.chen” [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 1:52 PM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: kvm-ppc-ow...@vger.kernel.org
[mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of “tiejun.chen”
Sent: Thursday, July 18, 2013 1:01 PM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de;
Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: “tiejun.chen” [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 11:56 AM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de;
Wood
Scott- B07421; Bhushan Bharat-R65777
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
for kernel managed pages

On 07/18/2013 02:04 PM, Bharat Bhushan wrote:

If there is a struct page for the requested mapping then it's
normal DDR and the mapping sets M bit (coherent, cacheable)
else this is treated as I/O and we set  I + G  (cache
inhibited,
guarded)

This helps setting proper TLB mapping for direct assigned device

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/kvm/e500_mmu_host.c |   17 -
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c
b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..089c227 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,20 @@ static inline u32
e500_shadow_mas3_attrib(u32 mas3, int

usermode)

return mas3;
 }

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
 {
+   u32 mas2_attr;
+
+   mas2_attr = mas2  MAS2_ATTRIB_MASK;
+
+   if (!pfn_valid(pfn)) {


Why not directly use kvm_is_mmio_pfn()?


What I understand from this function (someone can correct me) is
that it

returns false when the page is managed by kernel and is not
marked as RESERVED (for some reason). For us it does not matter
whether the page is reserved or not, if it is kernel visible page then it

is DDR.




I think you are setting I|G by addressing all mmio pages, right? If
so,

   KVM: direct mmio pfn check

   Userspace may specify memory slots that are backed by mmio
pages rather than
   normal RAM.  In some cases it is not enough to identify these
mmio

pages

   by pfn_valid().  This patch adds checking the PageReserved as well.


Do you know what are those some cases and how checking
PageReserved helps in

those cases?

No, myself didn't see these actual cases in qemu,too. But this should
be chronically persistent as I understand ;-)


Then I will wait till someone educate me :)


The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want 
to call this for all tlbwe operation unless it is necessary.


Furthermore, how to distinguish we're creating TLB entry for the device assigned 
directly to the GS?


I think its unnecessary to always check if that is mmio's pfn since we have more 
non direct assigned devices.


So maybe we can introduce another helper to fixup that TLB entry in instead of 
this path.


Tiejun



-Bharat


+   mas2_attr |= MAS2_I | MAS2_G;
+   } else {
 #ifdef CONFIG_SMP
-   return (mas2  MAS2_ATTRIB_MASK) | MAS2_M;
-#else
-   return mas2  MAS2_ATTRIB_MASK;
+   mas2_attr |= MAS2_M;
 #endif
+   }
+   return mas2_attr;
 }

 /*
@@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
/* Force IPROT=0 for all guest mappings. */
stlbe-mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
stlbe-mas2 = (gvaddr  MAS2_EPN) |
- e500_shadow_mas2_attrib(gtlbe-mas2, pr);
+ e500_shadow_mas2_attrib(gtlbe-mas2, pfn);
stlbe-mas7_3 = ((u64)pfn  PAGE_SHIFT) |
e500_shadow_mas3_attrib(gtlbe-mas7_3, pr);








--
To unsubscribe from this list: send the line unsubscribe kvm-ppc
in the body of a message to majord...@vger.kernel.org More
majordomo info at http://vger.kernel.org/majordomo-info.html








--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread tiejun.chen

On 07/18/2013 05:44 PM, Alexander Graf wrote:


On 18.07.2013, at 10:55, �tiejun.chen� wrote:


On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: Bhushan Bharat-R65777
Sent: Thursday, July 18, 2013 1:53 PM
To: '�tiejun.chen�'
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott-
B07421
Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
managed pages




-Original Message-
From: �tiejun.chen� [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 1:52 PM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: kvm-ppc-ow...@vger.kernel.org
[mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of �tiejun.chen�
Sent: Thursday, July 18, 2013 1:01 PM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de;
Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: �tiejun.chen� [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 11:56 AM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de;
Wood
Scott- B07421; Bhushan Bharat-R65777
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
for kernel managed pages

On 07/18/2013 02:04 PM, Bharat Bhushan wrote:

If there is a struct page for the requested mapping then it's
normal DDR and the mapping sets M bit (coherent, cacheable)
else this is treated as I/O and we set  I + G  (cache
inhibited,
guarded)

This helps setting proper TLB mapping for direct assigned device

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/kvm/e500_mmu_host.c |   17 -
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c
b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..089c227 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,20 @@ static inline u32
e500_shadow_mas3_attrib(u32 mas3, int

usermode)

return mas3;
 }

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
 {
+   u32 mas2_attr;
+
+   mas2_attr = mas2  MAS2_ATTRIB_MASK;
+
+   if (!pfn_valid(pfn)) {


Why not directly use kvm_is_mmio_pfn()?


What I understand from this function (someone can correct me) is
that it

returns false when the page is managed by kernel and is not
marked as RESERVED (for some reason). For us it does not matter
whether the page is reserved or not, if it is kernel visible page then it

is DDR.




I think you are setting I|G by addressing all mmio pages, right? If
so,

   KVM: direct mmio pfn check

   Userspace may specify memory slots that are backed by mmio
pages rather than
   normal RAM.  In some cases it is not enough to identify these
mmio

pages

   by pfn_valid().  This patch adds checking the PageReserved as well.


Do you know what are those some cases and how checking
PageReserved helps in

those cases?

No, myself didn't see these actual cases in qemu,too. But this should
be chronically persistent as I understand ;-)


Then I will wait till someone educate me :)


The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want 
to call this for all tlbwe operation unless it is necessary.


Furthermore, how to distinguish we're creating TLB entry for the device 
assigned directly to the GS?


Because other devices wouldn't be available to the guest through memory slots.


Yes.




I think its unnecessary to always check if that is mmio's pfn since we have 
more non direct assigned devices.


I'm not sure I understand. The shadow TLB code only knows here is a host virtual 
address. It needs to figure out whether the host physical address behind that is 
RAM (can access with cache enabled) or not (has to disable cache)



Sorry, looks I'm misleading you :-P


So maybe we can introduce another helper to fixup that TLB entry in instead of 
this path.


This path does fix up the shadow (host) TLB entry :).



I just mean whether we can have a separate path dedicated to those direct 
assigned devices, not go this common path :)


Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread tiejun.chen

On 07/18/2013 06:00 PM, Alexander Graf wrote:


On 18.07.2013, at 11:56, “tiejun.chen” wrote:


On 07/18/2013 05:44 PM, Alexander Graf wrote:


On 18.07.2013, at 10:55, �tiejun.chen� wrote:


On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: Bhushan Bharat-R65777
Sent: Thursday, July 18, 2013 1:53 PM
To: '�tiejun.chen�'
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott-
B07421
Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
managed pages




-Original Message-
From: �tiejun.chen� [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 1:52 PM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: kvm-ppc-ow...@vger.kernel.org
[mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of �tiejun.chen�
Sent: Thursday, July 18, 2013 1:01 PM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de;
Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: �tiejun.chen� [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 11:56 AM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de;
Wood
Scott- B07421; Bhushan Bharat-R65777
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
for kernel managed pages

On 07/18/2013 02:04 PM, Bharat Bhushan wrote:

If there is a struct page for the requested mapping then it's
normal DDR and the mapping sets M bit (coherent, cacheable)
else this is treated as I/O and we set  I + G  (cache
inhibited,
guarded)

This helps setting proper TLB mapping for direct assigned device

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/kvm/e500_mmu_host.c |   17 -
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c
b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..089c227 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,20 @@ static inline u32
e500_shadow_mas3_attrib(u32 mas3, int

usermode)

return mas3;
 }

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
 {
+   u32 mas2_attr;
+
+   mas2_attr = mas2  MAS2_ATTRIB_MASK;
+
+   if (!pfn_valid(pfn)) {


Why not directly use kvm_is_mmio_pfn()?


What I understand from this function (someone can correct me) is
that it

returns false when the page is managed by kernel and is not
marked as RESERVED (for some reason). For us it does not matter
whether the page is reserved or not, if it is kernel visible page then it

is DDR.




I think you are setting I|G by addressing all mmio pages, right? If
so,

   KVM: direct mmio pfn check

   Userspace may specify memory slots that are backed by mmio
pages rather than
   normal RAM.  In some cases it is not enough to identify these
mmio

pages

   by pfn_valid().  This patch adds checking the PageReserved as well.


Do you know what are those some cases and how checking
PageReserved helps in

those cases?

No, myself didn't see these actual cases in qemu,too. But this should
be chronically persistent as I understand ;-)


Then I will wait till someone educate me :)


The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want 
to call this for all tlbwe operation unless it is necessary.


Furthermore, how to distinguish we're creating TLB entry for the device 
assigned directly to the GS?


Because other devices wouldn't be available to the guest through memory slots.


Yes.




I think its unnecessary to always check if that is mmio's pfn since we have 
more non direct assigned devices.


I'm not sure I understand. The shadow TLB code only knows here is a host virtual 
address. It needs to figure out whether the host physical address behind that is 
RAM (can access with cache enabled) or not (has to disable cache)



Sorry, looks I'm misleading you :-P


So maybe we can introduce another helper to fixup that TLB entry in instead of 
this path.


This path does fix up the shadow (host) TLB entry :).



I just mean whether we can have a separate path dedicated to those direct 
assigned devices, not go this common path :)


I don't think it's possible to have a separate path without a certain level of 
trust. In the current flow we don't trust anyone. We just check every 
translated page whether we should enable caching or not.

We could take that information from 2 other side though:

   1) Memory Slot
   2) Guest TLB Flags

If we take it from the memory

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread tiejun.chen

On 07/18/2013 06:12 PM, Alexander Graf wrote:


On 18.07.2013, at 12:08, “tiejun.chen” wrote:


On 07/18/2013 05:48 PM, Alexander Graf wrote:


On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:





-Original Message-
From: Bhushan Bharat-R65777
Sent: Thursday, July 18, 2013 1:53 PM
To: '�tiejun.chen�'
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott-
B07421
Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
managed pages




-Original Message-
From: �tiejun.chen� [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 1:52 PM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: kvm-ppc-ow...@vger.kernel.org
[mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of �tiejun.chen�
Sent: Thursday, July 18, 2013 1:01 PM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de;
Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: �tiejun.chen� [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 11:56 AM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de;
Wood
Scott- B07421; Bhushan Bharat-R65777
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
for kernel managed pages

On 07/18/2013 02:04 PM, Bharat Bhushan wrote:

If there is a struct page for the requested mapping then it's
normal DDR and the mapping sets M bit (coherent, cacheable)
else this is treated as I/O and we set  I + G  (cache
inhibited,
guarded)

This helps setting proper TLB mapping for direct assigned device

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
arch/powerpc/kvm/e500_mmu_host.c |   17 -
1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c
b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..089c227 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,20 @@ static inline u32
e500_shadow_mas3_attrib(u32 mas3, int

usermode)

return mas3;
}

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
{
+   u32 mas2_attr;
+
+   mas2_attr = mas2  MAS2_ATTRIB_MASK;
+
+   if (!pfn_valid(pfn)) {


Why not directly use kvm_is_mmio_pfn()?


What I understand from this function (someone can correct me) is
that it

returns false when the page is managed by kernel and is not
marked as RESERVED (for some reason). For us it does not matter
whether the page is reserved or not, if it is kernel visible page then it

is DDR.




I think you are setting I|G by addressing all mmio pages, right? If
so,

  KVM: direct mmio pfn check

  Userspace may specify memory slots that are backed by mmio
pages rather than
  normal RAM.  In some cases it is not enough to identify these
mmio

pages

  by pfn_valid().  This patch adds checking the PageReserved as well.


Do you know what are those some cases and how checking
PageReserved helps in

those cases?

No, myself didn't see these actual cases in qemu,too. But this should
be chronically persistent as I understand ;-)


Then I will wait till someone educate me :)


The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want 
to call this for all tlbwe operation unless it is necessary.


It certainly does more than we need and potentially slows down the fast path (RAM 
mapping). The only thing it does on top of if (pfn_valid()) is to check for 
pages that are declared reserved on the host. This happens in 2 cases:

   1) Non cache coherent DMA
   2) Memory hot remove

The non coherent DMA case would be interesting, as with the mechanism as it is 
in place in Linux today, we could potentially break normal guest operation if 
we don't take it into account. However, it's Kconfig guarded by:

 depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
 default n if PPC_47x
 default y

so we never hit it with any core we care about ;).

Memory hot remove does not exist on e500 FWIW, so we don't have to worry about 
that one either.


Thanks for this good information :)

So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside 
kvm_is_mmio_pfn() to make sure that check is only valid when that is really 
needed? This can decrease those unnecessary performance loss.

If I'm wrong please correct me :)


You're perfectly right, but this is generic KVM code. So it gets run across all 
architectures. What if someone has the great idea to add a new case here for 
x86, but doesn't tell us

Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()

2013-07-14 Thread tiejun.chen

On 07/13/2013 07:05 AM, Benjamin Herrenschmidt wrote:

On Fri, 2013-07-12 at 12:50 -0500, Scott Wood wrote:


[1] SOFT_DISABLE_INTS seems an odd name for something that updates the
software state to be consistent with interrupts being *hard* disabled.
I can sort of see the logic in it, but it's confusing when first
encountered.  From the name it looks like all it would do is set
soft_enabled to 1.


It's indeed odd. Also worse when we use DISABLE_INTS which is just a
macro on top of SOFT_DISABLE_INTS :-)

I've been wanting to change the macro name for a while now and never
got to it. Patch welcome :-)



What about SOFT_IRQ_DISABLE? This is close to name hard_irq_disable() :) And 
then remove all DISABLE_INTS as well?


Tiejun

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()

2013-07-14 Thread tiejun.chen

On 07/13/2013 01:50 AM, Scott Wood wrote:

On 07/11/2013 10:22:28 PM, tiejun.chen wrote:

If so, why not to remove directly hard_irq_disable() inside
kvmppc_handle_exit() by reverting that commit, kvm/ppc/booke64: Fix lazy ee
handling in kvmppc_handle_exit()?

Then we can use SOFT_DISABLE_INTS() explicitly before call
kvmppc_handle_exit() like this:

KVM: PPC: Book3E HV: call SOFT_DISABLE_INTS to sync the software state

We enter with interrupts disabled in hardware, but we need to
call SOFT_DISABLE_INTS anyway to ensure that the software state
is kept in sync.

Signed-off-by: Tiejun Chen tiejun.c...@windriver.com

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S
b/arch/powerpc/kvm/bookehv_interrupts.S
index e8ed7d6..b521d21 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -33,6 +33,8 @@

 #ifdef CONFIG_64BIT
 #include asm/exception-64e.h
+#include asm/hw_irq.h
+#include asm/irqflags.h
 #else
 #include ../kernel/head_booke.h /* for THREAD_NORMSAVE() */
 #endif
@@ -469,6 +471,14 @@ _GLOBAL(kvmppc_resume_host)
PPC_LL  r3, HOST_RUN(r1)
mr  r5, r14 /* intno */
mr  r14, r4 /* Save vcpu pointer. */
+#ifdef CONFIG_64BIT
+   /*
+* We enter with interrupts disabled in hardware, but
+* we need to call SOFT_DISABLE_INTS anyway to ensure
+* that the software state is kept in sync.
+*/
+   SOFT_DISABLE_INTS(r7,r8)
+#endif

bl  kvmppc_handle_exit


This will clobber the arguments we want to pass to kvmppc_handle_exit.  That can
be fixed by moving SOFT_DISABLE_INTS[1] earlier, and maybe it's more idiomatic


Okay. Once we have a final name to replace SOFT_DISABLE_INTS, I can regenerate 
this as you comment.



to use SOFT_DISABLE_INTS rather than what we currently do, but we still want to
fix hard_irq_disable().  There are other places where we call hard_irq_disable()
where interrupts (and I believe preemption) were previously enabled.


Yes, I had a preliminary change ACKed by Ben, and I guess you also saw :) so 
I'll send that firstly.


Thanks,

Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()

2013-07-14 Thread tiejun.chen

On 07/14/2013 12:13 PM, Benjamin Herrenschmidt wrote:

On Fri, 2013-07-12 at 12:54 +0800, tiejun.chen wrote:

Is the following fine?

powerpc: to access local paca after hard irq disabled

We can access paca directly after hard interrupt disabled, and
this can avoid accessing wrong paca when using get_paca() in
preempt case.

Signed-off-by: Tiejun Chen tiejun.c...@windriver.com


Ack. We still have an unresolved problem where gcc decides to copy r13
to another register and then index from that, or even store and reload
it, and this possibly accross preempt sections.

It's unclear to me in what circumstances it will do it and whether
there's a case of us getting completely screwed over, I need to
investigate. This is the reason why we originally made the accesses to
soft_enabled be inline asm.


Understood.



We might need to do a bulk conversion of all PACA accesses to either
such inline asm or hide r13 behind asm (forcing essentially a copy
to another register on each use) or a combination of both.

IE. inline asm for direct access of things like soft_enabled, and a
get_paca/put_paca style interface that copies r13 and includes a
preempt_disable/enable for the rest.



I'd like to check this possibility later.

Tiejun

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()

2013-07-14 Thread tiejun.chen

On 07/15/2013 10:47 AM, Benjamin Herrenschmidt wrote:

On Mon, 2013-07-15 at 10:20 +0800, tiejun.chen wrote:

What about SOFT_IRQ_DISABLE? This is close to name
hard_irq_disable() :) And
then remove all DISABLE_INTS as well?


Or RECONCILE_IRQ_STATE...


But sounds this doesn't imply this key point that the soft-irq is always 
disabled here :)


And as I understand, the irq state is always needed to be reconciled when we 
disable soft irq, right?


Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()

2013-07-11 Thread tiejun.chen

On 07/12/2013 08:19 AM, Benjamin Herrenschmidt wrote:

On Thu, 2013-07-11 at 15:07 +0200, Alexander Graf wrote:

Ok, let me quickly explain the problem.

We are leaving host context, switching slowly into guest context.
During that transition we call get_paca() indirectly (apparently by
another call to hard_disable() which sounds bogus, but that's another
story).

get_paca() warns when we're preemptible. We're only not preemptible
when either preempt is disabled or irqs are disabled. Irqs are
disabled, but arch_irqs_disabled() doesn't know, because it only
checks for soft disabled IRQs.

So we can fix this either by setting IRQs as soft disabled as well or
by disabling preemption until we enter the guest for real. Any
preferences?


Well, if you hard disable first (ie, direct transition from full enabled
to hard disabled), you know you have nothing lazy pending in
irq_pending, then it's ok to mess around with local_paca-soft_enabled
to make it look disabled.

IE. Call hard_irq_disable(), then only turn local_paca-soft_enabled
back on late in the process, some time before the final rfi(d).

That works as long as you had a transition from full enabled to full
disabled and don't hard re-enable in the process. IE, You are certain
that there is nothing pending in irq_happened.

HOWEVER !

If you do that, you *ALSO* need to clear irq_happened. You must *NEVER*
leave PACA_IRQ_HARD_DIS set in irq_happened if you are soft-enabled, and
since the above means that you *will* be seen as soft enabled on the way
out of the guest, you can kaboom.

BTW. I'm fine with a patch that does:

#define hard_irq_disable()  do {\
u8 _was_enabled = get_paca()-soft_enabled;  \


Current problem I met is issued from the above line.


__hard_irq_disable();   \
-   get_paca()-soft_enabled = 0;\


Not here.

If I'm misunderstanding what you guys means, please correct me since this is a 
long discussion thread. I have to reread that carefully.


Tiejun


+   local_paca-soft_enabled = 0;\

In fact we should probably do it anyway.

Cheers,
Ben.






--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()

2013-07-11 Thread tiejun.chen

On 07/12/2013 12:36 AM, Scott Wood wrote:

On 07/11/2013 11:30:41 AM, Alexander Graf wrote:


On 11.07.2013, at 18:18, Scott Wood wrote:

 On 07/11/2013 08:07:30 AM, Alexander Graf wrote:
 get_paca() warns when we're preemptible. We're only not preemptible when
either preempt is disabled or irqs are disabled. Irqs are disabled, but
arch_irqs_disabled() doesn't know, because it only checks for soft disabled 
IRQs.
 So we can fix this either by setting IRQs as soft disabled as well

 If we set IRQs as soft-disabled prior to calling hard_irq_disable(), then
hard_irq_disable() will fail to call trace_hardirqs_off().

Right...


Plus we'd have the same problem trying to set soft_enabled to 0.


 Any preferences?

 Use arch_local_save_flags() in hard_irq_disable() instead of reading
soft_enabled with C code.

That only operates on the soft_enabled bit. We also need to access irq_happened.


OK, so we'll need more inline asm.


If so, why not to remove directly hard_irq_disable() inside kvmppc_handle_exit() 
by reverting that commit, kvm/ppc/booke64: Fix lazy ee handling in 
kvmppc_handle_exit()?


Then we can use SOFT_DISABLE_INTS() explicitly before call kvmppc_handle_exit() 
like this:


KVM: PPC: Book3E HV: call SOFT_DISABLE_INTS to sync the software state

We enter with interrupts disabled in hardware, but we need to
call SOFT_DISABLE_INTS anyway to ensure that the software state
is kept in sync.

Signed-off-by: Tiejun Chen tiejun.c...@windriver.com

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
b/arch/powerpc/kvm/bookehv_interrupts.S

index e8ed7d6..b521d21 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -33,6 +33,8 @@

 #ifdef CONFIG_64BIT
 #include asm/exception-64e.h
+#include asm/hw_irq.h
+#include asm/irqflags.h
 #else
 #include ../kernel/head_booke.h /* for THREAD_NORMSAVE() */
 #endif
@@ -469,6 +471,14 @@ _GLOBAL(kvmppc_resume_host)
PPC_LL  r3, HOST_RUN(r1)
mr  r5, r14 /* intno */
mr  r14, r4 /* Save vcpu pointer. */
+#ifdef CONFIG_64BIT
+   /*
+* We enter with interrupts disabled in hardware, but
+* we need to call SOFT_DISABLE_INTS anyway to ensure
+* that the software state is kept in sync.
+*/
+   SOFT_DISABLE_INTS(r7,r8)
+#endif
bl  kvmppc_handle_exit

/* Restore vcpu pointer and the nonvolatiles we used. */


Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()

2013-07-11 Thread tiejun.chen

On 07/12/2013 11:57 AM, Benjamin Herrenschmidt wrote:

On Fri, 2013-07-12 at 10:13 +0800, tiejun.chen wrote:

#define hard_irq_disable()do {\
   u8 _was_enabled = get_paca()-soft_enabled; \


Current problem I met is issued from the above line.


   __hard_irq_disable();   \
- get_paca()-soft_enabled = 0;   \


Not here.

If I'm misunderstanding what you guys means, please correct me since this is a
long discussion thread. I have to reread that carefully.


Then make it
u8 _was_enabled;
__hard_irq_disable();
was_enabled = local_paca-

Once you have hard disabled, using local_paca directly *should* be safe
(minus that gcc problem I mentioned).


Is the following fine?

powerpc: to access local paca after hard irq disabled

We can access paca directly after hard interrupt disabled, and
this can avoid accessing wrong paca when using get_paca() in
preempt case.

Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
---
 arch/powerpc/include/asm/hw_irq.h |7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index ba713f1..10be1dd 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -96,10 +96,11 @@ static inline bool arch_irqs_disabled(void)
 #endif

 #define hard_irq_disable() do {\
-   u8 _was_enabled = get_paca()-soft_enabled; \
+   u8 _was_enabled;\
__hard_irq_disable();   \
-   get_paca()-soft_enabled = 0;   \
-   get_paca()-irq_happened |= PACA_IRQ_HARD_DIS;  \
+   _was_enabled = local_paca-soft_enabled;\
+   local_paca-soft_enabled = 0;   \
+   local_paca-irq_happened |= PACA_IRQ_HARD_DIS;  \
if (_was_enabled)   \
trace_hardirqs_off();   \
 } while(0)
--
1.7.9.5


Or what about that change to call SOFT_DISABLE_INTS only in KVM scenario? Which 
better?


Then I can send to review?

Thanks,

Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] kvm/ppc: fixes for 3.10

2013-07-10 Thread tiejun.chen

On 07/10/2013 05:33 AM, Alexander Graf wrote:


On 09.07.2013, at 11:09, tiejun.chen wrote:


On 07/09/2013 04:53 PM, Alexander Graf wrote:



Am 09.07.2013 um 09:21 schrieb tiejun.chen tiejun.c...@windriver.com:


On 06/07/2013 08:16 AM, Scott Wood wrote:

Most of these have been posted before, but I grouped them together as
there are some contextual dependencies between them.

Gleb/Paolo: As Alex doesn't appear to be back yet, can you apply these
if there's no objection over the next few days?


Scott and Alex,

Any plan to merge these commits into git://github.com/agraf/linux-2.6.git as 
well?


Which commits exactly are you missing there?


Alex,

Four patches,

kvm/ppc/booke: Delay kvmppc_lazy_ee_enable
kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
kvm/ppc/booke: Hold srcu lock when calling gfn functions
kvm/ppc/booke64: Disable e6500 support  

are already in Linus's tree,

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/?qt=authorq=Scott+Wood

So should we sync these commits into your tree?


They should already be in there?


Yes, I already see this.

Thanks,

Tiejun

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()

2013-07-10 Thread tiejun.chen

On 07/10/2013 05:49 PM, Alexander Graf wrote:


On 10.07.2013, at 08:02, Tiejun Chen wrote:


We should ensure the preemption cannot occur while calling get_paca()
insdide hard_irq_disable(), otherwise the paca_struct may be the
wrong one just after. And btw, we may update timing stats in this case.

Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
---
arch/powerpc/kvm/booke.c |2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index dcc94f0..9dae25d 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -839,6 +839,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu 
*vcpu,
WARN_ON(local_paca-irq_happened != 0);
#endif

+   preempt_disable();
/*
 * We enter with interrupts disabled in hardware, but
 * we need to call hard_irq_disable anyway to ensure that
@@ -848,6 +849,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu 
*vcpu,

/* update before a new last_exit_type is rewritten */
kvmppc_update_timing_stats(vcpu);
+   preempt_enable();


All of the code here is already called with interrupts disabled. I don't see 
how we could preempt then?


But the kernel still check this in preempt case,

#define get_paca()  ((void) debug_smp_processor_id(), local_paca)

then trigger that known call trace as I observed :)

BUG: using smp_processor_id() in preemptible [] code: 
qemu-system-ppc/2065
caller is .kvmppc_handle_exit+0x48/0x810
CPU: 0 PID: 2065 Comm: qemu-system-ppc Not tainted 3.10.0-172036-ge2daa28-dirty 
#116
Call Trace:
[c001fc637570] [c000835c] .show_stack+0x7c/0x1f0 (unreliable)
[c001fc637640] [c0673a0c] .dump_stack+0x28/0x3c
[c001fc6376b0] [c02f04d8] .debug_smp_processor_id+0x108/0x120
[c001fc637740] [c00444e8] .kvmppc_handle_exit+0x48/0x810
[c001fc6377f0] [c0047c80] .kvmppc_resume_host+0xa4/0xf8

Tiejun

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()

2013-07-10 Thread tiejun.chen

On 07/11/2013 03:15 AM, Scott Wood wrote:

On 07/10/2013 01:02:19 AM, Tiejun Chen wrote:

We should ensure the preemption cannot occur while calling get_paca()
insdide hard_irq_disable(), otherwise the paca_struct may be the
wrong one just after. And btw, we may update timing stats in this case.


The soft-ee mechanism depends on accessing the PACA directly via r13 to avoid
this.  We probably should be using inline asm to read was_enabled rather than


Yes.


hoping the compiler doesn't do anything silly.


Do you recommend I should directly replace get_paca() with local_paca inside 
hard_irq_disable()?


#define hard_irq_disable()  do {\
u8 _was_enabled = get_paca()-soft_enabled; \

-   u8 _was_enabled = local_paca-soft_enabled;

But is this safe for all scenarios?

Tiejun

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] kvm/ppc: fixes for 3.10

2013-07-09 Thread tiejun.chen

On 06/07/2013 08:16 AM, Scott Wood wrote:

Most of these have been posted before, but I grouped them together as
there are some contextual dependencies between them.

Gleb/Paolo: As Alex doesn't appear to be back yet, can you apply these
if there's no objection over the next few days?


Scott and Alex,

Any plan to merge these commits into git://github.com/agraf/linux-2.6.git as 
well?

Or instead, where can we proceed to book3e KVM?

Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] kvm/ppc: fixes for 3.10

2013-07-09 Thread tiejun.chen

On 07/09/2013 04:53 PM, Alexander Graf wrote:



Am 09.07.2013 um 09:21 schrieb tiejun.chen tiejun.c...@windriver.com:


On 06/07/2013 08:16 AM, Scott Wood wrote:

Most of these have been posted before, but I grouped them together as
there are some contextual dependencies between them.

Gleb/Paolo: As Alex doesn't appear to be back yet, can you apply these
if there's no objection over the next few days?


Scott and Alex,

Any plan to merge these commits into git://github.com/agraf/linux-2.6.git as 
well?


Which commits exactly are you missing there?


Alex,

Four patches,

kvm/ppc/booke: Delay kvmppc_lazy_ee_enable
kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
kvm/ppc/booke: Hold srcu lock when calling gfn functions
kvm/ppc/booke64: Disable e6500 support  

are already in Linus's tree,

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/?qt=authorq=Scott+Wood

So should we sync these commits into your tree?

Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6 v5] KVM: PPC: exit to user space on ehpriv instruction

2013-06-26 Thread tiejun.chen

On 06/26/2013 01:42 PM, Bharat Bhushan wrote:

ehpriv instruction is used for setting software breakpoints
by user space. This patch adds support to exit to user space
with run-debug have relevant information.

As this is the first point we are using run-debug, also defined
the run-debug structure.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
  arch/powerpc/include/asm/disassemble.h |4 
  arch/powerpc/include/uapi/asm/kvm.h|   21 +
  arch/powerpc/kvm/e500_emulate.c|   27 +++
  3 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/disassemble.h 
b/arch/powerpc/include/asm/disassemble.h
index 9b198d1..856f8de 100644
--- a/arch/powerpc/include/asm/disassemble.h
+++ b/arch/powerpc/include/asm/disassemble.h
@@ -77,4 +77,8 @@ static inline unsigned int get_d(u32 inst)
return inst  0x;
  }

+static inline unsigned int get_oc(u32 inst)
+{
+   return (inst  11)  0x7fff;
+}
  #endif /* __ASM_PPC_DISASSEMBLE_H__ */
diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index 0fb1a6e..ded0607 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -269,7 +269,24 @@ struct kvm_fpu {
__u64 fpr[32];
  };

+/*
+ * Defines for h/w breakpoint, watchpoint (read, write or both) and
+ * software breakpoint.
+ * These are used as type in KVM_SET_GUEST_DEBUG ioctl and status
+ * for KVM_DEBUG_EXIT.
+ */
+#define KVMPPC_DEBUG_NONE  0x0
+#define KVMPPC_DEBUG_BREAKPOINT(1UL  1)
+#define KVMPPC_DEBUG_WATCH_WRITE   (1UL  2)
+#define KVMPPC_DEBUG_WATCH_READ(1UL  3)
  struct kvm_debug_exit_arch {
+   __u64 address;
+   /*
+* exiting to userspace because of h/w breakpoint, watchpoint
+* (read, write or both) and software breakpoint.
+*/
+   __u32 status;
+   __u32 reserved;
  };

  /* for KVM_SET_GUEST_DEBUG */
@@ -281,10 +298,6 @@ struct kvm_guest_debug_arch {
 * Type denotes h/w breakpoint, read watchpoint, write
 * watchpoint or watchpoint (both read and write).
 */
-#define KVMPPC_DEBUG_NONE  0x0
-#define KVMPPC_DEBUG_BREAKPOINT(1UL  1)
-#define KVMPPC_DEBUG_WATCH_WRITE   (1UL  2)
-#define KVMPPC_DEBUG_WATCH_READ(1UL  3)
__u32 type;
__u32 reserved;
} bp[16];
diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c
index b10a012..dab9d07 100644
--- a/arch/powerpc/kvm/e500_emulate.c
+++ b/arch/powerpc/kvm/e500_emulate.c
@@ -26,6 +26,8 @@
  #define XOP_TLBRE   946
  #define XOP_TLBWE   978
  #define XOP_TLBILX  18
+#define XOP_EHPRIV  270
+#define EHPRIV_OC_DEBUG 0


As I think the case, OC = 0, is a bit specific since IIRC, if the OC
operand is omitted, its equal 0 by default. So I think we should start this OC 
value from 1 or other magic number.


And if possible, we'd better add some comments to describe this to make the OC 
definition readable.


Tiejun



  #ifdef CONFIG_KVM_E500MC
  static int dbell2prio(ulong param)
@@ -82,6 +84,26 @@ static int kvmppc_e500_emul_msgsnd(struct kvm_vcpu *vcpu, 
int rb)
  }
  #endif

+static int kvmppc_e500_emul_ehpriv(struct kvm_run *run, struct kvm_vcpu *vcpu,
+  unsigned int inst, int *advance)
+{
+   int emulated = EMULATE_DONE;
+
+   switch (get_oc(inst)) {
+   case EHPRIV_OC_DEBUG:
+   run-exit_reason = KVM_EXIT_DEBUG;
+   run-debug.arch.address = vcpu-arch.pc;
+   run-debug.arch.status = 0;
+   kvmppc_account_exit(vcpu, DEBUG_EXITS);
+   emulated = EMULATE_EXIT_USER;
+   *advance = 0;
+   break;
+   default:
+   emulated = EMULATE_FAIL;
+   }
+   return emulated;
+}
+
  int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
 unsigned int inst, int *advance)
  {
@@ -130,6 +152,11 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
emulated = kvmppc_e500_emul_tlbivax(vcpu, ea);
break;

+   case XOP_EHPRIV:
+   emulated = kvmppc_e500_emul_ehpriv(run, vcpu, inst,
+  advance);
+   break;
+
default:
emulated = EMULATE_FAIL;
}



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6 v5] KVM: PPC: exit to user space on ehpriv instruction

2013-06-26 Thread tiejun.chen

On 06/26/2013 04:44 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: tiejun.chen [mailto:tiejun.c...@windriver.com]
Sent: Wednesday, June 26, 2013 12:25 PM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott-
B07421; b...@kernel.crashing.org; linuxppc-...@lists.ozlabs.org; linux-
ker...@vger.kernel.org; mi...@neuling.org; Bhushan Bharat-R65777
Subject: Re: [PATCH 4/6 v5] KVM: PPC: exit to user space on ehpriv instruction

On 06/26/2013 01:42 PM, Bharat Bhushan wrote:

ehpriv instruction is used for setting software breakpoints
by user space. This patch adds support to exit to user space
with run-debug have relevant information.

As this is the first point we are using run-debug, also defined
the run-debug structure.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
   arch/powerpc/include/asm/disassemble.h |4 
   arch/powerpc/include/uapi/asm/kvm.h|   21 +
   arch/powerpc/kvm/e500_emulate.c|   27 +++
   3 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/disassemble.h

b/arch/powerpc/include/asm/disassemble.h

index 9b198d1..856f8de 100644
--- a/arch/powerpc/include/asm/disassemble.h
+++ b/arch/powerpc/include/asm/disassemble.h
@@ -77,4 +77,8 @@ static inline unsigned int get_d(u32 inst)
return inst  0x;
   }

+static inline unsigned int get_oc(u32 inst)
+{
+   return (inst  11)  0x7fff;
+}
   #endif /* __ASM_PPC_DISASSEMBLE_H__ */
diff --git a/arch/powerpc/include/uapi/asm/kvm.h

b/arch/powerpc/include/uapi/asm/kvm.h

index 0fb1a6e..ded0607 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -269,7 +269,24 @@ struct kvm_fpu {
__u64 fpr[32];
   };

+/*
+ * Defines for h/w breakpoint, watchpoint (read, write or both) and
+ * software breakpoint.
+ * These are used as type in KVM_SET_GUEST_DEBUG ioctl and status
+ * for KVM_DEBUG_EXIT.
+ */
+#define KVMPPC_DEBUG_NONE  0x0
+#define KVMPPC_DEBUG_BREAKPOINT(1UL  1)
+#define KVMPPC_DEBUG_WATCH_WRITE   (1UL  2)
+#define KVMPPC_DEBUG_WATCH_READ(1UL  3)
   struct kvm_debug_exit_arch {
+   __u64 address;
+   /*
+* exiting to userspace because of h/w breakpoint, watchpoint
+* (read, write or both) and software breakpoint.
+*/
+   __u32 status;
+   __u32 reserved;
   };

   /* for KVM_SET_GUEST_DEBUG */
@@ -281,10 +298,6 @@ struct kvm_guest_debug_arch {
 * Type denotes h/w breakpoint, read watchpoint, write
 * watchpoint or watchpoint (both read and write).
 */
-#define KVMPPC_DEBUG_NONE  0x0
-#define KVMPPC_DEBUG_BREAKPOINT(1UL  1)
-#define KVMPPC_DEBUG_WATCH_WRITE   (1UL  2)
-#define KVMPPC_DEBUG_WATCH_READ(1UL  3)
__u32 type;
__u32 reserved;
} bp[16];
diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c
index b10a012..dab9d07 100644
--- a/arch/powerpc/kvm/e500_emulate.c
+++ b/arch/powerpc/kvm/e500_emulate.c
@@ -26,6 +26,8 @@
   #define XOP_TLBRE   946
   #define XOP_TLBWE   978
   #define XOP_TLBILX  18
+#define XOP_EHPRIV  270
+#define EHPRIV_OC_DEBUG 0


As I think the case, OC = 0, is a bit specific since IIRC, if the OC
operand is omitted, its equal 0 by default. So I think we should start this OC
value from 1 or other magic number.


ehpriv instruction is defined to be used as:
ehpriv OC // where OC can be 0,1, ... n
and in extended for it can be used as
ehpriv // With no OC, and here it assumes OC = 0
So OC = 0 is not specific but ehpriv is same as ehpriv 0.


Yes, this is just what I mean.



I do not think of any special reason to reserve ehpriv and ehpriv 0.


So I still prefer we can reserve the 'ehpriv' without OC operand as one simple 
approach to test or develop something for KVM quickly because its really 
convenient to trap into the hypervisor only with one 'ehpriv' instruction easily.


But I have no further objection if you guys are fine to this ;-)

Tiejun



Thanks
-Bharat



And if possible, we'd better add some comments to describe this to make the OC
definition readable.

Tiejun



   #ifdef CONFIG_KVM_E500MC
   static int dbell2prio(ulong param)
@@ -82,6 +84,26 @@ static int kvmppc_e500_emul_msgsnd(struct kvm_vcpu *vcpu,

int rb)

   }
   #endif

+static int kvmppc_e500_emul_ehpriv(struct kvm_run *run, struct kvm_vcpu

*vcpu,

+  unsigned int inst, int *advance)
+{
+   int emulated = EMULATE_DONE;
+
+   switch (get_oc(inst)) {
+   case EHPRIV_OC_DEBUG:
+   run-exit_reason = KVM_EXIT_DEBUG;
+   run-debug.arch.address = vcpu-arch.pc;
+   run-debug.arch.status = 0;
+   kvmppc_account_exit(vcpu, DEBUG_EXITS);
+   emulated

Re: Performance Analysis - EMUL_MTSPR: Increased counts

2013-05-14 Thread tiejun.chen

On 05/13/2013 04:32 PM, nitesh narayan lal wrote:

Hi,
I am doing the virtio performance analysis on PowerPC,with the updated
QEMU I had observed that the EMUL_MTSPR instruction counts in KVM Exit


Sounds you are saying this behaviour is only reproduced after you update QEMU. 
If so, you can use 'git bisect' easily to figure out this issued commit.



count is going really high when I send packets from Guest with respect
to without sending network packets from Guest.
I had added few prints and found that special register number 336
which corresponds to TSR:Timer Status is been called number of times


Which platform? Which CPU core?


which I think could result in a higher CPU Utilization.


How many processes else are running except for KVM guest?

Tiejun


I would be thankful if I could get some opinions about this observation.
Regards
Nitesh Narayan Lal
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v1][KVM][PATCH 1/1] kvm:ppc:booehv: direct ISI exception to Guest

2013-05-12 Thread tiejun.chen

On 05/11/2013 03:39 AM, Alexander Graf wrote:



Am 10.05.2013 um 21:22 schrieb Scott Wood scottw...@freescale.com:


On 05/10/2013 12:57:33 PM, Alexander Graf wrote:

Could you guys please collect performance data during the next weeks on both 
guest-directed ISIs as well as VF MMIOs (preferably with in-kernel MMIO), so 
that we can decide on the direction that's worth going towards?


Collecting data on VF MMIO would require implementing it (or at least salvaging 
and fixing some old code), which is not a high priority at the moment.  If we 
do implement VF in the future we could always undo the direct ISI change, but 
it would still be nice to know if there's any real benefit in the first place.


Mike sounded like he had an almost working poc, which is good enough to collect 
rough numbers.


Which can the test case be adopted?

Mike,

If you already have a good case for your poc, please share that with me. Then 
I'd like to run that.


Tiejun



And yes, changes like these should always get at least basic performance 
numbers along with them, regardless of drawbacks.


Alex



FWIW, I doubt that the more stress on HW TLB will be significant.

-Scott




--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

2013-05-09 Thread tiejun.chen

On 05/09/2013 03:33 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: Linuxppc-dev [mailto:linuxppc-dev-
bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Caraman
Mihai Claudiu-B02008
Sent: Wednesday, May 08, 2013 6:44 PM
To: Wood Scott-B07421; tiejun.chen
Cc: linuxppc-...@lists.ozlabs.org; ag...@suse.de; kvm-ppc@vger.kernel.org;
k...@vger.kernel.org
Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts


This only disable soft interrupt for kvmppc_restart_interrupt() that
restarts interrupts if they were meant for the host:

a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL |
BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL


Those aren't the only exceptions that can end up going to the host.
We could get a TLB miss that results in a heavyweight MMIO exit, etc.


And shouldn't we handle kvmppc_restart_interrupt() like the original
HOST flow?

#define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr,
ack)   \

START_EXCEPTION(label); \
 NORMAL_EXCEPTION_PROLOG(trapnum, intnum,
PROLOG_ADDITION_MASKABLE)\
 EXCEPTION_COMMON(trapnum, PACA_EXGEN,
*INTS_DISABLE*) \
...


Could you elaborate on what you mean?


I think Tiejun was saying that host has flags and replays only EE/DEC/DBELL
interrupts. There is special macro masked_interrupt_book3e in those exception
handlers that sets paca-irq_happened.

The list of replied interrupts is limited to asynchronous noncritical interrupts
which can be masked by MSR[EE] (therefore no TLB miss).


Embedded Perfmon interrupt is also asynchronous, Why that is not in the list of 
masked interruts.


Are you saying perfmon? If so, its also in that list:

START_EXCEPTION(perfmon);
NORMAL_EXCEPTION_PROLOG(0x260, BOOKE_INTERRUPT_PERFORMANCE_MONITOR,
PROLOG_ADDITION_NONE)
EXCEPTION_COMMON(0x260, PACA_EXGEN, INTS_DISABLE)

Tiejun



-Bharat


Now on KVM book3e we
don't want to put them in the irq_happened lazy state but rather to execute them
directly, so there is no reason for exception handling symmetry between host and
guest.

-Mike


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

2013-05-09 Thread tiejun.chen

On 05/09/2013 03:51 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: tiejun.chen [mailto:tiejun.c...@windriver.com]
Sent: Thursday, May 09, 2013 1:18 PM
To: Bhushan Bharat-R65777
Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc-
d...@lists.ozlabs.org; ag...@suse.de; kvm-ppc@vger.kernel.org;
k...@vger.kernel.org
Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

On 05/09/2013 03:33 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: Linuxppc-dev [mailto:linuxppc-dev-
bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of
bounces+Caraman
Mihai Claudiu-B02008
Sent: Wednesday, May 08, 2013 6:44 PM
To: Wood Scott-B07421; tiejun.chen
Cc: linuxppc-...@lists.ozlabs.org; ag...@suse.de;
kvm-ppc@vger.kernel.org; k...@vger.kernel.org
Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable
interrupts


This only disable soft interrupt for kvmppc_restart_interrupt()
that restarts interrupts if they were meant for the host:

a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL |
BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL


Those aren't the only exceptions that can end up going to the host.
We could get a TLB miss that results in a heavyweight MMIO exit, etc.


And shouldn't we handle kvmppc_restart_interrupt() like the
original HOST flow?

#define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr,
ack)   \

START_EXCEPTION(label); \
  NORMAL_EXCEPTION_PROLOG(trapnum, intnum,
PROLOG_ADDITION_MASKABLE)\
  EXCEPTION_COMMON(trapnum, PACA_EXGEN,
*INTS_DISABLE*) \
...


Could you elaborate on what you mean?


I think Tiejun was saying that host has flags and replays only
EE/DEC/DBELL interrupts. There is special macro
masked_interrupt_book3e in those exception handlers that sets paca-

irq_happened.


The list of replied interrupts is limited to asynchronous noncritical
interrupts which can be masked by MSR[EE] (therefore no TLB miss).


Embedded Perfmon interrupt is also asynchronous, Why that is not in the list

of masked interruts.

Are you saying perfmon? If so, its also in that list:

  START_EXCEPTION(perfmon);
  NORMAL_EXCEPTION_PROLOG(0x260, BOOKE_INTERRUPT_PERFORMANCE_MONITOR,
  PROLOG_ADDITION_NONE)
  EXCEPTION_COMMON(0x260, PACA_EXGEN, INTS_DISABLE)


Where it is recorded in paca-irq_happned to be replayed later ?


In stead of PROLOG_ADDITION_MASKABLE, actually we have nothing to do with 
PROLOG_ADDITION_NONE for perfmon, so perfmon can be executed without lazy state.


Tiejun





Tiejun



-Bharat


Now on KVM book3e we
don't want to put them in the irq_happened lazy state but rather to
execute them directly, so there is no reason for exception handling
symmetry between host and guest.

-Mike






--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

2013-05-09 Thread tiejun.chen

On 05/09/2013 04:12 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: Kevin Hao [mailto:haoke...@gmail.com]
Sent: Thursday, May 09, 2013 1:38 PM
To: Bhushan Bharat-R65777
Cc: tiejun.chen; Caraman Mihai Claudiu-B02008; k...@vger.kernel.org; Wood Scott-
B07421; ag...@suse.de; kvm-ppc@vger.kernel.org; linuxppc-...@lists.ozlabs.org
Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

On Thu, May 09, 2013 at 07:51:09AM +, Bhushan Bharat-R65777 wrote:




-Original Message-
From: tiejun.chen [mailto:tiejun.c...@windriver.com]
Sent: Thursday, May 09, 2013 1:18 PM
To: Bhushan Bharat-R65777
Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc-
d...@lists.ozlabs.org; ag...@suse.de; kvm-ppc@vger.kernel.org;
k...@vger.kernel.org
Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable
interrupts

On 05/09/2013 03:33 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: Linuxppc-dev [mailto:linuxppc-dev-
bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf
bounces+Of Caraman
Mihai Claudiu-B02008
Sent: Wednesday, May 08, 2013 6:44 PM
To: Wood Scott-B07421; tiejun.chen
Cc: linuxppc-...@lists.ozlabs.org; ag...@suse.de;
kvm-ppc@vger.kernel.org; k...@vger.kernel.org
Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable
interrupts


This only disable soft interrupt for kvmppc_restart_interrupt()
that restarts interrupts if they were meant for the host:

a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL |
BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL


Those aren't the only exceptions that can end up going to the host.
We could get a TLB miss that results in a heavyweight MMIO exit, etc.


And shouldn't we handle kvmppc_restart_interrupt() like the
original HOST flow?

#define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr,
ack)   \

START_EXCEPTION(label); \
  NORMAL_EXCEPTION_PROLOG(trapnum, intnum,
PROLOG_ADDITION_MASKABLE)\
  EXCEPTION_COMMON(trapnum, PACA_EXGEN,
*INTS_DISABLE*) \
...


Could you elaborate on what you mean?


I think Tiejun was saying that host has flags and replays only
EE/DEC/DBELL interrupts. There is special macro
masked_interrupt_book3e in those exception handlers that sets
paca-

irq_happened.


The list of replied interrupts is limited to asynchronous
noncritical interrupts which can be masked by MSR[EE] (therefore no TLB

miss).


Embedded Perfmon interrupt is also asynchronous, Why that is not
in the list

of masked interruts.

Are you saying perfmon? If so, its also in that list:

  START_EXCEPTION(perfmon);
  NORMAL_EXCEPTION_PROLOG(0x260, BOOKE_INTERRUPT_PERFORMANCE_MONITOR,
  PROLOG_ADDITION_NONE)
  EXCEPTION_COMMON(0x260, PACA_EXGEN, INTS_DISABLE)


Where it is recorded in paca-irq_happned to be replayed later ?


Actually we don't want replay the perfmon interrupt later. We would run it even
soft irq is disabled and just treat it as NMI. Please see the following function
quoted from arch/powerpc/perf/core-fsl-emb.c:
   /*
* If interrupts were soft-disabled when a PMU interrupt occurs, treat
* it as an NMI.
*/
   static inline int perf_intr_is_nmi(struct pt_regs *regs)
   {
   #ifdef __powerpc64__
   return !regs-softe;
   #else
   return 0;
   #endif
   }


Is it because that we cannot afford to lose perfmon interrupt for more accurate 
capturing of data ?


powerpc/perf: e500 support

This implements perf_event support for the Freescale embedded performance
monitor, based on the existing perf_event.c that supports server/classic
chips.

Some limitations:
- Performance monitor interrupts are regular EE interrupts, and thus you
  can't profile places with interrupts disabled.  We may want to implement
  soft IRQ-disabling, with perfmon interrupts exempted and treated as NMIs.

Tiejun



-Bharat



Thanks,
Kevin





Tiejun



-Bharat


Now on KVM book3e we
don't want to put them in the irq_happened lazy state but rather
to execute them directly, so there is no reason for exception
handling symmetry between host and guest.

-Mike




___
Linuxppc-dev mailing list
linuxppc-...@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev






--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

2013-05-09 Thread tiejun.chen

On 05/09/2013 04:23 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: Linuxppc-dev [mailto:linuxppc-dev-
bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Caraman
Mihai Claudiu-B02008
Sent: Wednesday, May 08, 2013 6:44 PM
To: Wood Scott-B07421; tiejun.chen
Cc: linuxppc-...@lists.ozlabs.org; ag...@suse.de; kvm-ppc@vger.kernel.org;
k...@vger.kernel.org
Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts


This only disable soft interrupt for kvmppc_restart_interrupt() that
restarts interrupts if they were meant for the host:

a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL |
BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL


Those aren't the only exceptions that can end up going to the host.
We could get a TLB miss that results in a heavyweight MMIO exit, etc.


And shouldn't we handle kvmppc_restart_interrupt() like the original
HOST flow?

#define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr,
ack)   \

START_EXCEPTION(label); \
 NORMAL_EXCEPTION_PROLOG(trapnum, intnum,
PROLOG_ADDITION_MASKABLE)\
 EXCEPTION_COMMON(trapnum, PACA_EXGEN,
*INTS_DISABLE*) \
...


Could you elaborate on what you mean?


I think Tiejun was saying that host has flags and replays only EE/DEC/DBELL
interrupts. There is special macro masked_interrupt_book3e in those exception
handlers that sets paca-irq_happened.

The list of replied interrupts is limited to asynchronous noncritical interrupts
which can be masked by MSR[EE] (therefore no TLB miss). Now on KVM book3e we
don't want to put them in the irq_happened lazy state but rather to execute them
directly, so there is no reason for exception handling symmetry between host and
guest.



Another Question:

The case is:



Actually in the case GS=1 even if EE=0, EXT/DEC/DBELL still occur as I recall.


Case 1)
  - Local_irq_disable()  will set soft_enabled = 0
  - Now Externel interrupt happens, there we set PACA_IRQ_EE in irq_happened, 
Also clears EE in SRR1 and rfi. So interrupts are hard disabled. No more other 
interrupt gated by MSR.EE can happen. Looks like the idea here is to not let a 
device keep on inserting interrupt till the interrupt condition on device is 
cleared, right?


I don't understand the interrupt condition on device is cleared here.

I think regardless if you clear the device interrupt status, the system still 
receive a pending interrupt once EE or GS = 1.



  - local_irq_enable() - This checks that irq_happened is set, and replays


ret_from_except also check to replay.



Now the case 2)
Case 2)
- Local_irq_disable()  will set soft_enabled = 0
  - Now DEC interrupt happens. We set PACA_IRQ_DEC in irq_happened, But do not 
clear EE in SRR1 and rfi. So interrupts are not hard disabled.
  - Now say EE interrupt happens, there we set PACA_IRQ_EE in irq_happened, 
Also clears EE in SRR1 and rfi. So interrupts are hard disabled.
  - local_irq_enable() - This checks that irq_happened is set.
IIUC, it replays only one interrupt? is not it?


After anyone is replayed in arch_local_irq_restore(), we will set soft/hard 
interrupt there:


set_soft_enabled(1);
__hard_irq_enable();

Then any pending interrupt can be executed now.

Additionally, ret_from_except probably check to replay all.

Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v1][KVM][PATCH 1/1] kvm:ppc:booehv: direct ISI exception to Guest

2013-05-09 Thread tiejun.chen

On 05/08/2013 05:28 PM, tiejun.chen wrote:

On 05/08/2013 05:20 PM, Caraman Mihai Claudiu-B02008 wrote:

-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
Behalf Of tiejun.chen
Sent: Wednesday, May 08, 2013 4:54 AM
To: Wood Scott-B07421
Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
linuxppc-...@lists.ozlabs.org
Subject: Re: [v1][KVM][PATCH 1/1] kvm:ppc:booehv: direct ISI exception to
Guest

On 05/08/2013 07:40 AM, Scott Wood wrote:

On 05/07/2013 06:06:30 AM, Tiejun Chen wrote:

We also can direct ISI exception to Guest like DSI.

Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
---
  arch/powerpc/kvm/booke_emulate.c |3 +++
  arch/powerpc/kvm/e500mc.c|3 ++-
  2 files changed, 5 insertions(+), 1 deletion(-)


Are you seeing a real performance improvement from this?  This will

interfere

No. But after we reduce the exit to host, shouldn't this improve
performance?


We lose some flexibility for this so it make sense only if we gain
measurable improvements.


Sounds we have much more works to do.






somewhat with using the VF bit, if we were to ever do so, since VF only

affects

Sorry, what is the VF you said?


VF stands for virtualization fault see MAS8[VF] and we may use it for 
virtualized


I almost forget this point :)


Looks KVM PPC have no this mechanism currently since I don't find MAS8_VF is 
used in kernel, right?


If I'm missing something please correct me.

Tiejun




MMIO. The hypervisor should deny execute access on pages marked with VF.
Accordingly
in this case guest ISI exceptions should be handled by the hypervisor.




--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

2013-05-09 Thread tiejun.chen

On 05/09/2013 07:21 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: tiejun.chen [mailto:tiejun.c...@windriver.com]
Sent: Thursday, May 09, 2013 3:48 PM
To: Bhushan Bharat-R65777
Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc-
d...@lists.ozlabs.org; ag...@suse.de; kvm-ppc@vger.kernel.org;
k...@vger.kernel.org
Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

On 05/09/2013 06:00 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: tiejun.chen [mailto:tiejun.c...@windriver.com]
Sent: Thursday, May 09, 2013 3:15 PM
To: Bhushan Bharat-R65777
Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc-
d...@lists.ozlabs.org; ag...@suse.de; kvm-ppc@vger.kernel.org;
k...@vger.kernel.org
Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable
interrupts

On 05/09/2013 04:23 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: Linuxppc-dev [mailto:linuxppc-dev-
bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of
bounces+Caraman
Mihai Claudiu-B02008
Sent: Wednesday, May 08, 2013 6:44 PM
To: Wood Scott-B07421; tiejun.chen
Cc: linuxppc-...@lists.ozlabs.org; ag...@suse.de;
kvm-ppc@vger.kernel.org; k...@vger.kernel.org
Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable
interrupts


This only disable soft interrupt for kvmppc_restart_interrupt()
that restarts interrupts if they were meant for the host:

a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL |
BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL


Those aren't the only exceptions that can end up going to the host.
We could get a TLB miss that results in a heavyweight MMIO exit, etc.


And shouldn't we handle kvmppc_restart_interrupt() like the
original HOST flow?

#define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr,
ack)   \

START_EXCEPTION(label); \
   NORMAL_EXCEPTION_PROLOG(trapnum, intnum,
PROLOG_ADDITION_MASKABLE)\
   EXCEPTION_COMMON(trapnum, PACA_EXGEN,
*INTS_DISABLE*) \
...


Could you elaborate on what you mean?


I think Tiejun was saying that host has flags and replays only
EE/DEC/DBELL interrupts. There is special macro
masked_interrupt_book3e in those exception handlers that sets paca-

irq_happened.


The list of replied interrupts is limited to asynchronous
noncritical interrupts which can be masked by MSR[EE] (therefore no TLB

miss).

Now on KVM book3e we don't want to put them in the irq_happened
lazy state but rather to execute them directly, so there is no
reason for exception handling symmetry between host and guest.



Another Question:

The case is:



Actually in the case GS=1 even if EE=0, EXT/DEC/DBELL still occur as I

recall.



Case 1)
- Local_irq_disable()  will set soft_enabled = 0
- Now Externel interrupt happens, there we set PACA_IRQ_EE in
irq_happened,

Also clears EE in SRR1 and rfi. So interrupts are hard disabled. No
more other interrupt gated by MSR.EE can happen. Looks like the idea
here is to not let a device keep on inserting interrupt till the
interrupt condition on device is cleared, right?

I don't understand the interrupt condition on device is cleared here.

I think regardless if you clear the device interrupt status, the
system still receive a pending interrupt once EE or GS = 1.


Once yes, but I think to avoid flood of device interrupt we disable MSR.EE

when soft-disabled.

But we neither ACK nor send EOI to that irq in the interrupt controller, so that
should be in pending state.






- local_irq_enable() - This checks that irq_happened is set, and
replays


ret_from_except also check to replay.



Now the case 2)
Case 2)
- Local_irq_disable()  will set soft_enabled = 0
- Now DEC interrupt happens. We set PACA_IRQ_DEC in
irq_happened, But do

not clear EE in SRR1 and rfi. So interrupts are not hard disabled.

- Now say EE interrupt happens, there we set PACA_IRQ_EE in
irq_happened,

Also clears EE in SRR1 and rfi. So interrupts are hard disabled.

- local_irq_enable() - This checks that irq_happened is set.
IIUC, it replays only one interrupt? is not it?


After anyone is replayed in arch_local_irq_restore(), we will set
soft/hard interrupt there:

set_soft_enabled(1);
__hard_irq_enable();

Then any pending interrupt can be executed now.


Do you mean that the interrupt should fire again?


I means the pending exception including external interrupt, the decrementer
exception and the doorbell exception, can trap CPU once EE=1 with
__hard_irq_enable() here. Then the kernel can handle those exception since soft
enable is also 1 now.





Additionally, ret_from_except probably check to replay all.


Local_irq_enable() will not take us to ret_from_except.


Yes. I just say ret_from_except can provide an approach to replay all :)


__replay_interrupt() from arch_local_irq_enable() will take us to 
ret_from_except/lite :)
There all pending interrupts

Re: [v1][KVM][PATCH 1/1] kvm:ppc:booehv: direct ISI exception to Guest

2013-05-09 Thread tiejun.chen

On 05/09/2013 07:34 PM, Caraman Mihai Claudiu-B02008 wrote:

VF stands for virtualization fault see MAS8[VF] and we may use it for

virtualized

Looks KVM PPC have no this mechanism currently since I don't find MAS8_VF
is
used in kernel, right?


Yes but 'we may use it' in the feature, I have a functional POC with VF.


Any IO performance to be improved with this POC?


Now we capture virtualized MMIO accesses as TLB misses (we don't write
into HW TLB guest translation outside visible memslots).


Yes.

Tiejun

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/4] kvm/ppc: Call trace_hardirqs_on before entry

2013-05-09 Thread tiejun.chen

On 05/10/2013 11:34 AM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of
Scott Wood
Sent: Friday, May 10, 2013 8:40 AM
To: Alexander Graf; Benjamin Herrenschmidt
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; 
linuxppc-...@lists.ozlabs.org;
Wood Scott-B07421
Subject: [PATCH v2 3/4] kvm/ppc: Call trace_hardirqs_on before entry

Currently this is only being done on 64-bit.  Rather than just move it
out of the 64-bit ifdef, move it to kvm_lazy_ee_enable() so that it is
consistent with lazy ee state, and so that we don't track more host
code as interrupts-enabled than necessary.

Rename kvm_lazy_ee_enable() to kvm_fix_ee_before_entry() to reflect
that this function now has a role on 32-bit as well.

Signed-off-by: Scott Wood scottw...@freescale.com
---
  arch/powerpc/include/asm/kvm_ppc.h |   11 ---
  arch/powerpc/kvm/book3s_pr.c   |4 ++--
  arch/powerpc/kvm/booke.c   |4 ++--
  arch/powerpc/kvm/powerpc.c |2 --
  4 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h
b/arch/powerpc/include/asm/kvm_ppc.h
index a5287fe..6885846 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -394,10 +394,15 @@ static inline void kvmppc_mmu_flush_icache(pfn_t pfn)
}
  }

-/* Please call after prepare_to_enter. This function puts the lazy ee state
-   back to normal mode, without actually enabling interrupts. */
-static inline void kvmppc_lazy_ee_enable(void)
+/*
+ * Please call after prepare_to_enter. This function puts the lazy ee and irq
+ * disabled tracking state back to normal mode, without actually enabling
+ * interrupts.
+ */
+static inline void kvmppc_fix_ee_before_entry(void)
  {
+   trace_hardirqs_on();
+
  #ifdef CONFIG_PPC64
/* Only need to enable IRQs by hard enabling them after this */
local_paca-irq_happened = 0;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index bdc40b8..0b97ce4 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -890,7 +890,7 @@ program_interrupt:
local_irq_enable();
r = s;
} else {
-   kvmppc_lazy_ee_enable();
+   kvmppc_fix_ee_before_entry();
}
}

@@ -1161,7 +1161,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct
kvm_vcpu *vcpu)
if (vcpu-arch.shared-msr  MSR_FP)
kvmppc_handle_ext(vcpu, BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP);

-   kvmppc_lazy_ee_enable();
+   kvmppc_fix_ee_before_entry();

ret = __kvmppc_vcpu_run(kvm_run, vcpu);

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 705fc5c..eb89b83 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -673,7 +673,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu
*vcpu)
ret = s;
goto out;
}
-   kvmppc_lazy_ee_enable();
+   kvmppc_fix_ee_before_entry();


local_irq_disable() is called before kvmppc_prepare_to_enter().


In patch 4, we call hard_irq_disable() once enter kvmppc_prepare_to_enter().

Tiejun


Now we put the irq_happend and soft_enabled back to previous state without 
checking for any interrupt happened in between. If any interrupt happens in 
between, will not that be lost?

-Bharat



kvm_guest_enter();

@@ -1154,7 +1154,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct
kvm_vcpu *vcpu,
local_irq_enable();
r = (s  2) | RESUME_HOST | (r  RESUME_FLAG_NV);
} else {
-   kvmppc_lazy_ee_enable();
+   kvmppc_fix_ee_before_entry();
}
}

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6316ee3..4e05f8c 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -117,8 +117,6 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
kvm_guest_exit();
continue;
}
-
-   trace_hardirqs_on();
  #endif

kvm_guest_enter();
--
1.7.10.4


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



___
Linuxppc-dev mailing list
linuxppc-...@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev




--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v1][KVM][PATCH 1/1] kvm:ppc:booehv: direct ISI exception to Guest

2013-05-08 Thread tiejun.chen

On 05/08/2013 05:20 PM, Caraman Mihai Claudiu-B02008 wrote:

-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
Behalf Of tiejun.chen
Sent: Wednesday, May 08, 2013 4:54 AM
To: Wood Scott-B07421
Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
linuxppc-...@lists.ozlabs.org
Subject: Re: [v1][KVM][PATCH 1/1] kvm:ppc:booehv: direct ISI exception to
Guest

On 05/08/2013 07:40 AM, Scott Wood wrote:

On 05/07/2013 06:06:30 AM, Tiejun Chen wrote:

We also can direct ISI exception to Guest like DSI.

Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
---
  arch/powerpc/kvm/booke_emulate.c |3 +++
  arch/powerpc/kvm/e500mc.c|3 ++-
  2 files changed, 5 insertions(+), 1 deletion(-)


Are you seeing a real performance improvement from this?  This will

interfere

No. But after we reduce the exit to host, shouldn't this improve
performance?


We lose some flexibility for this so it make sense only if we gain
measurable improvements.


Sounds we have much more works to do.






somewhat with using the VF bit, if we were to ever do so, since VF only

affects

Sorry, what is the VF you said?


VF stands for virtualization fault see MAS8[VF] and we may use it for 
virtualized


I almost forget this point :)


MMIO. The hypervisor should deny execute access on pages marked with VF. 
Accordingly
in this case guest ISI exceptions should be handled by the hypervisor.


Thanks for your information.

Tiejun

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] powerpc: Make hard_irq_disable() do the right thing vs. irq tracing

2013-05-07 Thread tiejun.chen

On 05/07/2013 03:04 PM, Benjamin Herrenschmidt wrote:

If hard_irq_disable() is called while interrupts are already soft-disabled
(which is the most common case) all is already well.

However you can (and in some cases want) to call it while everything is
enabled (to make sure you don't get a lazy even, for example before entry
into KVM guests) and in this case we need to inform the irq tracer that
the irqs are going off.

We have to change the inline into a macro to avoid an include circular
dependency hell hole.

Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org
---

Tested on pseries, Scott, I don't expect a problem with that patch especially
since most callers already are soft disabled, so I'll merge it now along with
my other pending stuff and you can simplify your KVM one.

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index e45c494..d615b28 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -95,15 +95,13 @@ static inline bool arch_irqs_disabled(void)
  #define __hard_irq_disable()  __mtmsrd(local_paca-kernel_msr, 1)
  #endif

-static inline void hard_irq_disable(void)
-{
-   __hard_irq_disable();
-   get_paca()-soft_enabled = 0;
-   get_paca()-irq_happened |= PACA_IRQ_HARD_DIS;
-}
-
-/* include/linux/interrupt.h needs hard_irq_disable to be a macro */
-#define hard_irq_disable   hard_irq_disable
+#define hard_irq_disable() do {\
+   __hard_irq_disable();   \
+   if (local_paca-soft_enabled)\
+   trace_hardirqs_off();   \
+   get_paca()-soft_enabled = 0;\


Could we simplify this as follows:

+#define hard_irq_disable() do {\
+   __hard_irq_disable();   \
+   if (get_paca()-soft_enabled) {  \
+   trace_hardirqs_off();   \
+   get_paca()-soft_enabled = 0;\
+   }   \
+   get_paca()-irq_happened |= PACA_IRQ_HARD_DIS;   \
+} while(0)

Tiejun


+   get_paca()-irq_happened |= PACA_IRQ_HARD_DIS;   \
+} while(0)

  static inline bool lazy_irq_pending(void)
  {




--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v1][KVM][PATCH 1/1] kvm:ppc: enable doorbell exception with E500MC

2013-05-07 Thread tiejun.chen

On 05/07/2013 09:33 AM, tiejun.chen wrote:

On 05/06/2013 10:58 PM, Alexander Graf wrote:

On 05/06/2013 04:53 AM, Tiejun Chen wrote:

Actually E500MC also support doorbell exception, and CONFIG_PPC_E500MC
can cover BOOK3E/BOOK3E_64 as well.

Signed-off-by: Tiejun Chentiejun.c...@windriver.com
---
  arch/powerpc/kvm/booke.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 1020119..dc1f590 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -795,7 +795,7 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu,
  kvmppc_fill_pt_regs(regs);
  timer_interrupt(regs);
  break;
-#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_BOOK3E_64)
+#if defined(CONFIG_PPC_E500MC)


I suppose you mean CONFIG_KVM_E500MC here? Why didn't this work for you before?


This works for me.

Here I just mean currently CONFIG_PPC_E500MC is always selected no matter what
CONFIG_PPC_FSL_BOOK3E or CONFIG_PPC_BOOK3E_64 is enabled. And especially, this
is already in the arch/powerpc/kvm/booke.c file, so I think one #ifdef
(CONFIG_PPC_E500MC) is enough and also makes sense.


The ifdef above should cover the same range of CPUs.


Or furthermore, the #ifdef CONFIG_PPC_DOORBELL is reasonable to cover this.



I think this may be better so I send next version with this. So please take a 
look at.


Tiejun

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v1][KVM][PATCH 1/1] kvm:ppc: enable doorbell exception with E500MC

2013-05-06 Thread tiejun.chen

On 05/06/2013 10:58 PM, Alexander Graf wrote:

On 05/06/2013 04:53 AM, Tiejun Chen wrote:

Actually E500MC also support doorbell exception, and CONFIG_PPC_E500MC
can cover BOOK3E/BOOK3E_64 as well.

Signed-off-by: Tiejun Chentiejun.c...@windriver.com
---
  arch/powerpc/kvm/booke.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 1020119..dc1f590 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -795,7 +795,7 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu,
  kvmppc_fill_pt_regs(regs);
  timer_interrupt(regs);
  break;
-#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_BOOK3E_64)
+#if defined(CONFIG_PPC_E500MC)


I suppose you mean CONFIG_KVM_E500MC here? Why didn't this work for you before?


This works for me.

Here I just mean currently CONFIG_PPC_E500MC is always selected no matter what 
CONFIG_PPC_FSL_BOOK3E or CONFIG_PPC_BOOK3E_64 is enabled. And especially, this 
is already in the arch/powerpc/kvm/booke.c file, so I think one #ifdef 
(CONFIG_PPC_E500MC) is enough and also makes sense.



The ifdef above should cover the same range of CPUs.


Or furthermore, the #ifdef CONFIG_PPC_DOORBELL is reasonable to cover this.

Tiejun

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

2013-05-06 Thread tiejun.chen

On 05/07/2013 07:50 AM, Scott Wood wrote:

On 05/05/2013 10:13:17 PM, tiejun.chen wrote:

On 05/06/2013 11:10 AM, Tiejun Chen wrote:

For the external interrupt, the decrementer exception and the doorbell
excpetion, we also need to soft-disable interrupts while doing as host
interrupt handlers since the DO_KVM hook is always performed to skip
EXCEPTION_COMMON then miss this original chance with the 'ints' (INTS_DISABLE).


http://patchwork.ozlabs.org/patch/241344/
http://patchwork.ozlabs.org/patch/241412/

:-)


I'm observing the same behaviour as well:

WARN_ON_ONCE(!irqs_disabled());




Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
---
  arch/powerpc/kvm/bookehv_interrupts.S |9 +
  1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S
b/arch/powerpc/kvm/bookehv_interrupts.S
index e8ed7d6..2fd62bf 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -33,6 +33,8 @@

  #ifdef CONFIG_64BIT
  #include asm/exception-64e.h
+#include asm/hw_irq.h
+#include asm/irqflags.h
  #else
  #include ../kernel/head_booke.h /* for THREAD_NORMSAVE() */
  #endif
@@ -469,6 +471,13 @@ _GLOBAL(kvmppc_resume_host)
  PPC_LLr3, HOST_RUN(r1)
  mrr5, r14 /* intno */
  mrr14, r4 /* Save vcpu pointer. */
+#ifdef CONFIG_64BIT
+/* Should we soft-disable interrupts? */
+andi.r6, r5, BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER
| BOOKE_INTERRUPT_DOORBELL
+beqskip_soft_dis
+SOFT_DISABLE_INTS(r7,r8)
+skip_soft_dis:
+#endif


Why wouldn't we always disable them?  kvmppc_handle_exit() will enable
interrupts when it's ready.


This only disable soft interrupt for kvmppc_restart_interrupt() that restarts 
interrupts if they were meant for the host:


a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL | 
BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL


b. bl  kvmppc_handle_exit

c. kvmppc_handle_exit()
{
int r = RESUME_HOST;
int s;

/* update before a new last_exit_type is rewritten */
kvmppc_update_timing_stats(vcpu);

/* restart interrupts if they were meant for the host */
kvmppc_restart_interrupt(vcpu, exit_nr);

local_irq_enable(); == Enable again.


And shouldn't we handle kvmppc_restart_interrupt() like the original HOST flow?

#define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack)   \
START_EXCEPTION(label); \
NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\
EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \
...

So I think this should be reasonable :)

Tiejun

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

2013-05-06 Thread tiejun.chen

On 05/07/2013 10:06 AM, Scott Wood wrote:

On 05/06/2013 08:56:25 PM, tiejun.chen wrote:

On 05/07/2013 07:50 AM, Scott Wood wrote:

On 05/05/2013 10:13:17 PM, tiejun.chen wrote:

On 05/06/2013 11:10 AM, Tiejun Chen wrote:

For the external interrupt, the decrementer exception and the doorbell
excpetion, we also need to soft-disable interrupts while doing as host
interrupt handlers since the DO_KVM hook is always performed to skip
EXCEPTION_COMMON then miss this original chance with the 'ints'
(INTS_DISABLE).


http://patchwork.ozlabs.org/patch/241344/
http://patchwork.ozlabs.org/patch/241412/

:-)


I'm observing the same behaviour as well:

WARN_ON_ONCE(!irqs_disabled());


So, could you explain the benefits of your approach over what's being discussed
in those threads?


They're a long thread so I think I need to take time to see :)




Why wouldn't we always disable them?  kvmppc_handle_exit() will enable
interrupts when it's ready.


This only disable soft interrupt for kvmppc_restart_interrupt() that restarts
interrupts if they were meant for the host:

a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL |
BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL


Those aren't the only exceptions that can end up going to the host.  We could
get a TLB miss that results in a heavyweight MMIO exit, etc.


This is like host handler, so I'm just disabling soft interrupt during 
kvmppc_restart_interrupt() for Doorbell interrupt/Decrementer Interrupt/External 
Input Interrupt.


I don't see anything should be disabled for any TLB exception in host handler.



And I'd rather see any fix for this problem stay out of the asm code.


We already have an appropriate SOFT_DISABLE_INTS so I think we can take this 
easily :)





b. bl  kvmppc_handle_exit

c. kvmppc_handle_exit()
{
int r = RESUME_HOST;
int s;

/* update before a new last_exit_type is rewritten */
kvmppc_update_timing_stats(vcpu);

/* restart interrupts if they were meant for the host */
kvmppc_restart_interrupt(vcpu, exit_nr);

local_irq_enable();== Enable again.


And shouldn't we handle kvmppc_restart_interrupt() like the original HOST flow?

#define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack)   \
START_EXCEPTION(label); \
NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\
EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \
...


Could you elaborate on what you mean?


In host handler, we always use MASKABLE_EXCEPTION() to define-to-handle some 
exceptions: Doorbell interrupt/Decrementer Interrupt/External Input Interrupt:


#define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack)   \
START_EXCEPTION(label); \
NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\
EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \

This would call INTS_DISABLE, which is equal to SOFT_DISABLE_INTS(), to disable 
soft interrupt before call all associated handlers: 
do_IRQ()/timer_interrupt()/doorbell_exception().


But DO_KVM hook always skips INTS_DISABLE.

So I think we also need to do INTS_DISABLE for kvmppc_restart_interrupt() since 
actually that restarts interrupts for the host with a similar way as they are 
called by host.


Tiejun



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

2013-05-05 Thread tiejun.chen

On 05/06/2013 11:10 AM, Tiejun Chen wrote:

For the external interrupt, the decrementer exception and the doorbell
excpetion, we also need to soft-disable interrupts while doing as host
interrupt handlers since the DO_KVM hook is always performed to skip
EXCEPTION_COMMON then miss this original chance with the 'ints' (INTS_DISABLE).


Sorry, miss to send Ben.

Tiejun



Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
---
  arch/powerpc/kvm/bookehv_interrupts.S |9 +
  1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
b/arch/powerpc/kvm/bookehv_interrupts.S
index e8ed7d6..2fd62bf 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -33,6 +33,8 @@

  #ifdef CONFIG_64BIT
  #include asm/exception-64e.h
+#include asm/hw_irq.h
+#include asm/irqflags.h
  #else
  #include ../kernel/head_booke.h /* for THREAD_NORMSAVE() */
  #endif
@@ -469,6 +471,13 @@ _GLOBAL(kvmppc_resume_host)
PPC_LL  r3, HOST_RUN(r1)
mr  r5, r14 /* intno */
mr  r14, r4 /* Save vcpu pointer. */
+#ifdef CONFIG_64BIT
+   /* Should we soft-disable interrupts? */
+   andi.   r6, r5, BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER 
| BOOKE_INTERRUPT_DOORBELL
+   beq skip_soft_dis
+   SOFT_DISABLE_INTS(r7,r8)
+skip_soft_dis:
+#endif
bl  kvmppc_handle_exit

/* Restore vcpu pointer and the nonvolatiles we used. */



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 8/8 v3] KVM: PPC: e500: Add e6500 core to Kconfig description

2013-04-26 Thread tiejun.chen

On 04/26/2013 11:11 AM, tiejun.chen wrote:

On 04/25/2013 07:32 PM, Caraman Mihai Claudiu-B02008 wrote:

Is the flowing is fine with that generic machine, ppce500, to boot
P5040DS with
64bit,

./qemu-system-ppc64 -enable-kvm -m 1048 -nographic -M ppce500 -kernel
uImage
-initrd ramdisk.gz  -L . -append root=/dev/ram rw console=ttyS0,115200
-cpu
e5500 -dtb p5040ds.dtb

Thanks,

Tiejun


There is no need for -dtb.


With your comment, I use kvm-ppc-queue which top commit is be28a27c, kvm/ppc:
don't call complete_mmio_load when it's a store, in plus that patch you
pointedto build one uImage based on corenet64_smp_defconfig, but we need to
enable CONFIG_PPC_QEMU_E500 manually, and select CONFIG_TICK_CPU_ACCOUNTING
since the default CONFIG_VIRT_CPU_ACCOUNTING_NATIVE would introduce some trace
when boot VM.

And perform as follows:

./qemu-system-ppc64 -enable-kvm -m 1048 -nographic -M ppce500 -kernel uImage
-initrd ramdisk.gz  -L . -append root=/dev/ram rw console=ttyS0,115200 -cpu 
e5500

But I can't see anything in the serial port.


Please ignore this since e5500 is okay now.

Thanks for your reply.

Tiejun

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/7 v3] KVM: PPC: exit to user space on ehpriv instruction

2013-04-26 Thread tiejun.chen

On 04/26/2013 06:45 PM, Alexander Graf wrote:


On 08.04.2013, at 12:32, Bharat Bhushan wrote:


From: Bharat Bhushan bharat.bhus...@freescale.com

ehpriv instruction is used for setting software breakpoints
by user space. This patch adds support to exit to user space
with run-debug have relevant information.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
arch/powerpc/kvm/e500_emulate.c |   10 ++
1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c
index e78f353..cefdd38 100644
--- a/arch/powerpc/kvm/e500_emulate.c
+++ b/arch/powerpc/kvm/e500_emulate.c
@@ -26,6 +26,7 @@
#define XOP_TLBRE   946
#define XOP_TLBWE   978
#define XOP_TLBILX  18
+#define XOP_EHPRIV  270

#ifdef CONFIG_KVM_E500MC
static int dbell2prio(ulong param)
@@ -130,6 +131,15 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
emulated = kvmppc_e500_emul_tlbivax(vcpu, ea);
break;

+   case XOP_EHPRIV:


This is supposed to check for oc, no?


The other day I already sent one patch only to check OC, KVM/PPC: emulate 
ehpriv.

But Bharat said he's waiting for other debug patches to be reviewed.

Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 8/8 v3] KVM: PPC: e500: Add e6500 core to Kconfig description

2013-04-25 Thread tiejun.chen

On 04/25/2013 05:09 PM, Caraman Mihai Claudiu-B02008 wrote:

-Original Message-
From: tiejun.chen [mailto:tiejun.c...@windriver.com]
Sent: Friday, April 19, 2013 1:03 PM
To: Caraman Mihai Claudiu-B02008
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org
Subject: Re: [PATCH 8/8 v3] KVM: PPC: e500: Add e6500 core to Kconfig
description

On 04/11/2013 06:03 PM, Mihai Caraman wrote:

Add e6500 core to Kconfig description.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
v3:
   - No change

   arch/powerpc/kvm/Kconfig |6 +++---
   1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 63c67ec..4489520 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -136,15 +136,15 @@ config KVM_E500V2
  If unsure, say N.

   config KVM_E500MC
-   bool KVM support for PowerPC E500MC/E5500 processors
+   bool KVM support for PowerPC E500MC/E5500/E6500 processors
depends on PPC_E500MC
select KVM
select KVM_MMIO
select KVM_BOOKE_HV
select MMU_NOTIFIER
---help---
- Support running unmodified E500MC/E5500 (32-bit) guest kernels in


I ever tried p5040ds but failed with 64-bit, but looks are you saying
this patch
set can make e5500/e6500 work well with 64-bit? If so, will we need to
upgrade
qemu or something else like dtb?


KVM should work on p5040ds with and without this patchset. The latest
qemu requires this patch: powerpc: Add paravirt idle loop for 64-bit Book-E,
you will not pass guest udev without it.


Which should qemu tree be used here?

My tree is cloned from:

git://repo.or.cz/qemu/agraf.git ppc-next

But I can't find this commit.

Tiejun

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 8/8 v3] KVM: PPC: e500: Add e6500 core to Kconfig description

2013-04-25 Thread tiejun.chen

On 04/25/2013 05:32 PM, Caraman Mihai Claudiu-B02008 wrote:

-Original Message-
From: tiejun.chen [mailto:tiejun.c...@windriver.com]
Sent: Thursday, April 25, 2013 12:17 PM
To: Caraman Mihai Claudiu-B02008
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org
Subject: Re: [PATCH 8/8 v3] KVM: PPC: e500: Add e6500 core to Kconfig
description

On 04/25/2013 05:09 PM, Caraman Mihai Claudiu-B02008 wrote:

-Original Message-
From: tiejun.chen [mailto:tiejun.c...@windriver.com]
Sent: Friday, April 19, 2013 1:03 PM
To: Caraman Mihai Claudiu-B02008
Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org
Subject: Re: [PATCH 8/8 v3] KVM: PPC: e500: Add e6500 core to Kconfig
description

On 04/11/2013 06:03 PM, Mihai Caraman wrote:

Add e6500 core to Kconfig description.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
v3:
- No change

arch/powerpc/kvm/Kconfig |6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 63c67ec..4489520 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -136,15 +136,15 @@ config KVM_E500V2
  If unsure, say N.

config KVM_E500MC
-   bool KVM support for PowerPC E500MC/E5500 processors
+   bool KVM support for PowerPC E500MC/E5500/E6500 processors
depends on PPC_E500MC
select KVM
select KVM_MMIO
select KVM_BOOKE_HV
select MMU_NOTIFIER
---help---
- Support running unmodified E500MC/E5500 (32-bit) guest kernels in


I ever tried p5040ds but failed with 64-bit, but looks are you saying
this patch
set can make e5500/e6500 work well with 64-bit? If so, will we need to
upgrade
qemu or something else like dtb?


KVM should work on p5040ds with and without this patchset. The latest
qemu requires this patch: powerpc: Add paravirt idle loop for 64-bit

Book-E,

you will not pass guest udev without it.


This is a kernel patch required by latest qemu.


Looks this commit is applied only into galak/powerpc.git, next, but still not 
merged into agraf/linux-2.6.git, so I'm confused which tree can support 64bit 
Book3E KVM as you point.


Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] kvm:book3e: Fix a build error

2013-04-25 Thread tiejun.chen

On 04/25/2013 08:11 PM, Caraman Mihai Claudiu-B02008 wrote:

-Original Message-
From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-
ow...@vger.kernel.org] On Behalf Of Tiejun Chen
Sent: Thursday, April 25, 2013 2:46 PM
To: ga...@kernel.crashing.org
Cc: linuxppc-...@lists.ozlabs.org; kvm-ppc@vger.kernel.org;
k...@vger.kernel.org
Subject: [PATCH 1/1] kvm:book3e: Fix a build error

Commit cd66cc2e, powerpc/85xx: Add AltiVec support for e6500, adds
support for AltiVec on a Book-E class processor, but while compiling
in the CONFIG_PPC_BOOK3E_64 and CONFIG_VIRTUALIZATION case, this
introduce the following error:

arch/powerpc/kernel/exceptions-64e.S:402: undefined reference to
`kvmppc_handler_42_0x01B'
arch/powerpc/kernel/built-in.o: In function `exc_altivec_assist_book3e':
arch/powerpc/kernel/exceptions-64e.S:424: undefined reference to
`kvmppc_handler_43_0x01B'
make: *** [vmlinux] Error 1

Looks we should add these altivec kvm handlers.

Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
---
  arch/powerpc/kvm/bookehv_interrupts.S |5 +
  1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S
b/arch/powerpc/kvm/bookehv_interrupts.S
index e8ed7d6..fa9c78a 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -319,6 +319,11 @@ kvm_handler BOOKE_INTERRUPT_DEBUG, EX_PARAMS(DBG), \
SPRN_DSRR0, SPRN_DSRR1, 0
  kvm_handler BOOKE_INTERRUPT_DEBUG, EX_PARAMS(CRIT), \
SPRN_CSRR0, SPRN_CSRR1, 0
+/* altivec */
+kvm_handler BOOKE_INTERRUPT_ALTIVEC_UNAVAIL, EX_PARAMS(GEN), \
+   SPRN_SRR0, SPRN_SRR1, 0
+kvm_handler BOOKE_INTERRUPT_ALTIVEC_ASSIST, EX_PARAMS(GEN), \
+   SPRN_SRR0, SPRN_SRR1, 0
  #else
  /*
   * For input register values, see
arch/powerpc/include/asm/kvm_booke_hv_asm.h
--


It seems that you are not using kvm-ppc-queue branch.


This is just used to fix a build error in powerpc.git when introduce commit 
cd66cc2e, powerpc/85xx: Add AltiVec support for e6500, in *powerpc.git* as I 
mentioned in this patch head.




I already have a patch ready for this (and AltiVec support is work


This change don't block your AltiVec support for kvm unless you think this 
change is wrong. And especially, we always can reproduce this error with/without 
enabling AltiVec, so I also don't think this should be suspended until support 
e6500 in kvm since kvm based on e5500 should work.


Tiejun


in progress) but we need first to pull e6500 kernel patches from
Linux tree into agraf.git.

-Mike










--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 8/8 v3] KVM: PPC: e500: Add e6500 core to Kconfig description

2013-04-25 Thread tiejun.chen

On 04/25/2013 07:32 PM, Caraman Mihai Claudiu-B02008 wrote:

Is the flowing is fine with that generic machine, ppce500, to boot
P5040DS with
64bit,

./qemu-system-ppc64 -enable-kvm -m 1048 -nographic -M ppce500 -kernel
uImage
-initrd ramdisk.gz  -L . -append root=/dev/ram rw console=ttyS0,115200
-cpu
e5500 -dtb p5040ds.dtb

Thanks,

Tiejun


There is no need for -dtb.


With your comment, I use kvm-ppc-queue which top commit is be28a27c, kvm/ppc: 
don't call complete_mmio_load when it's a store, in plus that patch you 
pointed	to build one uImage based on corenet64_smp_defconfig, but we need to 
enable CONFIG_PPC_QEMU_E500 manually, and select CONFIG_TICK_CPU_ACCOUNTING 
since the default CONFIG_VIRT_CPU_ACCOUNTING_NATIVE would introduce some trace 
when boot VM.


And perform as follows:

./qemu-system-ppc64 -enable-kvm -m 1048 -nographic -M ppce500 -kernel uImage 
-initrd ramdisk.gz  -L . -append root=/dev/ram rw console=ttyS0,115200 -cpu e5500


But I can't see anything in the serial port.

Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 8/8 v3] KVM: PPC: e500: Add e6500 core to Kconfig description

2013-04-19 Thread tiejun.chen

On 04/11/2013 06:03 PM, Mihai Caraman wrote:

Add e6500 core to Kconfig description.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
v3:
  - No change

  arch/powerpc/kvm/Kconfig |6 +++---
  1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 63c67ec..4489520 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -136,15 +136,15 @@ config KVM_E500V2
  If unsure, say N.

  config KVM_E500MC
-   bool KVM support for PowerPC E500MC/E5500 processors
+   bool KVM support for PowerPC E500MC/E5500/E6500 processors
depends on PPC_E500MC
select KVM
select KVM_MMIO
select KVM_BOOKE_HV
select MMU_NOTIFIER
---help---
- Support running unmodified E500MC/E5500 (32-bit) guest kernels in


I ever tried p5040ds but failed with 64-bit, but looks are you saying this patch 
set can make e5500/e6500 work well with 64-bit? If so, will we need to upgrade 
qemu or something else like dtb?


Tiejun


- virtual machines on E500MC/E5500 host processors.
+ Support running unmodified E500MC/E5500/E6500 guest kernels in
+ virtual machines on E500MC/E5500/E6500 host processors.

  This module provides access to the hardware capabilities through
  a character device node named /dev/kvm.



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/7 v3] KVM: PPC: exit to user space on ehpriv instruction

2013-04-09 Thread tiejun.chen

On 04/08/2013 06:32 PM, Bharat Bhushan wrote:

From: Bharat Bhushan bharat.bhus...@freescale.com

ehpriv instruction is used for setting software breakpoints
by user space. This patch adds support to exit to user space
with run-debug have relevant information.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
  arch/powerpc/kvm/e500_emulate.c |   10 ++
  1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c
index e78f353..cefdd38 100644
--- a/arch/powerpc/kvm/e500_emulate.c
+++ b/arch/powerpc/kvm/e500_emulate.c
@@ -26,6 +26,7 @@
  #define XOP_TLBRE   946
  #define XOP_TLBWE   978
  #define XOP_TLBILX  18
+#define XOP_EHPRIV  270

  #ifdef CONFIG_KVM_E500MC
  static int dbell2prio(ulong param)
@@ -130,6 +131,15 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
emulated = kvmppc_e500_emul_tlbivax(vcpu, ea);
break;

+   case XOP_EHPRIV:
+   run-exit_reason = KVM_EXIT_DEBUG;


IIRC, the ehpriv instruction should generate a Hypervisor Privilege Exception to 
trap into the Hypervisor proactive. And we can use this ability to design 
something conveniently. And so, that is not only for the debug mechanism like 
you did.


So here if 'run-exit_reason' is fixed to KVM_EXIT_DEBUG, how to distinguish 
other scenarios? So as I understand, we should use 'ehpriv oc' exactly then 
resolve 'oc' further to go different cases, right?


Tiejun


+   run-debug.arch.address = vcpu-arch.pc;
+   run-debug.arch.status = 0;
+   kvmppc_account_exit(vcpu, DEBUG_EXITS);
+   emulated = EMULATE_EXIT_USER;
+   *advance = 0;
+   break;
+
default:
emulated = EMULATE_FAIL;
}



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 1/6] kvm: add device control API

2013-04-02 Thread tiejun.chen

On 04/02/2013 06:47 AM, Scott Wood wrote:

Currently, devices that are emulated inside KVM are configured in a
hardcoded manner based on an assumption that any given architecture
only has one way to do it.  If there's any need to access device state,
it is done through inflexible one-purpose-only IOCTLs (e.g.
KVM_GET/SET_LAPIC).  Defining new IOCTLs for every little thing is
cumbersome and depletes a limited numberspace.

This API provides a mechanism to instantiate a device of a certain
type, returning an ID that can be used to set/get attributes of the
device.  Attributes may include configuration parameters (e.g.
register base address), device state, operational commands, etc.  It
is similar to the ONE_REG API, except that it acts on devices rather
than vcpus.

Both device types and individual attributes can be tested without having
to create the device or get/set the attribute, without the need for
separately managing enumerated capabilities.

Signed-off-by: Scott Wood scottw...@freescale.com
---
  Documentation/virtual/kvm/api.txt|   70 ++
  Documentation/virtual/kvm/devices/README |1 +
  arch/powerpc/include/asm/kvm_host.h  |6 +++
  arch/powerpc/include/asm/kvm_ppc.h   |2 +
  arch/powerpc/kvm/powerpc.c   |7 +++
  include/uapi/linux/kvm.h |   27 
  virt/kvm/kvm_main.c  |   31 +
  7 files changed, 144 insertions(+)
  create mode 100644 Documentation/virtual/kvm/devices/README

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 976eb65..77328aa 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2173,6 +2173,76 @@ header; first `n_valid' valid entries with contents from 
the data
  written, then `n_invalid' invalid entries, invalidating any previously
  valid entries found.

+4.79 KVM_CREATE_DEVICE
+
+Capability: KVM_CAP_DEVICE_CTRL
+Type: vm ioctl
+Parameters: struct kvm_create_device (in/out)
+Returns: 0 on success, -1 on error
+Errors:
+  ENODEV: The device type is unknown or unsupported
+  EEXIST: Device already created, and this type of device may not
+  be instantiated multiple times
+  ENOSPC: Too many devices have been created
+
+  Other error conditions may be defined by individual device types.
+
+Creates an emulated device in the kernel.  The file descriptor returned
+in fd can be used with KVM_SET/GET/HAS_DEVICE_ATTR.
+
+If the KVM_CREATE_DEVICE_TEST flag is set, only test whether the
+device type is supported (not necessarily whether it can be created
+in the current vm).
+
+Individual devices should not define flags.  Attributes should be used
+for specifying any behavior that is not implied by the device type
+number.
+
+struct kvm_create_device {
+   __u32   type;   /* in: KVM_DEV_TYPE_xxx */
+   __u32   fd; /* out: device handle */
+   __u32   flags;  /* in: KVM_CREATE_DEVICE_xxx */
+};
+
+4.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR
+
+Capability: KVM_CAP_DEVICE_CTRL
+Type: device ioctl
+Parameters: struct kvm_device_attr
+Returns: 0 on success, -1 on error
+Errors:
+  ENXIO:  The group or attribute is unknown/unsupported for this device
+  EPERM:  The attribute cannot (currently) be accessed this way
+  (e.g. read-only attribute, or attribute that only makes
+  sense when the device is in a different state)
+
+  Other error conditions may be defined by individual device types.
+
+Gets/sets a specified piece of device configuration and/or state.  The
+semantics are device-specific.  See individual device documentation in
+the devices directory.  As with ONE_REG, the size of the data
+transferred is defined by the particular attribute.
+
+struct kvm_device_attr {
+   __u32   flags;  /* no flags currently defined */
+   __u32   group;  /* device-defined */
+   __u64   attr;   /* group-defined */
+   __u64   addr;   /* userspace address of attr data */
+};
+
+4.81 KVM_HAS_DEVICE_ATTR
+
+Capability: KVM_CAP_DEVICE_CTRL
+Type: device ioctl
+Parameters: struct kvm_device_attr
+Returns: 0 on success, -1 on error
+Errors:
+  ENXIO:  The group or attribute is unknown/unsupported for this device
+
+Tests whether a device supports a particular attribute.  A successful
+return indicates the attribute is implemented.  It does not necessarily
+indicate that the attribute can be read or written in the device's
+current state.  addr is ignored.

  4.77 KVM_ARM_VCPU_INIT

diff --git a/Documentation/virtual/kvm/devices/README 
b/Documentation/virtual/kvm/devices/README
new file mode 100644
index 000..34a6983
--- /dev/null
+++ b/Documentation/virtual/kvm/devices/README
@@ -0,0 +1 @@
+This directory contains specific device bindings for KVM_CAP_DEVICE_CTRL.
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index e34f8fe..e0caae2 100644
--- 

Re: [RFC PATCH v2 1/6] kvm: add device control API

2013-04-02 Thread tiejun.chen

On 04/03/2013 01:30 AM, Scott Wood wrote:

On 04/02/2013 01:59:57 AM, tiejun.chen wrote:

On 04/02/2013 06:47 AM, Scott Wood wrote:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ff71541..ed033c0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2158,6 +2158,17 @@ out:
  }
  #endif

+static int kvm_ioctl_create_device(struct kvm *kvm,
+   struct kvm_create_device *cd)
+{
+bool test = cd-flags  KVM_CREATE_DEVICE_TEST;
+
+switch (cd-type) {
+default:
+return -ENODEV;
+}


Even after apply patch 5, looks here still misses something like:

if (test)
WARN_ON_ONCE(!cd-type);


Why?  How does userspace passing in a bad type value mean the kernel needs to
report internal badness, why is a value of zero worse than any other bad value,
and why only when the test flag is set?


I just mean we need do something here since looks the 'test' variable is defined 
but unused, right? But please correct this as you expect :)


And if the userspace can't guarantee cd-type is never zero, we should return 
-ENODEV as well after that switch().


Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 1/6] kvm: add device control API

2013-04-02 Thread tiejun.chen

On 04/03/2013 09:34 AM, Scott Wood wrote:

On 04/02/2013 08:28:01 PM, tiejun.chen wrote:

On 04/03/2013 01:30 AM, Scott Wood wrote:

On 04/02/2013 01:59:57 AM, tiejun.chen wrote:

On 04/02/2013 06:47 AM, Scott Wood wrote:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ff71541..ed033c0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2158,6 +2158,17 @@ out:
  }
  #endif

+static int kvm_ioctl_create_device(struct kvm *kvm,
+   struct kvm_create_device *cd)
+{
+bool test = cd-flags  KVM_CREATE_DEVICE_TEST;
+
+switch (cd-type) {
+default:
+return -ENODEV;
+}


Even after apply patch 5, looks here still misses something like:

if (test)
WARN_ON_ONCE(!cd-type);


Why?  How does userspace passing in a bad type value mean the kernel needs to
report internal badness, why is a value of zero worse than any other bad value,
and why only when the test flag is set?


I just mean we need do something here since looks the 'test' variable is
defined but unused, right? But please correct this as you expect :)


Yes, it's unused in this patch, but is used after patch 5 is applied.  I didn't
think it was worth adding a temporary unused annotation, since this part of the
kernel doesn't use -Werror.


Yes, its accepted in !-Werror case if we shouldn't warn something as you said.

Tiejun

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unexpected data TLB miss happens when guest OS executing a bl instruction

2012-07-04 Thread tiejun.chen
On 07/03/2012 09:44 PM, Fei K Chen wrote:
 On Jul 2, 2012, at 10:34 AM, Fei K Chen wrote:

 We are debuging kvm on IBM poweren chip by RSICWatch tool.

 So for those of you who don't know,  the RISCWatch tool (RW) is a JTAG 
 based HW probe debugger.

 An unexpected data TLB miss happened and we can not explain why. Any one 
 have met this before?

 1. Guest OS executes a bl instruction with PC=0xC05A49CC. 
 According to the guest linux kernel objdump file, the next instruction will 
 be mflr r0 with PC=0xC0599CC0.

 Normally, if you were debugging the host, you could strip off the 0xC and 
 read the location from memory to verify this.
 However, since you are debugging the guest, you probably can't figure out 
 the machine physical address.


 2. By single-step execution in RISCWatch, guest OS does jump to an 
 instruction with PC=0xC0599CC0. At this time, RISCWatch tool can 
 not display what the instruction is. We guess this is because there is no 
 instruction TLB entry in hardware TLB for PC=0xC0599CC0. Thus an 
 instruction TLB miss is expected if we press the Asmstep to execute the 
 next instruction.

 3. Unfortunately, poweren jumps an instruction with PC=0xC0051FF4 
 which is the beginning of data TLB miss entry in kvm. We read the values in 
 spr SRR0 and DEAR. Both of them are 0xC0599CC0. We even can not 
 imagine why this happens.

 So when RW decides to look at memory it uses instruction stuffing/ramming 
 where it is able to insert instructions into the thread's instruction 
 port,  this way it can perform loads and stores using translation in the 
 same way that software does.
 Since you are using the ASM window, it is trying to read the instruction 
 (before it is executed) and this causes your data fault.
 So this is normal and completely expect.

 Instead of using the ASM window use the command line window to step and 
 read iar then you will get the correct instruction fault you are looking 
 for.
 
 Yes, as what you said, by using the command line window, the data TLB miss 
 does not happen now.
 
 Note: It may also be the case that you can uncheck  the track IAR box in 
 the asm windows if you insist on doing that.


 4. As external interrupt will happen during single-step debugging, we set a 
 hardware breakpoint at PC=0xC0599CC0, and let poweren directly run 
 to that point.

 Yes, hardware probes become difficult with any interrupts active and, 
 IMNSHO, should really only be used to debug bootstrap or exception level 
 code.
 It would be way easier to instrument the host fault handlers to help you 
 debug this case.


 5. When poweren stops at PC=0xC0599CC0, from the output of 
 RISCWatch, a trap instruction is placed at PC=0xC0599CC0. It is 
 different with what should be according to the kernel objdump file. The 
 only explanation we can imagine is that our kvm code set a wrong TLB entry 
 for PC=0xC0599CC0 (it may be brought by that unexpected data TLB 
 miss).

 As explained above, I'm pretty sure you did not hit the data fault in the 
 same way as before.
 Does the rest of the instruction stream match? If not then you likely have a 
 translation error.
 
 The reset of the instructions before and after 0xC0599CC0 match the 
 vmlinux objdump file. So it seems that someone changed the memory content at 
 0xC0599CC0.
 
 However, there are only a handful of static trap instructions in vmlinux, 
 so you should be able to track them down.
 My bet is that some software (or RW) has inserted the trap instruction to 
 facilitate some form of break point?
 
 The instruction in 0xC0599CC0 should not be covered by anyone because 
 the instruction mflr r0 is saving the lr register. Without the value in lr, 
 function call can not return to the caller. 
 

I think you can set a data write breakpoint at another address to check if that
is replaced with trap instruction as well. If not, at least this means this
shouldn't be inserted by RW :)

When that stops, you also can dump 0xC0599CC0 to take a look at what
happened.

 I am tracing the event that someone writes to address 0xC0599CC0 with 
 the help of RW. Strange, poweren dose not stop before it stops at that 
 unexpected trap instruction. It looks like that the trap exists before I 
 set up the data address write breakpoint.

Did you enable kprobe to do probe something? Since Kprobe just use trap
instruction to replace the krpobed address to make CPU hit.

Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 13/38] KVM: PPC: booke: category E.HV (GS-mode) support

2012-03-05 Thread tiejun.chen
 +/*
 + * Host interrupt handlers may have clobbered these guest-readable
 + * SPRGs, so we need to reload them here with the guest's values.
 + */
 +lwz r3, VCPU_VRSAVE(r4)
 +lwz r5, VCPU_SHARED_SPRG4(r11)
 +mtspr   SPRN_VRSAVE, r3
 +lwz r6, VCPU_SHARED_SPRG5(r11)
 +mtspr   SPRN_SPRG4W, r5
 +lwz r7, VCPU_SHARED_SPRG6(r11)
 +mtspr   SPRN_SPRG5W, r6
 +lwz r8, VCPU_SHARED_SPRG7(r11)
 +mtspr   SPRN_SPRG6W, r7
 +mtspr   SPRN_SPRG7W, r8
 +

That should be here.

 +/* Load some guest volatiles. */
 +PPC_LL  r3, VCPU_LR(r4)
 +PPC_LL  r5, VCPU_XER(r4)
 +PPC_LL  r6, VCPU_CTR(r4)
 +PPC_LL  r7, VCPU_CR(r4)
 +PPC_LL  r8, VCPU_PC(r4)
 +#ifndef CONFIG_64BIT
 +lwz r9, (VCPU_SHARED_MSR + 4)(r11)
 +#else
 +ld  r9, (VCPU_SHARED_MSR)(r11)
 +#endif
 +PPC_LL  r0, VCPU_GPR(r0)(r4)
 +PPC_LL  r1, VCPU_GPR(r1)(r4)
 +PPC_LL  r2, VCPU_GPR(r2)(r4)
 +PPC_LL  r10, VCPU_GPR(r10)(r4)
 +PPC_LL  r11, VCPU_GPR(r11)(r4)
 +PPC_LL  r12, VCPU_GPR(r12)(r4)
 +PPC_LL  r13, VCPU_GPR(r13)(r4)
 +mtlrr3
 +mtxer   r5
 +mtctr   r6
 +mtcrr7
 +mtsrr0  r8
 +mtsrr1  r9
 +
 +#ifdef CONFIG_KVM_EXIT_TIMING
 +/* save enter time */
 +1:
 +mfspr   r6, SPRN_TBRU
 +mfspr   r7, SPRN_TBRL
 +mfspr   r8, SPRN_TBRU
 +cmpwr8, r6
 
 Is not we should save guest CR after this otherwise this can corrupt it?

I think this should be a typo since in our previous kvm implementation, we
always did collect kvm exit timing at the above location :)

Tiejun

 
 Thanks
 -Bharat
 
 +PPC_STL r7, VCPU_TIMING_LAST_ENTER_TBL(r4)
 +bne 1b  
 +PPC_STL r8, VCPU_TIMING_LAST_ENTER_TBU(r4)
 +#endif
 +
 +/* Finish loading guest volatiles and jump to guest. */
 +PPC_LL  r5, VCPU_GPR(r5)(r4)
 +PPC_LL  r6, VCPU_GPR(r6)(r4)
 +PPC_LL  r7, VCPU_GPR(r7)(r4)
 +PPC_LL  r8, VCPU_GPR(r8)(r4)
 +PPC_LL  r9, VCPU_GPR(r9)(r4)
 +
 +PPC_LL  r3, VCPU_GPR(r3)(r4)
 +PPC_LL  r4, VCPU_GPR(r4)(r4)
 +rfi
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 3/4] KVM: PPC: epapr: install ev_idle hcall for e500 guest

2012-02-21 Thread tiejun.chen
Liu Yu wrote:
 If the guest hypervisor node contains has-idle property.
 
 Signed-off-by: Liu Yu yu@freescale.com
 ---
 v5: no change
 
  arch/powerpc/kernel/epapr_hcalls.S   |   29 +
  arch/powerpc/kernel/epapr_paravirt.c |   11 ++-
  2 files changed, 39 insertions(+), 1 deletions(-)
 
 diff --git a/arch/powerpc/kernel/epapr_hcalls.S 
 b/arch/powerpc/kernel/epapr_hcalls.S
 index 697b390..72fa234 100644
 --- a/arch/powerpc/kernel/epapr_hcalls.S
 +++ b/arch/powerpc/kernel/epapr_hcalls.S
 @@ -15,6 +15,35 @@
  #include asm/ppc_asm.h
  #include asm/asm-offsets.h
  
 +#define HC_VENDOR_EPAPR  (1  16)
 +#define HC_EV_IDLE   16

Why not use 'EV_IDLE' directly?

 +
 +_GLOBAL(epapr_ev_idle)
 +epapr_ev_idle:
 + rlwinm  r3,r1,0,0,31-THREAD_SHIFT   /* current thread_info */
 + lwz r4,TI_LOCAL_FLAGS(r3)   /* set napping bit */
 + ori r4,r4,_TLF_NAPPING  /* so when we take an exception */
 + stw r4,TI_LOCAL_FLAGS(r3)   /* it will return to our caller */
 +
 + wrteei  1
 +
 +idle_loop:
 + LOAD_REG_IMMEDIATE(r11, HC_VENDOR_EPAPR | HC_EV_IDLE)

And could this line be simplified as something like this:

LOAD_REG_IMMEDIATE(r11, EV_HCALL_TOKEN(EV_IDLE))

If so, even we can remove the previous HC_VENDOR_EPAPR definition as well.

Tiejun

 +
 +.global epapr_ev_idle_start
 +epapr_ev_idle_start:
 + li  r3, -1
 + nop
 + nop
 + nop
 +
 + /*
 +  * Guard against spurious wakeups from a hypervisor --
 +  * only interrupt will cause us to return to LR due to
 +  * _TLF_NAPPING.
 +  */
 + b   idle_loop
 +
  /* Hypercall entry point. Will be patched with device tree instructions. */
  .global epapr_hypercall_start
  epapr_hypercall_start:
 diff --git a/arch/powerpc/kernel/epapr_paravirt.c 
 b/arch/powerpc/kernel/epapr_paravirt.c
 index e601da7..43d875e 100644
 --- a/arch/powerpc/kernel/epapr_paravirt.c
 +++ b/arch/powerpc/kernel/epapr_paravirt.c
 @@ -21,6 +21,10 @@
  #include asm/epapr_hcalls.h
  #include asm/cacheflush.h
  #include asm/code-patching.h
 +#include asm/machdep.h
 +
 +extern void epapr_ev_idle(void);
 +extern u32 epapr_ev_idle_start[];
  
  bool epapr_para_enabled = false;
  
 @@ -39,8 +43,13 @@ static int __init epapr_para_init(void)
  
   insts = of_get_property(hyper_node, hcall-instructions, len);
   if (insts  !(len % 4)  len = (4 * 4)) {
 - for (i = 0; i  (len / 4); i++)
 + for (i = 0; i  (len / 4); i++) {
   patch_instruction(epapr_hypercall_start + i, insts[i]);
 + patch_instruction(epapr_ev_idle_start + i, insts[i]);
 + }
 +
 + if (of_get_property(hyper_node, has-idle, NULL))
 + ppc_md.power_save = epapr_ev_idle;
  
   epapr_para_enabled = true;
   } else {

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 3/4] KVM: PPC: epapr: install ev_idle hcall for e500 guest

2012-02-21 Thread tiejun.chen
Liu Yu-B13201 wrote:
 
 -Original Message-
 From: tiejun.chen [mailto:tiejun.c...@windriver.com]
 Sent: Tuesday, February 21, 2012 6:54 PM
 To: Liu Yu-B13201
 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
 linuxppc-...@ozlabs.org; Wood Scott-B07421
 Subject: Re: [PATCH v5 3/4] KVM: PPC: epapr: install ev_idle hcall for
 e500 guest

 Liu Yu wrote:
 If the guest hypervisor node contains has-idle property.

 Signed-off-by: Liu Yu yu@freescale.com
 ---
 v5: no change

  arch/powerpc/kernel/epapr_hcalls.S   |   29
 +
  arch/powerpc/kernel/epapr_paravirt.c |   11 ++-
  2 files changed, 39 insertions(+), 1 deletions(-)

 diff --git a/arch/powerpc/kernel/epapr_hcalls.S
 b/arch/powerpc/kernel/epapr_hcalls.S
 index 697b390..72fa234 100644
 --- a/arch/powerpc/kernel/epapr_hcalls.S
 +++ b/arch/powerpc/kernel/epapr_hcalls.S
 @@ -15,6 +15,35 @@
  #include asm/ppc_asm.h
  #include asm/asm-offsets.h

 +#define HC_VENDOR_EPAPR(1  16)
 +#define HC_EV_IDLE 16
 Why not use 'EV_IDLE' directly?

 +
 +_GLOBAL(epapr_ev_idle)
 +epapr_ev_idle:
 +   rlwinm  r3,r1,0,0,31-THREAD_SHIFT   /* current thread_info */
 +   lwz r4,TI_LOCAL_FLAGS(r3)   /* set napping bit */
 +   ori r4,r4,_TLF_NAPPING  /* so when we take an exception */
 +   stw r4,TI_LOCAL_FLAGS(r3)   /* it will return to our caller */
 +
 +   wrteei  1
 +
 +idle_loop:
 +   LOAD_REG_IMMEDIATE(r11, HC_VENDOR_EPAPR | HC_EV_IDLE)
 And could this line be simplified as something like this:

 LOAD_REG_IMMEDIATE(r11, EV_HCALL_TOKEN(EV_IDLE))

 If so, even we can remove the previous HC_VENDOR_EPAPR definition as well.

 
 Because the epapr_hcalls.h contains C functions,
 so it cannot be included by assembly code.

These common definitions are already covered in epapr_hcalls.h, but looks you
redefine the same items many times, in kvm_para.h/epapr_hcalls.S. And I think
maybe we'll also reuse these generics elsewhere lately.

So can we limit that with __ASSEMBLY__ in epapr_hcalls.h? Or other way. If so it
makes our life easy in the future.

Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html