Re: [PATCH] KVM: PPC: Implement extension to report number of memslots

2015-10-26 Thread Scott Wood
On Mon, 2015-10-26 at 16:15 +1100, Paul Mackerras wrote:
> On Fri, Oct 16, 2015 at 08:41:31AM +0200, Thomas Huth wrote:
> > Yes, we'll likely need this soon! 32 slots are not enough...
> 
> Would anyone object if I raised the limit for PPC to 512 slots?
> Would that cause problems on embedded PPC, for instance?

I can't think of any reason why it would cause problems.

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: e500: fix couple of shift operations on 64 bits

2015-10-01 Thread Scott Wood
On Thu, 2015-10-01 at 15:58 +0300, Laurentiu Tudor wrote:
> Fix couple of cases where we shift left a 32-bit
> value thus might get truncated results on 64-bit
> targets.
> 
> Signed-off-by: Laurentiu Tudor <laurentiu.tu...@freescale.com>
> Suggested-by: Scott Wood <scotttw...@freescale.com>
> ---
>  arch/powerpc/kvm/e500_mmu_host.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Acked-by: Scott Wood <scottw...@freescale.com>

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: PPC: e6500: Handle LRAT error exception

2015-09-30 Thread Scott Wood
On Wed, 2015-09-30 at 14:27 +0300, Laurentiu Tudor wrote:
> On 09/30/2015 01:32 PM, Laurentiu Tudor wrote:
> > On 09/25/2015 03:10 AM, Scott Wood wrote:
> > > On Thu, 2015-09-24 at 16:11 +0300, Laurentiu Tudor wrote:
> 
> [snip]
> 
> > > > b/arch/powerpc/kvm/e500_mmu_host.c
> > > > index 12d5c67..99ad88a 100644
> > > > --- a/arch/powerpc/kvm/e500_mmu_host.c
> > > > +++ b/arch/powerpc/kvm/e500_mmu_host.c
> > > > @@ -96,6 +96,112 @@ static inline void __write_host_tlbe(struct 
> > > > kvm_book3e_206_tlb_entry *stlbe,
> > > > stlbe->mas2, stlbe->mas7_3);
> > > >  }
> > > >  
> > > > +#if defined(CONFIG_64BIT) && defined(CONFIG_KVM_BOOKE_HV)
> > > > +static int lrat_next(void)
> > > > +{
> > > 
> > > Will anything break by removing the CONFIG_64BIT condition, even if we 
> > > don't 
> > > have a 32-bit target that uses this?
> > 
> > Not completly certain but i remember getting compile or link errors
> > on 32-bit e500mc or e500v2. I can recheck if you want.
> > 
> > > > +void kvmppc_lrat_map(struct kvm_vcpu *vcpu, gfn_t gfn)
> > > > +{
> > > > + struct kvm_memory_slot *slot;
> > > > + unsigned long pfn;
> > > > + unsigned long hva;
> > > > + struct vm_area_struct *vma;
> > > > + unsigned long psize;
> > > > + int tsize;
> > > > + unsigned long tsize_pages;
> > > > +
> > > > + slot = gfn_to_memslot(vcpu->kvm, gfn);
> > > > + if (!slot) {
> > > > + pr_err_ratelimited("%s: couldn't find memslot for gfn 
> > > > %lx!\n",
> > > > +__func__, (long)gfn);
> > > > + return;
> > > > + }
> > > > +
> > > > + hva = slot->userspace_addr;
> > > 
> > > What if the faulting address is somewhere in the middle of the slot?  
> > > Shouldn't you use gfn_to_hva_memslot() like kvmppc_e500_shadow_map()?  
> > > In 
> > > fact there's probably a lot of logic that should be shared between 
> > > these two 
> > > functions.
> > 
> > So if my understanding is correct most of the gfn -> pfn translation
> > stuff done in kvmppc_e500_shadow_map() should also be present in here.
> > If that's the case maybe i should first extract this code (which includes
> > VM_PFNMAP handling) in a separate function and call it from both 
> > kvmppc_lrat_map()
> > and kvmppc_e500_shadow_map(). 
> > 
> 
> Off-topic, but just noticed that kvmppc_e500_shadow_map() is marked as 
> inline.
> Was that on purpose? Is inlining such a large function worth anything?

I don't remember the intent.  It was probably a lot smaller back then.  That 
said, it's only used two places, with probably pretty good temporal 
separation between performance-intensive uses of one versus the other (so not 
a huge icache concern), and a pretty good portion of the function will be 
optimized out in the caller with tlbsel == 0.

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][v2] KVM: PPC: e500: Emulate TMCFG0 TMRN register

2015-09-28 Thread Scott Wood
On Fri, 2015-09-25 at 18:02 +0300, Laurentiu Tudor wrote:
> Emulate TMCFG0 TMRN register exposing one HW thread per vcpu.
> 
> Signed-off-by: Mihai Caraman <mihai.cara...@freescale.com>
> [laurentiu.tu...@freescale.com: rebased on latest kernel, use
>  define instead of hardcoded value, moved code in own function]
> Signed-off-by: Laurentiu Tudor <laurentiu.tu...@freescale.com>
> ---
> v2:
>  - moved code in its own function
> Needs this patch: https://patchwork.ozlabs.org/patch/521752/
> 
>  arch/powerpc/include/asm/disassemble.h |  5 +
>  arch/powerpc/kvm/e500_emulate.c| 19 +++
>  2 files changed, 24 insertions(+)

Acked-by: Scott Wood <scotttw...@freescale.com>

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: e6500: support powers of 2K TLB1 sizes

2015-09-25 Thread Scott Wood
On Fri, 2015-09-25 at 17:30 +0300, Laurentiu Tudor wrote:
> On 09/24/2015 11:23 PM, Scott Wood wrote:
> > On Thu, 2015-09-24 at 15:57 +0300, Laurentiu Tudor wrote:
> > > Book-E MMUv2 present in e6500 cores supports
> > > powers of 2K page sizes while older MMUv1 cores
> > > support only powers of 4K page sizes, or in other
> > > words the LSB of TSIZE on MMUv1 is always 0.
> > > Thus, on MMUv2 we must not strip the LSB.
> > 
> > We can get better TLB utilization by not stripping it, but why "must not" 
> > which makes it sound like a bugfix rather than an optimization?
> 
> Not sure i get it. If a guests wants a 2K^^(2n+1) translation
> size it will instead get a 2K^^2n size, no? Isn't this an issue?

It will get two pages of 2k^^2n size (assuming both halves are accessed).  It 
will work, it will just consume twice as many TLB1 entries.  It's the same as 
if the host pages backing the region are smaller than the mapping that the 
guest tries to create.

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: e6500: support powers of 2K TLB1 sizes

2015-09-24 Thread Scott Wood
On Thu, 2015-09-24 at 15:57 +0300, Laurentiu Tudor wrote:
> Book-E MMUv2 present in e6500 cores supports
> powers of 2K page sizes while older MMUv1 cores
> support only powers of 4K page sizes, or in other
> words the LSB of TSIZE on MMUv1 is always 0.
> Thus, on MMUv2 we must not strip the LSB.

We can get better TLB utilization by not stripping it, but why "must not" 
which makes it sound like a bugfix rather than an optimization?

> Signed-off-by: Mihai Caraman 
> [laurentiu.tu...@freescale.com: addressed review
>  feedback, split in distinct patch]
> Signed-off-by: Laurentiu Tudor 
> ---
>  arch/powerpc/kvm/e500_mmu_host.c | 28 +---
>  1 file changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/e500_mmu_host.c 
> b/arch/powerpc/kvm/e500_mmu_host.c
> index 4d33e19..12d5c67 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -371,6 +371,7 @@ static inline int kvmppc_e500_shadow_map(struct 
> kvmppc_vcpu_e500 *vcpu_e500,
>  
>   unsigned long start, end;
>   unsigned long slot_start, slot_end;
> + int tsize_inc;
>  
>   pfnmap = 1;
>  
> @@ -392,10 +393,20 @@ static inline int kvmppc_e500_shadow_map(struct 
> kvmppc_vcpu_e500 *vcpu_e500,
>   MAS1_TSIZE_SHIFT;
>  
>   /*
> -  * e500 doesn't implement the lowest tsize bit,
> -  * or 1K pages.
> +  * MMUv1 doesn't implement the lowest tsize bit,
> +  * or translations smaller than 4K.
>*/
> - tsize = max(BOOK3E_PAGESZ_4K, tsize & ~1);
> + if (!has_feature(_e500->vcpu, VCPU_FTR_MMU_V2))
> + tsize &= ~1;
> + tsize = max(BOOK3E_PAGESZ_4K, tsize);
> +
> + /*
> +  * Calculate TSIZE increment. MMUv2 supports
> +  * power of 2K translations while MMUv1 is limited
> +  * to power of 4K sizes.
> +  */
> + tsize_inc = has_feature(_e500->vcpu,
> + VCPU_FTR_MMU_V2) ? 1 : 2;

If you calculate tsize_inc first, then the previous if-statement can become 
"tsize &= ~(tsize_inc - 1);".

>  
>   /*
>* Now find the largest tsize (up to what the guest
> @@ -404,7 +415,8 @@ static inline int kvmppc_e500_shadow_map(struct 
> kvmppc_vcpu_e500 *vcpu_e500,
>* aligned.
>*/
>  
> - for (; tsize > BOOK3E_PAGESZ_4K; tsize -= 2) {
> + for (; tsize > BOOK3E_PAGESZ_4K;
> +  tsize -= tsize_inc) {
>   unsigned long gfn_start, gfn_end;
>   tsize_pages = 1 << (tsize - 2);
>  
> @@ -437,10 +449,12 @@ static inline int kvmppc_e500_shadow_map(struct 
> kvmppc_vcpu_e500 *vcpu_e500,
>   tsize = min(__ilog2(psize) - 10, tsize);
>  
>   /*
> -  * e500 doesn't implement the lowest tsize bit,
> -  * or 1K pages.
> +  * MMUv1 doesn't implement the lowest tsize bit,
> +  * or translations smaller than 4K.
>*/

This comment makes it sound like MMUv2 might support translations smaller 
than 4K.

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: e500: Emulate TMCFG0 TMRN register

2015-09-24 Thread Scott Wood
On Thu, 2015-09-24 at 09:56 +0300, Laurentiu Tudor wrote:
> Emulate TMCFG0 TMRN register exposing one HW thread per vcpu.
> 
> Signed-off-by: Mihai Caraman 
> [laurentiu.tu...@freescale.com: rebased on latest kernel,
>  use define instead of hardcoded value]
> Signed-off-by: Laurentiu Tudor 
> ---
> Needs this patch: https://patchwork.ozlabs.org/patch/521752/
> 
>  arch/powerpc/include/asm/disassemble.h |  5 +
>  arch/powerpc/kvm/e500_emulate.c| 11 +++
>  2 files changed, 16 insertions(+)

KVM patches should be sent to k...@vger.kernel.org in addition to kvm-
p...@vger.kernel.org.

> @@ -165,6 +167,15 @@ int kvmppc_core_emulate_op_e500(struct kvm_run *run, 
> struct kvm_vcpu *vcpu,
>   emulated = kvmppc_e500_emul_tlbivax(vcpu, ea);
>   break;
>  
> + case XOP_MFTMR:
> + /* Expose one thread per vcpu */
> + if (get_tmrn(inst) == TMRN_TMCFG0)
> + kvmppc_set_gpr(vcpu, rt,
> +1 | (1 << 
> TMRN_TMCFG0_NATHRD_SHIFT));
> + else
> + emulated = EMULATE_FAIL;
> + break;

Line length.  Please move the implementation into its own function like all 
the others.

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] powerpc/e6500: add TMCFG0 register definition

2015-09-24 Thread Scott Wood
On Wed, 2015-09-23 at 18:06 +0300, Laurentiu Tudor wrote:
> The register is not currently used in the base kernel
> but will be in a forthcoming kvm patch.
> 
> Signed-off-by: Laurentiu Tudor <laurentiu.tu...@freescale.com>
> ---
>  arch/powerpc/include/asm/reg_booke.h | 6 ++
>  1 file changed, 6 insertions(+)

Acked-by: Scott Wood <scottw...@freescale.com>

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 15/19] KVM: PPC: e500: fix handling local_sid_lookup result

2015-09-24 Thread Scott Wood
On Thu, 2015-09-24 at 16:00 +0200, Andrzej Hajda wrote:
> The function can return negative value.
> 
> The problem has been detected using proposed semantic patch
> scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].
> 
> [1]: http://permalink.gmane.org/gmane.linux.kernel/2046107
> 
> Signed-off-by: Andrzej Hajda <a.ha...@samsung.com>
> ---
> Hi,
> 
> To avoid problems with too many mail recipients I have sent whole
> patchset only to LKML. Anyway patches have no dependencies.
> 
> Regards
> Andrzej
> ---
>  arch/powerpc/kvm/e500.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Acked-by: Scott Wood <scottw...@freescale.com>

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: PPC: e6500: Handle LRAT error exception

2015-09-24 Thread Scott Wood
On Thu, 2015-09-24 at 16:11 +0300, Laurentiu Tudor wrote:
> diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
> b/arch/powerpc/kvm/bookehv_interrupts.S
> index 81bd8a07..1e9fa2a 100644
> --- a/arch/powerpc/kvm/bookehv_interrupts.S
> +++ b/arch/powerpc/kvm/bookehv_interrupts.S
> @@ -62,6 +62,7 @@
>  #define NEED_EMU 0x0001 /* emulation -- save nv regs */
>  #define NEED_DEAR0x0002 /* save faulting DEAR */
>  #define NEED_ESR 0x0004 /* save faulting ESR */
> +#define NEED_LPER0x0008 /* save faulting LPER */
>  
>  /*
>   * On entry:
> @@ -159,6 +160,12 @@
>   PPC_STL r9, VCPU_FAULT_DEAR(r4)
>   .endif
>  
> + /* Only supported on 64-bit cores for now */
> + .if \flags & NEED_LPER
> + mfspr   r7, SPRN_LPER
> + std r7, VCPU_FAULT_LPER(r4)
> + .endif

What's the harm in using PPC_STL anyway?


>  /*
>   * For input register values, see 
> arch/powerpc/include/asm/kvm_booke_hv_asm.h
> diff --git a/arch/powerpc/kvm/e500_mmu_host.c 
> b/arch/powerpc/kvm/e500_mmu_host.c
> index 12d5c67..99ad88a 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -96,6 +96,112 @@ static inline void __write_host_tlbe(struct 
> kvm_book3e_206_tlb_entry *stlbe,
> stlbe->mas2, stlbe->mas7_3);
>  }
>  
> +#if defined(CONFIG_64BIT) && defined(CONFIG_KVM_BOOKE_HV)
> +static int lrat_next(void)
> +{

Will anything break by removing the CONFIG_64BIT condition, even if we don't 
have a 32-bit target that uses this?

> +void kvmppc_lrat_map(struct kvm_vcpu *vcpu, gfn_t gfn)
> +{
> + struct kvm_memory_slot *slot;
> + unsigned long pfn;
> + unsigned long hva;
> + struct vm_area_struct *vma;
> + unsigned long psize;
> + int tsize;
> + unsigned long tsize_pages;
> +
> + slot = gfn_to_memslot(vcpu->kvm, gfn);
> + if (!slot) {
> + pr_err_ratelimited("%s: couldn't find memslot for gfn %lx!\n",
> +__func__, (long)gfn);
> + return;
> + }
> +
> + hva = slot->userspace_addr;

What if the faulting address is somewhere in the middle of the slot?  
Shouldn't you use gfn_to_hva_memslot() like kvmppc_e500_shadow_map()?  In 
fact there's probably a lot of logic that should be shared between these two 
functions.

> + down_read(>mm->mmap_sem);
> + vma = find_vma(current->mm, hva);
> + if (vma && (hva >= vma->vm_start)) {
> + psize = vma_kernel_pagesize(vma);

What if it's VM_PFNMAP?

> + } else {
> + pr_err_ratelimited("%s: couldn't find virtual memory address 
> for gfn 
> %lx!\n",
> +__func__, (long)gfn);
> + up_read(>mm->mmap_sem);
> + return;
> + }
> + up_read(>mm->mmap_sem);
> +
> + pfn = gfn_to_pfn_memslot(slot, gfn);
> + if (is_error_noslot_pfn(pfn)) {
> + pr_err_ratelimited("%s: couldn't get real page for gfn %lx!\n",
> +__func__, (long)gfn);
> + return;
> + }
> +
> + tsize = __ilog2(psize) - 10;
> + tsize_pages = 1 << (tsize + 10 - PAGE_SHIFT);

1UL << ...

kvmppc_e500_shadow_map needs the same fix.

> + gfn &= ~(tsize_pages - 1);
> + pfn &= ~(tsize_pages - 1);
> +
> + write_host_lrate(tsize, gfn, pfn, vcpu->kvm->arch.lpid, true);
> +
> + kvm_release_pfn_clean(pfn);
> +}
> +
> +void kvmppc_lrat_invalidate(struct kvm_vcpu *vcpu)
> +{
> + uint32_t mas0, mas1 = 0;
> + int esel;
> + unsigned long flags;
> +
> + local_irq_save(flags);
> +
> + /* LRAT does not have a dedicated instruction for invalidation */
> + for (esel = 0; esel < get_paca()->tcd_ptr->lrat_max; esel++) {
> + mas0 = MAS0_ATSEL | MAS0_ESEL(esel);
> + mtspr(SPRN_MAS0, mas0);
> + asm volatile("isync; tlbre" : : : "memory");
> + mas1 = mfspr(SPRN_MAS1) & ~MAS1_VALID;
> + mtspr(SPRN_MAS1, mas1);
> + asm volatile("isync; tlbwe" : : : "memory");
> + }
> + /* Must clear mas8 for other host tlbwe's */
> + mtspr(SPRN_MAS8, 0);
> + isync();
> +
> + local_irq_restore(flags);
> +}
> +#endif /* CONFIG_64BIT && CONFIG_KVM_BOOKE_HV */
> +
>  /*
>   * Acquire a mas0 with victim hint, as if we just took a TLB miss.
>   *
> diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
> index cda695d..5856f8f 100644
> --- a/arch/powerpc/kvm/e500mc.c
> +++ b/arch/powerpc/kvm/e500mc.c
> @@ -99,6 +99,10 @@ void kvmppc_e500_tlbil_all(struct kvmppc_vcpu_e500 
> *vcpu_e500)
>   asm volatile("tlbilxlpid");
>   mtspr(SPRN_MAS5, 0);
>   local_irq_restore(flags);
> +
> +#ifdef PPC64
> + kvmppc_lrat_invalidate(_e500->vcpu);
> +#endif

Don't you mean CONFIG_PPC64 (or CONFIG_64BIT to be consistent)?

>  }
>  
>  void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 pid)
> diff --git a/arch/powerpc/mm/fsl_booke_mmu.c 
> 

Re: [PATCH] powerpc/e500: move qemu machine spec together with the rest

2015-09-14 Thread Scott Wood
On Mon, 2015-09-14 at 16:17 +0300, Laurentiu Tudor wrote:
> On 09/10/2015 02:01 AM, Scott Wood wrote:
> > On Fri, 2015-09-04 at 15:46 +0300, Laurentiu Tudor wrote:
> > > This way we get rid of an entire file with mostly
> > > duplicated code plus a Kconfig option that you always
> > > had to take care to check it in order for kvm to work.
> > > 
> > > Signed-off-by: Laurentiu Tudor <laurentiu.tu...@freescale.com>
> > > ---
> > >  arch/powerpc/platforms/85xx/Kconfig   | 15 -
> > >  arch/powerpc/platforms/85xx/Makefile  |  1 -
> > >  arch/powerpc/platforms/85xx/corenet_generic.c |  1 +
> > >  arch/powerpc/platforms/85xx/qemu_e500.c   | 85 
> > > 
> > 
> > 
> > qemu_e500 is not only for corenet chips.  
> 
> That's too bad. :-(
> I remember discussions on dropping the e500v2 support at some point in time?
> 
> > We can add it to the defconfig (in fact I've been meaning to do so).
> 
> Or maybe just drop de KConfig option and
> wrap the file in an #ifdef CONFIG_KVM or something along these lines?
>  
> > > -static void __init qemu_e500_setup_arch(void)
> > > -{
> > > - ppc_md.progress("qemu_e500_setup_arch()", 0);
> > > -
> > > - fsl_pci_assign_primary();
> > > - swiotlb_detect_4g();
> > 
> > Where is fsl_pci_assign_primary() in corenet_generic.c?  
> 
> This commit claims it's not needed and drops it:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7d4d595dad30328bc6153e235d90f54c918fc127

That commit has nothing to do with QEMU.

> > At one point this 
> > was needed for QEMU's PCI implementation -- have you tested QEMU PCI 
> > without 
> > it?
> 
> Well, somehow i've (embarrassingly) messed up my initial tests.
> I've retested after seeing your comment and indeed this breaks pci under 
> qemu.
> Adding to the confusion, the commit above made me think that the removal 
> was safe.
> Why pci qemu doesn't work without the call to fsl_pci_assign_primary() is
> an interesting subject.

IIRC it has to do with QEMU not liking a BAR set to zero.

-Scott


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] powerpc/e500: move qemu machine spec together with the rest

2015-09-14 Thread Scott Wood
On Mon, 2015-09-14 at 16:14 +0200, Alexander Graf wrote:
> > Am 14.09.2015 um 15:17 schrieb Laurentiu Tudor <b10...@freescale.com>:
> > 
> > > On 09/10/2015 02:01 AM, Scott Wood wrote:
> > > > On Fri, 2015-09-04 at 15:46 +0300, Laurentiu Tudor wrote:
> > > > This way we get rid of an entire file with mostly
> > > > duplicated code plus a Kconfig option that you always
> > > > had to take care to check it in order for kvm to work.
> > > > 
> > > > Signed-off-by: Laurentiu Tudor <laurentiu.tu...@freescale.com>
> > > > ---
> > > > arch/powerpc/platforms/85xx/Kconfig   | 15 -
> > > > arch/powerpc/platforms/85xx/Makefile  |  1 -
> > > > arch/powerpc/platforms/85xx/corenet_generic.c |  1 +
> > > > arch/powerpc/platforms/85xx/qemu_e500.c   | 85 ---
> > > > -
> > > 
> > > 
> > > qemu_e500 is not only for corenet chips.  
> > 
> > That's too bad. :-(
> > I remember discussions on dropping the e500v2 support at some point in 
> > time?
> > 
> > > We can add it to the defconfig (in fact I've been meaning to do so).
> > 
> > Or maybe just drop de KConfig option and
> > wrap the file in an #ifdef CONFIG_KVM or something along these lines?
> 
> CONFIG_KVM is for host support though. This is for the guest kernel.

CONFIG_QEMU_E500 can also be used with TCG -- it's not KVM-specific.

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] powerpc/e500: move qemu machine spec together with the rest

2015-09-09 Thread Scott Wood
On Fri, 2015-09-04 at 15:46 +0300, Laurentiu Tudor wrote:
> This way we get rid of an entire file with mostly
> duplicated code plus a Kconfig option that you always
> had to take care to check it in order for kvm to work.
> 
> Signed-off-by: Laurentiu Tudor 
> ---
>  arch/powerpc/platforms/85xx/Kconfig   | 15 -
>  arch/powerpc/platforms/85xx/Makefile  |  1 -
>  arch/powerpc/platforms/85xx/corenet_generic.c |  1 +
>  arch/powerpc/platforms/85xx/qemu_e500.c   | 85 


qemu_e500 is not only for corenet chips.  We can add it to the defconfig (in 
fact I've been meaning to do so).

> -static void __init qemu_e500_setup_arch(void)
> -{
> - ppc_md.progress("qemu_e500_setup_arch()", 0);
> -
> - fsl_pci_assign_primary();
> - swiotlb_detect_4g();

Where is fsl_pci_assign_primary() in corenet_generic.c?  At one point this 
was needed for QEMU's PCI implementation -- have you tested QEMU PCI without 
it?

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 1/2] perf,kvm/ppc: Add kvm_perf.h for powerpc

2015-07-31 Thread Scott Wood
[Added KVM lists and a couple relevant people]

On Fri, 2015-07-31 at 14:25 +0530, Hemant Kumar wrote:
 On 07/30/2015 03:52 AM, Scott Wood wrote:
  On Wed, 2015-07-29 at 16:07 +0530, Hemant Kumar wrote:
   Hi Scott,
   
   On 07/17/2015 01:40 AM, Scott Wood wrote:
On Thu, 2015-07-16 at 21:18 +0530, Hemant Kumar wrote:
 To analyze the exit events with perf, we need kvm_perf.h to be 
 added in
 the arch/powerpc directory, where the kvm tracepoints needed to 
 trace
 the KVM exit events are defined.
 
 This patch adds kvm_perf_book3s.h to indicate that the 
 tracepoints are
 book3s specific. Generic kvm_perf.h then can just include
 kvm_perf_book3s.h.
 
 Signed-off-by: Hemant Kumar hem...@linux.vnet.ibm.com
 ---
 Changes:
 - Not exporting the exit reasons compared to previous patchset
 (suggested
 by Paul)
 
arch/powerpc/include/uapi/asm/kvm_perf.h|  6 ++
arch/powerpc/include/uapi/asm/kvm_perf_book3s.h | 14 
 ++
2 files changed, 20 insertions(+)
create mode 100644 arch/powerpc/include/uapi/asm/kvm_perf.h
create mode 100644 
 arch/powerpc/include/uapi/asm/kvm_perf_book3s.h
 
 diff --git a/arch/powerpc/include/uapi/asm/kvm_perf.h
 b/arch/powerpc/include/uapi/asm/kvm_perf.h
 new file mode 100644
 index 000..5ed2ff3
 --- /dev/null
 +++ b/arch/powerpc/include/uapi/asm/kvm_perf.h
 @@ -0,0 +1,6 @@
 +#ifndef _ASM_POWERPC_KVM_PERF_H
 +#define _ASM_POWERPC_KVM_PERF_H
 +
 +#include asm/kvm_perf_book3s.h
 +
 +#endif
 diff --git a/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h
 b/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h
 new file mode 100644
 index 000..8c8d8c2
 --- /dev/null
 +++ b/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h
 @@ -0,0 +1,14 @@
 +#ifndef _ASM_POWERPC_KVM_PERF_BOOK3S_H
 +#define _ASM_POWERPC_KVM_PERF_BOOK3S_H
 +
 +#include asm/kvm.h
 +
 +#define DECODE_STR_LEN 20
 +
 +#define VCPU_ID vcpu_id
 +
 +#define KVM_ENTRY_TRACE kvm_hv:kvm_guest_enter
 +#define KVM_EXIT_TRACE kvm_hv:kvm_guest_exit
 +#define KVM_EXIT_REASON trap
 +
 +#endif /* _ASM_POWERPC_KVM_PERF_BOOK3S_H */
Again, why is book3s stuff being presented via uapi as generic
asm/kvm_perf.h with generic symbol names?

-Scott
   Ok.
   
   We can change the KVM_ENTRY_TRACE macro to something like
   KVM_BOOK3S_ENTRY_TRACE and likewise for KVM_EXIT_TRACE
   and KVM_EXIT_REASON
  What about DECODE_STR_LEN and VCPU_ID?
 
 DECODE_STR_LEN can be common, we can give a big enough size to it, if
 we need to.
 And, VCPU_ID depends on the field in the tracepoint payload data which is
 specific to that tracepoint. This field is used to maintain the per vcpu 
 record
 and this field gives us the vcpu id. So, yeah, I guess, since, I can't 
 find any
 such field as vcpu_id in the kvm_exit tracepoint for book3e, we have to
 make this specific to book3s.

Or maybe we could add kvm_guest_enter/kvm_guest_leave, with vcpu_id, to 
book3e... though the kvm-hv would be a problem for book3s-pr, if anyone cares 
about this feature there.

I'm not sure why the strings are present both in the UAPI header, as well as 
in kvm_events_tp[] in kvm-stat.c.

  Where is this API documented?
  
 and then, to resolve the issue of generic
   macro names in the userspace side, we can handle it using __weak
   modifier.
  Does userspace get built differently for book3s versus book3e?  For now 
  it'd
  
  be fine for userspace to check for book3s and not use the feature if it's
  
  book3e.  If and when book3e gains this feature, then userspace can be 
  changed.
 
 Well, I couldn't find any way to build user space differently for book3s and
 book3e.
 
 How about keeping this as it is after modifying the tracepoint macro names
 to book3s specific in the uapi? And as and when booke decides to implement
 this feature, a runtime check for event availability can be added then, 
 IMHO.
 
 What do you think?

What does userspace use, at runtime, to determine if this feature is present 
and whether the book3s symbols should be used?

Deferring the implementation of book3e support is fine, but from a uapi 
perspective it should be discoverable at runtime whether the feature exposed 
by asm/kvm_perf_book3s.h is available.  Otherwise, if it is implemented (or 
even if it isn't), you have the potential for user confusion if an older perf 
tool is used.  This sort of discovery is done all the time in the KVM APIs 
themselves.

FWIW, on x86 cpu_isa_init() uses cpuid to select an exit table.  Using PVR 
seems like it would be a bit cumbersome though compared to the kernel 
directly exporting which subarch it is (and it wouldn't help with HV versus 
PR).

   What would you suggest?
  Another option would be to explain this interface so that we can figure 
  out
  if book3e would even

Re: [RFC PATCH 6/6] powerpc/kvm: change the condition of identifying hugetlb vm

2015-07-07 Thread Scott Wood
On Tue, 2015-07-07 at 16:05 +0800, wenwei tao wrote:
 Hi Scott
 
 I understand what you said.
 
 I will use the function 'is_vm_hugetlb_page()' to hide the bit
 combinations according to your comments in the next version of patch
 set.
 
 But for the situation like below, there isn't an obvious structure
 'vma', using 'is_vm_hugetlb_page()' maybe costly or even not possible.
 void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 unsigned long end, unsigned long vmflag)
 {
 ...
 
 if (end == TLB_FLUSH_ALL || tlb_flushall_shift == -1
 || vmflag  VM_HUGETLB) {
 local_flush_tlb();
 goto flush_all;
 }
 ...
 }

Add a function that operates on the flags directly, then.

-Scott

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 6/6] powerpc/kvm: change the condition of identifying hugetlb vm

2015-07-06 Thread Scott Wood
On Fri, 2015-07-03 at 16:47 +0800, wenwei tao wrote:
 Hi Scott
 
 Thank you for your comments.
 
 Kernel already has that function: is_vm_hugetlb_page() , but the
 original code didn't use it,
 in order to keep the coding style of the original code, I didn't use it 
 either.

 For the sentence like: vma-vm_flags  VM_HUGETLB , hiding it behind
 'is_vm_hugetlb_page()' is ok,
 but the sentence like: vma-vm_flags 
 (VM_LOCKED|VM_HUGETLB|VM_PFNMAP) appears in the patch 2/6,
 is it better to hide the bit combinations behind the
 is_vm_hugetlb_page() ?  In my patch I just replaced it with
 vma-vm_flags  (VM_LOCKED|VM_PFNMAP) ||  (vma-vm_flags 
 (VM_HUGETLB|VM_MERGEABLE)) == VM_HUGETLB.

If you're going to do non-obvious things with the flags, it should be done in 
one place rather than throughout the code.  Why would you do the above and 
not vma-vm_flags  (VM_LOCKED | VM_PFNMAP) || is_vm_hugetlb_page(vma)?

-Scott

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 6/6] powerpc/kvm: change the condition of identifying hugetlb vm

2015-07-02 Thread Scott Wood
On Wed, 2015-06-10 at 14:27 +0800, Wenwei Tao wrote:
 Hugetlb VMAs are not mergeable, that means a VMA couldn't have VM_HUGETLB 
 and
 VM_MERGEABLE been set in the same time. So we use VM_HUGETLB to indicate new
 mergeable VMAs. Because of that a VMA which has VM_HUGETLB been set is a 
 hugetlb
 VMA only if it doesn't have VM_MERGEABLE been set in the same time.

Eww.

If you must overload such bit combinations, please hide it behind a 
vm_is_hugetlb() function.

-Scott

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] KVM: PPC: fix suspicious use of conditional operator

2015-05-22 Thread Scott Wood
On Fri, 2015-05-22 at 17:46 +0300, Laurentiu Tudor wrote:
 This was signaled by a static code analysis tool.
 
 Signed-off-by: Laurentiu Tudor laurentiu.tu...@freescale.com
 ---
  arch/powerpc/kvm/e500_mmu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
 index 50860e9..29911a0 100644
 --- a/arch/powerpc/kvm/e500_mmu.c
 +++ b/arch/powerpc/kvm/e500_mmu.c
 @@ -377,7 +377,7 @@ int kvmppc_e500_emul_tlbsx(struct kvm_vcpu *vcpu, gva_t 
 ea)
   | MAS0_NV(vcpu_e500-gtlb_nv[tlbsel]);
   vcpu-arch.shared-mas1 =
 (vcpu-arch.shared-mas6  MAS6_SPID0)
 - | (vcpu-arch.shared-mas6  (MAS6_SAS ? MAS1_TS : 0))
 + | ((vcpu-arch.shared-mas6  MAS6_SAS) ? MAS1_TS : 0)
   | (vcpu-arch.shared-mas4  MAS4_TSIZED(~0));
   vcpu-arch.shared-mas2 = MAS2_EPN;
   vcpu-arch.shared-mas2 |= vcpu-arch.shared-mas4 

Reviewed-by: Scott Wood scottw...@freescale.com

Please send as non-RFC.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: check for lookup_linux_ptep() returning NULL

2015-05-21 Thread Scott Wood
On Thu, 2015-05-21 at 16:26 +0300, Laurentiu Tudor wrote:
 If passed a larger page size lookup_linux_ptep()
 may fail, so add a check for that and bail out
 if that's the case.
 This was found with the help of a static
 code analysis tool.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 Signed-off-by: Laurentiu Tudor laurentiu.tu...@freescale.com
 Cc: Scott Wood scottw...@freescale.com
 ---
 based on https://github.com/agraf/linux-2.6.git kvm-ppc-next
 
  arch/powerpc/kvm/e500_mmu_host.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Scott Wood scottw...@freescale.com

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] KVM: PPC: Book3S: correct width in XER handling

2015-05-20 Thread Scott Wood
On Wed, 2015-05-20 at 15:26 +1000, Sam Bobroff wrote:
 In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64
 bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is
 accessed as such.
 
 This patch corrects places where it is accessed as a 32 bit field by a
 64 bit kernel.  In some cases this is via a 32 bit load or store
 instruction which, depending on endianness, will cause either the
 lower or upper 32 bits to be missed.  In another case it is cast as a
 u32, causing the upper 32 bits to be cleared.
 
 This patch corrects those places by extending the access methods to
 64 bits.
 
 Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
 ---
 
  arch/powerpc/include/asm/kvm_book3s.h   |4 ++--
  arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +++---
  arch/powerpc/kvm/book3s_segment.S   |4 ++--
  3 files changed, 7 insertions(+), 7 deletions(-)

It's nominally a 64-bit register, but the upper 32 bits are reserved in
ISA 2.06.  Do newer ISAs or certain implementations define things in the
upper 32 bits, or is this just about the asm accesses being wrong on
big-endian?

Also, I see the same issue in the booke code (CCing Mihai).

-Scott

 diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
 b/arch/powerpc/include/asm/kvm_book3s.h
 index b91e74a..05a875a 100644
 --- a/arch/powerpc/include/asm/kvm_book3s.h
 +++ b/arch/powerpc/include/asm/kvm_book3s.h
 @@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
   return vcpu-arch.cr;
  }
  
 -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
 +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
  {
   vcpu-arch.xer = val;
  }
  
 -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
 +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu)
  {
   return vcpu-arch.xer;
  }
 diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
 b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
 index 4d70df2..d75be59 100644
 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
 +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
 @@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
   blt hdec_soon
  
   ld  r6, VCPU_CTR(r4)
 - lwz r7, VCPU_XER(r4)
 + ld  r7, VCPU_XER(r4)
  
   mtctr   r6
   mtxer   r7
 @@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
   mfctr   r3
   mfxer   r4
   std r3, VCPU_CTR(r9)
 - stw r4, VCPU_XER(r9)
 + std r4, VCPU_XER(r9)
  
   /* If this is a page table miss then see if it's theirs or ours */
   cmpwi   r12, BOOK3S_INTERRUPT_H_DATA_STORAGE
 @@ -1675,7 +1675,7 @@ kvmppc_hdsi:
   bl  kvmppc_msr_interrupt
  fast_interrupt_c_return:
  6:   ld  r7, VCPU_CTR(r9)
 - lwz r8, VCPU_XER(r9)
 + ld  r8, VCPU_XER(r9)
   mtctr   r7
   mtxer   r8
   mr  r4, r9
 diff --git a/arch/powerpc/kvm/book3s_segment.S 
 b/arch/powerpc/kvm/book3s_segment.S
 index acee37c..ca8f174 100644
 --- a/arch/powerpc/kvm/book3s_segment.S
 +++ b/arch/powerpc/kvm/book3s_segment.S
 @@ -123,7 +123,7 @@ no_dcbz32_on:
   PPC_LL  r8, SVCPU_CTR(r3)
   PPC_LL  r9, SVCPU_LR(r3)
   lwz r10, SVCPU_CR(r3)
 - lwz r11, SVCPU_XER(r3)
 + PPC_LL  r11, SVCPU_XER(r3)
  
   mtctr   r8
   mtlrr9
 @@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
   mfctr   r8
   mflrr9
  
 - stw r5, SVCPU_XER(r13)
 + PPC_STL r5, SVCPU_XER(r13)
   PPC_STL r6, SVCPU_FAULT_DAR(r13)
   stw r7, SVCPU_FAULT_DSISR(r13)
   PPC_STL r8, SVCPU_CTR(r13)


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: Convert openpic lock to raw_spinlock

2014-09-12 Thread Scott Wood
On Fri, 2014-09-12 at 09:12 -0500, Purcareata Bogdan-B43198 wrote:
  -Original Message-
  From: Wood Scott-B07421
  Sent: Thursday, September 11, 2014 9:19 PM
  To: Purcareata Bogdan-B43198
  Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org
  Subject: Re: [PATCH] KVM: PPC: Convert openpic lock to raw_spinlock
  
  On Thu, 2014-09-11 at 15:25 -0400, Bogdan Purcareata wrote:
   This patch enables running intensive I/O workloads, e.g. netperf, in a 
   guest
   deployed on a RT host. No change for !RT kernels.
  
   The openpic spinlock becomes a sleeping mutex on a RT system. This no 
   longer
   guarantees that EPR is atomic with exception delivery. The guest VCPU 
   thread
   fails due to a BUG_ON(preemptible()) when running netperf.
  
   In order to make the kvmppc_mpic_set_epr() call safe on RT from non-atomic
   context, convert the openpic lock to a raw_spinlock. A similar approach 
   can
   be seen for x86 platforms in the following commit [1].
  
   Here are some comparative cyclitest measurements run inside a high 
   priority
  RT
   guest run on a RT host. The guest has 1 VCPU and the test has been run for
  15
   minutes. The guest runs ~750 hackbench processes as background stress.
  
  Does hackbench involve triggering interrupts that would go through the
  MPIC?  You may want to try an I/O-heavy benchmark to stress the MPIC
  code (the more interrupt sources are active at once, the better).
 
 Before this patch, running netperf/iperf in the guest always resulted
 in hitting the afore-mentioned BUG_ON, when the host was RT. This is
 why I can't provide comparative cyclitest measurements before and after
 the patch, with heavy I/O stress. Since I had no problem running
 hackbench before, I'm assuming it doesn't involve interrupts passing
 through the MPIC. The measurements were posted just to show that the
 patch doesn't mess up anything somewhere else.

I know you can't provide before/after, but it would be nice to see what
the after numbers are with heavy MPIC activity.

  Also try a guest with many vcpus.
 
 AFAIK, without the MSI affinity patches [1], all vfio interrupts will
 go to core 0 in the guest. In this case, I guess there won't be
 contention induced latencies due to multiple VCPUs expecting to have
 their interrupts delivered. Am I getting it wrong?

It's not about contention, but about loops in the MPIC code that iterate
over the entire set of vcpus.

-Scott

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: Convert openpic lock to raw_spinlock

2014-09-11 Thread Scott Wood
On Thu, 2014-09-11 at 15:25 -0400, Bogdan Purcareata wrote:
 This patch enables running intensive I/O workloads, e.g. netperf, in a guest
 deployed on a RT host. No change for !RT kernels.
 
 The openpic spinlock becomes a sleeping mutex on a RT system. This no longer
 guarantees that EPR is atomic with exception delivery. The guest VCPU thread
 fails due to a BUG_ON(preemptible()) when running netperf.
 
 In order to make the kvmppc_mpic_set_epr() call safe on RT from non-atomic
 context, convert the openpic lock to a raw_spinlock. A similar approach can
 be seen for x86 platforms in the following commit [1].
 
 Here are some comparative cyclitest measurements run inside a high priority RT
 guest run on a RT host. The guest has 1 VCPU and the test has been run for 15
 minutes. The guest runs ~750 hackbench processes as background stress.

Does hackbench involve triggering interrupts that would go through the
MPIC?  You may want to try an I/O-heavy benchmark to stress the MPIC
code (the more interrupt sources are active at once, the better).
Also try a guest with many vcpus.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-08-27 Thread Scott Wood
On Wed, 2014-08-27 at 13:23 +0200, Alexander Graf wrote:
 
 On 13.08.14 11:09, Bharat Bhushan wrote:
  This patch emulates debug registers and debug exception
  to support guest using debug resource. This enables running
  gdb/kgdb etc in guest.
  
  On BOOKE architecture we cannot share debug resources between QEMU and
  guest because:
  When QEMU is using debug resources then debug exception must
  be always enabled. To achieve this we set MSR_DE and also set
  MSRP_DEP so guest cannot change MSR_DE.
  
  When emulating debug resource for guest we want guest
  to control MSR_DE (enable/disable debug interrupt on need).
  
  So above mentioned two configuration cannot be supported
  at the same time. So the result is that we cannot share
  debug resources between QEMU and Guest on BOOKE architecture.
  
  In the current design QEMU gets priority over guest, this means that if
  QEMU is using debug resources then guest cannot use them and if guest is
  using debug resource then QEMU can overwrite them.
  
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 
 Scott, could you please recheck whether you're ok with it now? :)

I'm OK with it.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] powerpc/booke: Restrict SPE exception handlers to e200/e500 cores

2014-08-20 Thread Scott Wood
On Wed, 2014-08-20 at 16:09 +0300, Mihai Caraman wrote:
 SPE exception handlers are now defined for 32-bit e500mc cores even though
 SPE unit is not present and CONFIG_SPE is undefined.
 
 Restrict SPE exception handlers to e200/e500 cores adding CONFIG_SPE_POSSIBLE
 and consequently guard __stup_ivors and __setup_cpu functions.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 Cc: Scott Wood scottw...@freescale.com
 Cc: Alexander Graf ag...@suse.de
 ---
 v2:
  - use CONFIG_PPC_E500MC without CONFIG_E500
  - use elif defined()
 
  arch/powerpc/kernel/cpu_setup_fsl_booke.S | 12 +++-
  arch/powerpc/kernel/cputable.c|  5 +
  arch/powerpc/kernel/head_fsl_booke.S  | 18 +-
  arch/powerpc/platforms/Kconfig.cputype|  6 +-
  4 files changed, 34 insertions(+), 7 deletions(-)

Acked-by: Scott Wood scottw...@freescale.com

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] powerpc/booke: Revert SPE/AltiVec common defines for interrupt numbers

2014-08-20 Thread Scott Wood
On Wed, 2014-08-20 at 16:09 +0300, Mihai Caraman wrote:
 Book3E specification defines shared interrupt numbers for SPE and AltiVec
 units. Still SPE is present in e200/e500v2 cores while AltiVec is present in
 e6500 core. So we can currently decide at compile-time which unit to support
 exclusively. As Alexander Graf suggested, this will improve code readability
 especially in KVM.
 
 Use distinct defines to identify SPE/AltiVec interrupt numbers, reverting
 c58ce397 and 6b310fc5 patches that added common defines.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 Cc: Scott Wood scottw...@freescale.com
 Cc: Alexander Graf ag...@suse.de
 ---
  arch/powerpc/kernel/exceptions-64e.S | 4 ++--
  arch/powerpc/kernel/head_fsl_booke.S | 8 
  2 files changed, 6 insertions(+), 6 deletions(-)

Acked-by: Scott Wood scottw...@freescale.com

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7 v3] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-08-12 Thread Scott Wood
On Tue, 2014-08-12 at 02:36 -0500, Bhushan Bharat-R65777 wrote:
 
  -Original Message-
  From: Wood Scott-B07421
  Sent: Tuesday, August 12, 2014 5:30 AM
  To: Bhushan Bharat-R65777
  Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder 
  Stuart-
  B08248
  Subject: Re: [PATCH 7/7 v3] KVM: PPC: BOOKE: Emulate debug registers and
  exception
  
  On Wed, 2014-08-06 at 12:08 +0530, Bharat Bhushan wrote:
   @@ -1249,6 +1284,7 @@ int kvmppc_subarch_vcpu_init(struct kvm_vcpu *vcpu)
 setup_timer(vcpu-arch.wdt_timer, kvmppc_watchdog_func,
 (unsigned long)vcpu);
  
   + kvmppc_clear_dbsr();
 return 0;
  
  This could use a comment for why we're doing this.  Also, I'm a bit uneasy 
  about
  clearing the whole DBSR here, where we haven't yet switched the debug 
  registers
  to guest context.
 
 I think we wanted MRR to not cause debug event to guest, So should we only 
 clear MRR ?
 
  It shouldn't actually matter except for deferred debug
  exceptions which are not actually useful (in fact e6500 removed support for
  them),
 
 Exactly, that's why I was clearing complete DBSR. Probably we can have a 
 comment
  Do not let previously set debug events visible to guest. As deferred debug 
 events
   are not supported, so it is ok to clear complete DBSR.
  

This would be affecting host debugging of the host, not guest debugging
of the guest.  Still I don't think it's a huge deal, but clearing only
MRR would be cleaner.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] powerpc/booke: Restrict SPE exception handlers to e200/e500 cores

2014-08-11 Thread Scott Wood
On Wed, 2014-08-06 at 11:39 +0300, Mihai Caraman wrote:
 SPE exception handlers are now defined for 32-bit e500mc cores even though
 SPE unit is not present and CONFIG_SPE is undefined.
 
 Restrict SPE exception handlers to e200/e500 cores adding CONFIG_SPE_POSSIBLE
 and consequently guard __stup_ivors and __setup_cpu functions.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 ---
  arch/powerpc/kernel/cpu_setup_fsl_booke.S | 12 +++-
  arch/powerpc/kernel/cputable.c|  5 +
  arch/powerpc/kernel/head_fsl_booke.S  | 18 +-
  arch/powerpc/platforms/Kconfig.cputype|  6 +-
  4 files changed, 34 insertions(+), 7 deletions(-)
 
 diff --git a/arch/powerpc/kernel/cpu_setup_fsl_booke.S 
 b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
 index 4f1393d..44bb2c9 100644
 --- a/arch/powerpc/kernel/cpu_setup_fsl_booke.S
 +++ b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
 @@ -91,6 +91,7 @@ _GLOBAL(setup_altivec_idle)
  
   blr
  
 +#if defined(CONFIG_E500)  defined(CONFIG_PPC_E500MC)

When would you ever have CONFIG_PPC_E500MC without CONFIG_E500?

 diff --git a/arch/powerpc/kernel/head_fsl_booke.S 
 b/arch/powerpc/kernel/head_fsl_booke.S
 index b497188..4f8930f 100644
 --- a/arch/powerpc/kernel/head_fsl_booke.S
 +++ b/arch/powerpc/kernel/head_fsl_booke.S
 @@ -613,6 +613,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
   mfspr   r10, SPRN_SPRG_RSCRATCH0
   b   InstructionStorage
  
 +/* Define SPE handlers for e200 and e500v2 */
  #ifdef CONFIG_SPE
   /* SPE Unavailable */
   START_EXCEPTION(SPEUnavailable)
 @@ -622,10 +623,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
   b   fast_exception_return
  1:   addir3,r1,STACK_FRAME_OVERHEAD
   EXC_XFER_EE_LITE(0x2010, KernelSPE)
 -#else
 +#elif CONFIG_SPE_POSSIBLE

#elif defined(CONFIG_SPE_POSSIBLE)

Likewise elsewhere

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: e500mc: Add support for single threaded vcpus on e6500 core

2014-08-11 Thread Scott Wood
On Wed, 2014-08-06 at 19:33 +0300, Mihai Caraman wrote:
 @@ -390,19 +400,30 @@ static void kvmppc_core_vcpu_free_e500mc(struct 
 kvm_vcpu *vcpu)
  
  static int kvmppc_core_init_vm_e500mc(struct kvm *kvm)
  {
 - int lpid;
 + int i, lpid;
  
 - lpid = kvmppc_alloc_lpid();
 - if (lpid  0)
 - return lpid;
 + /* The lpid pool supports only 2 entries now */
 + if (threads_per_core  2)
 + return -ENOMEM;
 +
 + /* Each VM allocates one LPID per HW thread index */
 + for (i = 0; i  threads_per_core; i++) {
 + lpid = kvmppc_alloc_lpid();
 + if (lpid  0)
 + return lpid;
 +
 + kvm-arch.lpid_pool[i] = lpid;
 + }

Wouldn't it be simpler to halve the size of the lpid pool that the
allocator sees, and just OR in the high bit based on the low bit of the
cpu number?

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: e500mc: Add support for single threaded vcpus on e6500 core

2014-08-11 Thread Scott Wood
On Tue, 2014-08-12 at 01:53 +0200, Alexander Graf wrote:
 
  Am 12.08.2014 um 01:36 schrieb Scott Wood scottw...@freescale.com:
  
  On Wed, 2014-08-06 at 19:33 +0300, Mihai Caraman wrote:
  @@ -390,19 +400,30 @@ static void kvmppc_core_vcpu_free_e500mc(struct 
  kvm_vcpu *vcpu)
  
  static int kvmppc_core_init_vm_e500mc(struct kvm *kvm)
  {
  -int lpid;
  +int i, lpid;
  
  -lpid = kvmppc_alloc_lpid();
  -if (lpid  0)
  -return lpid;
  +/* The lpid pool supports only 2 entries now */
  +if (threads_per_core  2)
  +return -ENOMEM;
  +
  +/* Each VM allocates one LPID per HW thread index */
  +for (i = 0; i  threads_per_core; i++) {
  +lpid = kvmppc_alloc_lpid();
  +if (lpid  0)
  +return lpid;
  +
  +kvm-arch.lpid_pool[i] = lpid;
  +}
  
  Wouldn't it be simpler to halve the size of the lpid pool that the
  allocator sees, and just OR in the high bit based on the low bit of the
  cpu number?
 
 Heh, I wrote the same and then removed the section from my reply again. It 
 wouldn't really make that much of a difference if you think it through 
 completely.
 
 But yes, it certainly would be quite a bit more natural. I'm ok either way.

It's not a huge difference, but it would at least get rid of some of the
ifdeffing in the headers.  It'd also be nicer when debugging to have the
LPIDs correlated.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7 v3] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-08-11 Thread Scott Wood
On Wed, 2014-08-06 at 12:08 +0530, Bharat Bhushan wrote:
 @@ -1249,6 +1284,7 @@ int kvmppc_subarch_vcpu_init(struct kvm_vcpu *vcpu)
   setup_timer(vcpu-arch.wdt_timer, kvmppc_watchdog_func,
   (unsigned long)vcpu);
  
 + kvmppc_clear_dbsr();
   return 0;

This could use a comment for why we're doing this.  Also, I'm a bit
uneasy about clearing the whole DBSR here, where we haven't yet switched
the debug registers to guest context.  It shouldn't actually matter
except for deferred debug exceptions which are not actually useful (in
fact e6500 removed support for them), but still...

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5 v2] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-08-05 Thread Scott Wood
On Mon, 2014-08-04 at 22:41 -0500, Bhushan Bharat-R65777 wrote:
 
  -Original Message-
  From: Wood Scott-B07421
  Sent: Tuesday, August 05, 2014 4:23 AM
  To: Bhushan Bharat-R65777
  Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder 
  Stuart-
  B08248
  Subject: Re: [PATCH 5/5 v2] KVM: PPC: BOOKE: Emulate debug registers and
  exception
  
  On Mon, 2014-08-04 at 13:32 +0530, Bharat Bhushan wrote:
   @@ -735,7 +745,27 @@ static int kvmppc_handle_debug(struct kvm_run *run,
  struct kvm_vcpu *vcpu)
 struct debug_reg *dbg_reg = (vcpu-arch.shadow_dbg_reg);
 u32 dbsr = vcpu-arch.dbsr;
  
   - /* Clear guest dbsr (vcpu-arch.dbsr).
   + if (vcpu-guest_debug == 0) {
   + /*
   +  * Debug resources belong to Guest.
   +  * Imprecise debug event are not injected
   +  */
   + if (dbsr  DBSR_IDE)
   + return RESUME_GUEST;
  
  This is incorrect.  DBSR_IDE shouldn't *cause* an injection, but it 
  shouldn't
  inhibit it either.
 
 Will this work ?
   If ((dbsr  DBSR_IDE)  !(dbsr  ~DBSR_IDE))
   Return RESUME_GUEST; 

I suppose it could, but it would be cleaner to just change dbsr to
(dbsr  ~DBSR_IDE) in the next if-statement (maybe factoring out each
 term of that if-statement to variables to make it more readable).

   @@ -828,6 +858,8 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu
  *vcpu,
 case BOOKE_INTERRUPT_DEBUG:
 /* Save DBSR before preemption is enabled */
 vcpu-arch.dbsr = mfspr(SPRN_DBSR);
   + /* MASK out DBSR_MRR */
   + vcpu-arch.dbsr = ~DBSR_MRR;
 kvmppc_clear_dbsr();
 break;
 }
  
  DBSR[MRR] can only be set once per host system reset.  There's no need to 
  filter
  it out here; just make sure the host clears it at some point before this 
  point.
 
 Can you please suggest where ? somewhere in KVM initialization ?

Sure, KVM init works given that there's no real reason for non-KVM code
to care.

  The MRR value doesn't currently survive past kvmppc_clear_dbsr(), so this 
  isn't
  helping to preserve it for the host's benefit...
  
   @@ -1858,6 +1890,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct
   kvm_vcpu *vcpu,
  
 if (!(dbg-control  KVM_GUESTDBG_ENABLE)) {
 vcpu-arch.shadow_dbg_reg.dbcr0 = 0;
   + vcpu-arch.dbg_reg.dbcr0 = 0;
  
  Again, it's not clear why we need shadow debug registers here.  Just in 
  case we
  implement something that can't be implemented isn't a good reason to keep
  complexity around.
 
 One reason was that setting EDM in guest visible register, For this we
 need shadow_reg is used to save/restore state in h/w register (which
 does not have DBCR0_EDM) but debug_reg have DBCR0_EDM.

If that's the only reason, then I'd get rid of the shadow and just OR in
DCBR0_EDM when reading the register, if vcpu-guest_debug is nonzero.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] KVM: PPC: BOOKE: Clear guest dbsr in userspace exit KVM_EXIT_DEBUG

2014-08-04 Thread Scott Wood
On Mon, 2014-08-04 at 13:22 +0530, Bharat Bhushan wrote:
 Dbsr is not visible to userspace and we do not think any need to
 expose this to userspace because:
   Userspace cannot inject debug interrupt to guest (as this
   does not know guest ability to handle debug interrupt), so
   userspace will always clear DBSR.
   Now if userspace has to always clear DBSR in KVM_EXIT_DEBUG
   handling then clearing dbsr in kernel looks simple as this
   avoid doing SET_SREGS/set_one_reg() to clear DBSR
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
  arch/powerpc/kvm/booke.c | 11 +++
  1 file changed, 11 insertions(+)
 
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index 322da7d..5c2e26a 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -735,6 +735,17 @@ static int kvmppc_handle_debug(struct kvm_run *run, 
 struct kvm_vcpu *vcpu)
   struct debug_reg *dbg_reg = (vcpu-arch.shadow_dbg_reg);
   u32 dbsr = vcpu-arch.dbsr;
  
 + /* Clear guest dbsr (vcpu-arch.dbsr).
 +  * dbsr is not visible to userspace and we do not think any
 +  * need to expose this to userspace because:
 +  * Userspace cannot inject debug interrupt to guest (as this does
 +  * not know guest ability to handle debug interrupt), so userspace
 +  * will always clear DBSR.
 +  * Now if userspace has to always clear DBSR in KVM_EXIT_DEBUG
 +  * handling then clearing here looks simple as this
 +  * avoid doing SET_SREGS/set_one_reg() to clear DBSR
 +  */
 + vcpu-arch.dbsr = 0;
   run-debug.arch.status = 0;
   run-debug.arch.address = vcpu-arch.pc;
  

I think the changelog is adequate -- I don't think we need to be so
verbose in the code itself.  The question was just whether this was a
userspace-visible change, and it isn't.

FWIW, I think dbsr should be visible to userspace in general (regardless
of whether it's cleared here), because all guest registers should be
visible to userspace.  I may be debugging a guest through means that
don't require owning debug resources, such as stopping and inspecting a
guest that has hung or crashed.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5 v2] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-08-04 Thread Scott Wood
On Mon, 2014-08-04 at 13:32 +0530, Bharat Bhushan wrote:
 @@ -735,7 +745,27 @@ static int kvmppc_handle_debug(struct kvm_run *run, 
 struct kvm_vcpu *vcpu)
   struct debug_reg *dbg_reg = (vcpu-arch.shadow_dbg_reg);
   u32 dbsr = vcpu-arch.dbsr;
  
 - /* Clear guest dbsr (vcpu-arch.dbsr).
 + if (vcpu-guest_debug == 0) {
 + /*
 +  * Debug resources belong to Guest.
 +  * Imprecise debug event are not injected
 +  */
 + if (dbsr  DBSR_IDE)
 + return RESUME_GUEST;

This is incorrect.  DBSR_IDE shouldn't *cause* an injection, but it
shouldn't inhibit it either.

 @@ -828,6 +858,8 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu 
 *vcpu,
   case BOOKE_INTERRUPT_DEBUG:
   /* Save DBSR before preemption is enabled */
   vcpu-arch.dbsr = mfspr(SPRN_DBSR);
 + /* MASK out DBSR_MRR */
 + vcpu-arch.dbsr = ~DBSR_MRR;
   kvmppc_clear_dbsr();
   break;
   }

DBSR[MRR] can only be set once per host system reset.  There's no need
to filter it out here; just make sure the host clears it at some point
before this point.  The MRR value doesn't currently survive past
kvmppc_clear_dbsr(), so this isn't helping to preserve it for the host's
benefit...

 @@ -1858,6 +1890,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu 
 *vcpu,
  
   if (!(dbg-control  KVM_GUESTDBG_ENABLE)) {
   vcpu-arch.shadow_dbg_reg.dbcr0 = 0;
 + vcpu-arch.dbg_reg.dbcr0 = 0;

Again, it's not clear why we need shadow debug registers here.  Just in
case we implement something that can't be implemented isn't a good
reason to keep complexity around.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] KVM: PPC: BOOKE: Clear guest dbsr in userspace exit KVM_EXIT_DEBUG

2014-08-04 Thread Scott Wood
On Mon, 2014-08-04 at 22:33 -0500, Bhushan Bharat-R65777 wrote:
 
  -Original Message-
  From: Wood Scott-B07421
  Sent: Tuesday, August 05, 2014 4:17 AM
  To: Bhushan Bharat-R65777
  Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder 
  Stuart-
  B08248
  Subject: Re: [PATCH 4/5] KVM: PPC: BOOKE: Clear guest dbsr in userspace exit
  KVM_EXIT_DEBUG
  
  FWIW, I think dbsr should be visible to userspace in general (regardless of
  whether it's cleared here), because all guest registers should be visible to
  userspace.  I may be debugging a guest through means that don't require 
  owning
  debug resources, such as stopping and inspecting a guest that has hung or
  crashed.
 
 Do you mean that we should also add a one-reg interface for DBSR ?

Yes.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-08-01 Thread Scott Wood
On Fri, 2014-08-01 at 04:34 -0500, Bhushan Bharat-R65777 wrote:
 on dbsr write emulation, deque the debug interrupt even if DBSR_IDE is set.
 
 case SPRN_DBSR:
 
 vcpu-arch.dbsr = ~spr_val;
 if (!(vcpu-arch.dbsr  ~DBSR_IDE))
 kvmppc_core_dequeue_debug(vcpu);
 break;
 
 or
 vcpu-arch.dbsr = ~(spr_val | DBSR_IDE);
 if (!vcpu-arch.dbsr)
 kvmppc_core_dequeue_debug(vcpu);
 break;

The first option.  I see no reason to have KVM forcibly clear DBSR[IDE].

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-07-31 Thread Scott Wood
On Thu, 2014-07-31 at 01:15 -0500, Bhushan Bharat-R65777 wrote:
 
  -Original Message-
  From: Wood Scott-B07421
  Sent: Thursday, July 31, 2014 8:18 AM
  To: Bhushan Bharat-R65777
  Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder 
  Stuart-
  B08248
  Subject: Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and 
  exception
  
  On Wed, 2014-07-30 at 01:43 -0500, Bhushan Bharat-R65777 wrote:
  
-Original Message-
From: Wood Scott-B07421
Sent: Tuesday, July 29, 2014 3:58 AM
To: Bhushan Bharat-R65777
Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
Yoder Stuart-
B08248
Subject: Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers
and exception
   
 Userspace might be interested in
the raw value,
  
   With the current design, If userspace is interested then it will not
   get the DBSR.
  
  Oh, because DBSR isn't currently implemented in sregs or one reg?
 
 That is one reason. Another is that if we give dbsr visibility to
 userspace then userspace have to clear dbsr in handling KVM_EXIT_DEBUG.

Right -- since I didn't realize DBSR wasn't already exposed, I thought
userspace already had this responsibility.

  It looked like it was removing dbsr visibility and the requirement for 
  userspace
  to clear dbsr.  I guess the old way was that the value in
  vcpu-arch.dbsr didn't matter until the next debug exception, when it
  would be overwritten by the new SPRN_DBSR?
 
 But that means old dbsr will be visibility to userspace, which is even bad 
 than not visible, no?
 
 Also this can lead to old dbsr visible to guest once userspace releases
 debug resources, but this can be solved by clearing dbsr in
 kvm_arch_vcpu_ioctl_set_guest_debug() -  if (!(dbg-control 
 KVM_GUESTDBG_ENABLE)) { }.

I wasn't suggesting that you keep it that way, just clarifying my
understanding of the current code.


 
  
 + case SPRN_DBCR2:
 + /*
 +  * If userspace is debugging guest then guest
 +  * can not access debug registers.
 +  */
 + if (vcpu-guest_debug)
 + break;
 +
 + debug_inst = true;
 + vcpu-arch.dbg_reg.dbcr2 = spr_val;
 + vcpu-arch.shadow_dbg_reg.dbcr2 = spr_val;
   break;
   
In what circumstances can the architected and shadow registers differ?
  
   As of now they are same. But I think that if we want to implement other
  features like Freeze Timer (FT) then they can be different.
  
  I don't think we can possibly implement Freeze Timer.
 
 May be, but in my opinion we should keep this open.

We're not talking about API here -- the implementation should be kept
simple if there's no imminent need for shadow registers.

   I am not sure what we should in that case ?
  
   As we are currently emulating a subset of debug events (IAC, DAC, IC,
   BT and TIE --- DBCR0 emulation) then we should expose status of those
   events in guest dbsr and rest should be cleared ?
  
  I'm not saying they need to be exposed to the guest, but I don't see where 
  you
  filter out bits like these.
 
 I am trying to get what all bits should be filtered out, all bits
 except IACx, DACx, IC, BT and TIE (same as event set filtering done
 when setting DBCR0) ? 
 
 i.e IDE, UDE, MRR, IRPT, RET, CIRPT, CRET should be filtered out?

Bits like IRPT and RET don't really matter, as you shouldn't see them
happen.  Likewise MRR if you're sure you've cleared it since boot.  But
IDE could be set any time an asynchronous exception happens.  I don't
think you should filter it out, but instead make sure that it doesn't
cause an exception to be delivered.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register

2014-07-30 Thread Scott Wood
On Wed, 2014-07-30 at 00:21 -0500, Bhushan Bharat-R65777 wrote:
 
  -Original Message-
  From: Wood Scott-B07421
  Sent: Tuesday, July 29, 2014 3:22 AM
  To: Bhushan Bharat-R65777
  Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder 
  Stuart-
  B08248
  Subject: Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest
  visible register
  
  On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote:
   This is not used and  even I do not remember why this was added in
   first place.
  
   Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
   ---
arch/powerpc/kvm/booke.c | 2 --
1 file changed, 2 deletions(-)
  
   diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index
   ab62109..a5ee42c 100644
   --- a/arch/powerpc/kvm/booke.c
   +++ b/arch/powerpc/kvm/booke.c
   @@ -1804,8 +1804,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct 
   kvm_vcpu
  *vcpu,
 kvm_guest_protect_msr(vcpu, MSR_DE, true);
 vcpu-guest_debug = dbg-control;
 vcpu-arch.shadow_dbg_reg.dbcr0 = 0;
   - /* Set DBCR0_EDM in guest visible DBCR0 register. */
   - vcpu-arch.dbg_reg.dbcr0 = DBCR0_EDM;
  
 if (vcpu-guest_debug  KVM_GUESTDBG_SINGLESTEP)
 vcpu-arch.shadow_dbg_reg.dbcr0 |= DBCR0_IDM | DBCR0_IC;
  
  This was intended to let the guest know that the host owns the debug 
  resources,
  by analogy to what a JTAG debugger would do.
  
  The Power ISA has this Virtualized Implementation Note:
  
  It is the responsibility of the hypervisor to ensure that
  DBCR0[EDM] is consistent with usage of DEP.
 
 Ok, That means that if MSRP_DEP is set then set DBCR0_EDM  and if MSRP_DEP is 
 clear then clear DBCR0_EDM, right?
 We need to implement above mentioned this.

We should probably clear EDM only when guest debug emulation is working
and enabled (i.e. not until at least patch 6/6).

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register

2014-07-30 Thread Scott Wood
On Wed, 2014-07-30 at 12:57 -0500, Bhushan Bharat-R65777 wrote:
 
  -Original Message-
  From: Wood Scott-B07421
  Sent: Wednesday, July 30, 2014 11:18 PM
  To: Bhushan Bharat-R65777
  Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder 
  Stuart-
  B08248
  Subject: Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest
  visible register
  
  On Wed, 2014-07-30 at 00:21 -0500, Bhushan Bharat-R65777 wrote:
  
-Original Message-
From: Wood Scott-B07421
Sent: Tuesday, July 29, 2014 3:22 AM
To: Bhushan Bharat-R65777
Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
Yoder Stuart-
B08248
Subject: Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM
in guest visible register
   
On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote:
 This is not used and  even I do not remember why this was added in
 first place.

 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
  arch/powerpc/kvm/booke.c | 2 --
  1 file changed, 2 deletions(-)

 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index ab62109..a5ee42c 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -1804,8 +1804,6 @@ int
 kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu
*vcpu,
   kvm_guest_protect_msr(vcpu, MSR_DE, true);
   vcpu-guest_debug = dbg-control;
   vcpu-arch.shadow_dbg_reg.dbcr0 = 0;
 - /* Set DBCR0_EDM in guest visible DBCR0 register. */
 - vcpu-arch.dbg_reg.dbcr0 = DBCR0_EDM;

   if (vcpu-guest_debug  KVM_GUESTDBG_SINGLESTEP)
   vcpu-arch.shadow_dbg_reg.dbcr0 |= DBCR0_IDM | DBCR0_IC;
   
This was intended to let the guest know that the host owns the debug
resources, by analogy to what a JTAG debugger would do.
   
The Power ISA has this Virtualized Implementation Note:
   
It is the responsibility of the hypervisor to ensure that
DBCR0[EDM] is consistent with usage of DEP.
  
   Ok, That means that if MSRP_DEP is set then set DBCR0_EDM  and if 
   MSRP_DEP is
  clear then clear DBCR0_EDM, right?
   We need to implement above mentioned this.
  
  We should probably clear EDM only when guest debug emulation is working and
  enabled (i.e. not until at least patch 6/6).
 
 But if EDM is set then guest debug emulation will not start/allowed.

I don't mean after the guest tries to write to the registers -- I mean
after the code has been added to KVM to allow it to work.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-07-30 Thread Scott Wood
On Wed, 2014-07-30 at 01:43 -0500, Bhushan Bharat-R65777 wrote:
 
  -Original Message-
  From: Wood Scott-B07421
  Sent: Tuesday, July 29, 2014 3:58 AM
  To: Bhushan Bharat-R65777
  Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder 
  Stuart-
  B08248
  Subject: Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and 
  exception
  
   Userspace might be interested in
  the raw value,
 
 With the current design, If userspace is interested then it will not
 get the DBSR.

Oh, because DBSR isn't currently implemented in sregs or one reg?

  But why userspace will be interested?

Do you expose all of the hardware's debugging features in your
high-level interface?

  plus it's a change from the current API semantics.
 
 Can you please let us know how ?

It looked like it was removing dbsr visibility and the requirement for
userspace to clear dbsr.  I guess the old way was that the value in
vcpu-arch.dbsr didn't matter until the next debug exception, when it
would be overwritten by the new SPRN_DBSR?

   + case SPRN_DBCR2:
   + /*
   +  * If userspace is debugging guest then guest
   +  * can not access debug registers.
   +  */
   + if (vcpu-guest_debug)
   + break;
   +
   + debug_inst = true;
   + vcpu-arch.dbg_reg.dbcr2 = spr_val;
   + vcpu-arch.shadow_dbg_reg.dbcr2 = spr_val;
 break;
  
  In what circumstances can the architected and shadow registers differ?
 
 As of now they are same. But I think that if we want to implement other 
 features like Freeze Timer (FT) then they can be different.

I don't think we can possibly implement Freeze Timer.
 
 case SPRN_DBSR:
   + /*
   +  * If userspace is debugging guest then guest
   +  * can not access debug registers.
   +  */
   + if (vcpu-guest_debug)
   + break;
   +
 vcpu-arch.dbsr = ~spr_val;
   + if (vcpu-arch.dbsr == 0)
   + kvmppc_core_dequeue_debug(vcpu);
 break;
  
  Not all DBSR bits cause an exception, e.g. IDE and MRR.
 
 I am not sure what we should in that case ?

 As we are currently emulating a subset of debug events (IAC, DAC, IC,
 BT and TIE --- DBCR0 emulation) then we should expose status of those
 events in guest dbsr and rest should be cleared ?

I'm not saying they need to be exposed to the guest, but I don't see
where you filter out bits like these.

   @@ -273,6 +397,10 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu 
   *vcpu, int
  sprn, ulong spr_val)
 emulated = EMULATE_FAIL;
 }
  
   + if (debug_inst) {
   + switch_booke_debug_regs(vcpu-arch.shadow_dbg_reg);
   + current-thread.debug = vcpu-arch.shadow_dbg_reg;
   + }
  
  Could you explain what's going on with regard to copying the registers
  into current-thread.debug?  Why is it done after loading the registers
  into the hardware (is there a race if we get preempted in the middle)?
 
 Yes, and this was something I was not clear when writing this code.
 Should we have preempt disable-enable around this.

Can they be reordered instead?

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-07-29 Thread Scott Wood
On Tue, 2014-07-29 at 16:06 +0200, Alexander Graf wrote:
 On 29.07.14 00:33, Scott Wood wrote:
  On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote:
  On 11.07.14 10:39, Bharat Bhushan wrote:
  This patch emulates debug registers and debug exception
  to support guest using debug resource. This enables running
  gdb/kgdb etc in guest.
 
  On BOOKE architecture we cannot share debug resources between QEMU and
  guest because:
When QEMU is using debug resources then debug exception must
be always enabled. To achieve this we set MSR_DE and also set
MSRP_DEP so guest cannot change MSR_DE.
 
When emulating debug resource for guest we want guest
to control MSR_DE (enable/disable debug interrupt on need).
 
So above mentioned two configuration cannot be supported
at the same time. So the result is that we cannot share
debug resources between QEMU and Guest on BOOKE architecture.
 
  In the current design QEMU gets priority over guest, this means that if
  QEMU is using debug resources then guest cannot use them and if guest is
  using debug resource then QEMU can overwrite them.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  Hi Alex,
 
  I thought of having some print in register emulation if QEMU
  is using debug resource, Also when QEMU overwrites guest written
  values but that looks excessive. If I uses some variable which
  get set when guest starts using debug registers and check in
  debug set ioctl then that look ugly. Looking for suggestions
  Whatever you do, have QEMU do the print, not the kernel.
  How would that be accomplished?  How would the kernel know to exit to
  QEMU, and how would the exit reason be conveyed?
 
 QEMU is the one forcefully enabling debug and overwriting guest debug 
 registers, so it also knows when it did overwrite valid ones.

QEMU knows when it overwrites the guest values, but it doesn't know if,
after enabling host debug, the guest tries to write to the debug
registers and it gets nopped.  If we keep the EDM setting, then we can
at least say the situation is no worse than with a JTAG.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register

2014-07-28 Thread Scott Wood
On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote:
 This is not used and  even I do not remember why this was added
 in first place.
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
  arch/powerpc/kvm/booke.c | 2 --
  1 file changed, 2 deletions(-)
 
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index ab62109..a5ee42c 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -1804,8 +1804,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu 
 *vcpu,
   kvm_guest_protect_msr(vcpu, MSR_DE, true);
   vcpu-guest_debug = dbg-control;
   vcpu-arch.shadow_dbg_reg.dbcr0 = 0;
 - /* Set DBCR0_EDM in guest visible DBCR0 register. */
 - vcpu-arch.dbg_reg.dbcr0 = DBCR0_EDM;
  
   if (vcpu-guest_debug  KVM_GUESTDBG_SINGLESTEP)
   vcpu-arch.shadow_dbg_reg.dbcr0 |= DBCR0_IDM | DBCR0_IC;

This was intended to let the guest know that the host owns the debug
resources, by analogy to what a JTAG debugger would do.

The Power ISA has this Virtualized Implementation Note:

It is the responsibility of the hypervisor to ensure that
DBCR0[EDM] is consistent with usage of DEP.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/6] KVM: PPC: BOOKE: Force MSR_DE in rfci if guest is under debug

2014-07-28 Thread Scott Wood
On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote:
 When userspace (QEMU) is using the debug resource to debug guest
 then we want MSR_DE to be always set. This patch adds missing
 MSR_DE setting in rfci instruction.
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
  arch/powerpc/kvm/booke_emulate.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)
 
 diff --git a/arch/powerpc/kvm/booke_emulate.c 
 b/arch/powerpc/kvm/booke_emulate.c
 index 27a4b28..80c51a2 100644
 --- a/arch/powerpc/kvm/booke_emulate.c
 +++ b/arch/powerpc/kvm/booke_emulate.c
 @@ -40,7 +40,11 @@ static void kvmppc_emul_rfi(struct kvm_vcpu *vcpu)
  static void kvmppc_emul_rfci(struct kvm_vcpu *vcpu)
  {
   vcpu-arch.pc = vcpu-arch.csrr0;
 - kvmppc_set_msr(vcpu, vcpu-arch.csrr1);
 + /* Force MSR_DE when guest does not own debug facilities */
 + if (vcpu-guest_debug)
 + kvmppc_set_msr(vcpu, vcpu-arch.csrr1 | MSR_DE);
 + else
 + kvmppc_set_msr(vcpu, vcpu-arch.csrr1);
  }
  
  int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,

It looks like this is already handled by kvmppc_vcpu_sync_debug(), which
is called by kvmppc_set_msr().

Plus, it should only be done for HV mode.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] KVM: PPC: BOOKE: Allow guest to change MSR_DE

2014-07-28 Thread Scott Wood
On Fri, 2014-07-11 at 14:09 +0530, Bharat Bhushan wrote:
 When userspace is debugging guest then MSR_DE is always set and
 MSRP_DEP is set so that guest cannot change MSR_DE.
 Guest debug resources are not yet emulated, So there seems no reason
 we should stop guest controlling MSR_DE.
 Also a followup patch will enable debug emulation and that requires
 guest to control MSR_DE.

Why does it matter whether we emulate debug resources?  We still don't
want the guest to be able to clear MSR[DE] and thus break host debug.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-07-28 Thread Scott Wood
On Fri, 2014-07-11 at 14:09 +0530, Bharat Bhushan wrote:
 This patch emulates debug registers and debug exception
 to support guest using debug resource. This enables running
 gdb/kgdb etc in guest.
 
 On BOOKE architecture we cannot share debug resources between QEMU and
 guest because:
 When QEMU is using debug resources then debug exception must
 be always enabled. To achieve this we set MSR_DE and also set
 MSRP_DEP so guest cannot change MSR_DE.

 When emulating debug resource for guest we want guest
 to control MSR_DE (enable/disable debug interrupt on need).
 
 So above mentioned two configuration cannot be supported
 at the same time. So the result is that we cannot share
 debug resources between QEMU and Guest on BOOKE architecture.
 
 In the current design QEMU gets priority over guest, this means that if
 QEMU is using debug resources then guest cannot use them and if guest is
 using debug resource then QEMU can overwrite them.
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
 Hi Alex,
 
 I thought of having some print in register emulation if QEMU
 is using debug resource, Also when QEMU overwrites guest written
 values but that looks excessive. If I uses some variable which
 get set when guest starts using debug registers and check in
 debug set ioctl then that look ugly. Looking for suggestions
 
  arch/powerpc/include/asm/kvm_ppc.h |   3 +
  arch/powerpc/kvm/booke.c   |  27 +++
  arch/powerpc/kvm/booke_emulate.c   | 157 
 +
  3 files changed, 187 insertions(+)
 
 diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
 b/arch/powerpc/include/asm/kvm_ppc.h
 index e2fd5a1..f3f7611 100644
 --- a/arch/powerpc/include/asm/kvm_ppc.h
 +++ b/arch/powerpc/include/asm/kvm_ppc.h
 @@ -173,6 +173,9 @@ extern int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, 
 u32 *server,
  extern int kvmppc_xics_int_on(struct kvm *kvm, u32 irq);
  extern int kvmppc_xics_int_off(struct kvm *kvm, u32 irq);
  
 +void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu);
 +void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu);
 +
  union kvmppc_one_reg {
   u32 wval;
   u64 dval;
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index fadfe76..c2471ed 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -264,6 +264,16 @@ static void kvmppc_core_dequeue_watchdog(struct kvm_vcpu 
 *vcpu)
   clear_bit(BOOKE_IRQPRIO_WATCHDOG, vcpu-arch.pending_exceptions);
  }
  
 +void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu)
 +{
 + kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DEBUG);
 +}
 +
 +void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu)
 +{
 + clear_bit(BOOKE_IRQPRIO_DEBUG, vcpu-arch.pending_exceptions);
 +}
 +
  static void set_guest_srr(struct kvm_vcpu *vcpu, unsigned long srr0, u32 
 srr1)
  {
  #ifdef CONFIG_KVM_BOOKE_HV
 @@ -783,6 +793,23 @@ static int kvmppc_handle_debug(struct kvm_run *run, 
 struct kvm_vcpu *vcpu)
   struct debug_reg *dbg_reg = (vcpu-arch.shadow_dbg_reg);
   u32 dbsr = vcpu-arch.dbsr;
  
 + if (vcpu-guest_debug == 0) {
 + /* Debug resources belong to Guest */
 + if (dbsr  (vcpu-arch.shared-msr  MSR_DE))
 + kvmppc_core_queue_debug(vcpu);

Should also check for DBCR0[IDM].

 +
 + /* Inject a program interrupt if trap debug is not allowed */
 + if ((dbsr  DBSR_TIE)  !(vcpu-arch.shared-msr  MSR_DE))
 + kvmppc_core_queue_program(vcpu, ESR_PTR);
 +
 + return RESUME_GUEST;
 + }
 +
 + /*
 +  * Prepare for userspace exit as debug resources
 +  * are owned by userspace.
 +  */
 + vcpu-arch.dbsr = 0;
   run-debug.arch.status = 0;
   run-debug.arch.address = vcpu-arch.pc;

Why are you clearing vcpu-arch.dbsr?  Userspace might be interested in
the raw value, plus it's a change from the current API semantics.

Where is this run-debug.arch.foo stuff and KVM_EXIT_DEBUG documented?


 diff --git a/arch/powerpc/kvm/booke_emulate.c 
 b/arch/powerpc/kvm/booke_emulate.c
 index 2a20194..3d143fe 100644
 --- a/arch/powerpc/kvm/booke_emulate.c
 +++ b/arch/powerpc/kvm/booke_emulate.c
 @@ -139,6 +139,7 @@ int kvmppc_booke_emulate_op(struct kvm_run *run, struct 
 kvm_vcpu *vcpu,
  int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong 
 spr_val)
  {
   int emulated = EMULATE_DONE;
 + bool debug_inst = false;
  
   switch (sprn) {
   case SPRN_DEAR:
 @@ -153,14 +154,137 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, 
 int sprn, ulong spr_val)
   case SPRN_CSRR1:
   vcpu-arch.csrr1 = spr_val;
   break;
 + case SPRN_DSRR0:
 + vcpu-arch.dsrr0 = spr_val;
 + break;
 + case SPRN_DSRR1:
 + vcpu-arch.dsrr1 = spr_val;
 + break;
 + case SPRN_IAC1:
 + /*
 +  * If userspace is debugging guest then 

Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-07-28 Thread Scott Wood
On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote:
 On 11.07.14 10:39, Bharat Bhushan wrote:
  This patch emulates debug registers and debug exception
  to support guest using debug resource. This enables running
  gdb/kgdb etc in guest.
 
  On BOOKE architecture we cannot share debug resources between QEMU and
  guest because:
   When QEMU is using debug resources then debug exception must
   be always enabled. To achieve this we set MSR_DE and also set
   MSRP_DEP so guest cannot change MSR_DE.
 
   When emulating debug resource for guest we want guest
   to control MSR_DE (enable/disable debug interrupt on need).
 
   So above mentioned two configuration cannot be supported
   at the same time. So the result is that we cannot share
   debug resources between QEMU and Guest on BOOKE architecture.
 
  In the current design QEMU gets priority over guest, this means that if
  QEMU is using debug resources then guest cannot use them and if guest is
  using debug resource then QEMU can overwrite them.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  Hi Alex,
 
  I thought of having some print in register emulation if QEMU
  is using debug resource, Also when QEMU overwrites guest written
  values but that looks excessive. If I uses some variable which
  get set when guest starts using debug registers and check in
  debug set ioctl then that look ugly. Looking for suggestions
 
 Whatever you do, have QEMU do the print, not the kernel.

How would that be accomplished?  How would the kernel know to exit to
QEMU, and how would the exit reason be conveyed?

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers

2014-07-28 Thread Scott Wood
On Mon, 2014-07-28 at 03:54 -0500, Caraman Mihai Claudiu-B02008 wrote:
  -Original Message-
  From: Wood Scott-B07421
  Sent: Saturday, July 26, 2014 3:11 AM
  To: Caraman Mihai Claudiu-B02008
  Cc: Alexander Graf; kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
  linuxppc-...@lists.ozlabs.org
  Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for
  SPE/FP/AltiVec int numbers
  
  On Thu, 2014-07-24 at 04:16 -0500, Caraman Mihai Claudiu-B02008 wrote:
   Scott, Alex's request to define SPE handlers only for e500v2 implies
  changes
   in 32-bit FSL kernel to have exclusive configurations for e200/e500v2
  and
   e500mc/e5500. We would probably need something like this, what's your
  take on it?
  
  That is already a compile-time decision.
 
 Yes, but is not fully implemented.

We might be missing some kconfig language to prohibit e500v2 boards from
being enabled in an e500mc kernel, but if you try to do so it won't work
(except for obsolete hacks like running QEMU's mpc8544ds platform with
-cpu e500mc).

   diff --git a/arch/powerpc/kernel/head_fsl_booke.S
  b/arch/powerpc/kernel/head_fsl_booke.S
   index b497188..9d41015 100644
   --- a/arch/powerpc/kernel/head_fsl_booke.S
   +++ b/arch/powerpc/kernel/head_fsl_booke.S
   @@ -613,6 +613,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
   mfspr   r10, SPRN_SPRG_RSCRATCH0
   b   InstructionStorage
  
   +/* Define SPE handlers only for e500v2 and e200 */
   +#ifndef CONFIG_PPC_E500MC
#ifdef CONFIG_SPE
   /* SPE Unavailable */
   START_EXCEPTION(SPEUnavailable)
   @@ -626,7 +628,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
   EXCEPTION(0x2020, SPE_ALTIVEC_UNAVAIL, SPEUnavailable, \
 unknown_exception, EXC_XFER_EE)
#endif /* CONFIG_SPE */
   +#endif
  
  This is redundant:
  
  config SPE
  bool SPE Support
  depends on E200 || (E500  !PPC_E500MC)
  default y
 
 I see what you mean, but it's not redundant.

OK, I didn't realize there was an #else that wasn't included in the
context.  It would have been nice if the comment at the end were
!CONFIG_SPE rather than CONFIG_SPE.

 Alex was looking to remove SPE
 handlers for e500mc+ and the proposal handled !SPE case. In the new
 light I find this variant more readable:
 
 +/* Define SPE handlers for e200 and e500v2 */
  #ifdef CONFIG_SPE
 /* SPE Unavailable */
 START_EXCEPTION(SPEUnavailable)
 @@ -622,11 +623,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
 b   fast_exception_return
  1: addir3,r1,STACK_FRAME_OVERHEAD
 EXC_XFER_EE_LITE(0x2010, KernelSPE)
 -#else
 +#elif defined(CONFIG_E200) || \
 +  (defined(CONFIG_E500)  !defined(CONFIG_PPC_E500MC))
 EXCEPTION(0x2020, SPE_ALTIVEC_UNAVAIL, SPEUnavailable, \
   unknown_exception, EXC_XFER_EE)
  #endif /* CONFIG_SPE */

Yes, or better define a CONFIG_SPE_POSSIBLE so that the list only has to
exist in one place, and the intent is clearer.
 
   diff --git a/arch/powerpc/kernel/cputable.c
  b/arch/powerpc/kernel/cputable.c
   index c1faade..3ab65c2 100644
   --- a/arch/powerpc/kernel/cputable.c
   +++ b/arch/powerpc/kernel/cputable.c
   @@ -2030,6 +2030,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
#endif /* CONFIG_PPC32 */
#ifdef CONFIG_E500
#ifdef CONFIG_PPC32
   +#ifndef CONFIG_PPC_E500MC
   {   /* e500 */
   .pvr_mask   = 0x,
   .pvr_value  = 0x8020,
   @@ -2069,6 +2070,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
   .machine_check  = machine_check_e500,
   .platform   = ppc8548,
   },
   +#endif /* !CONFIG_PPC_E500MC */
   {   /* e500mc */
   .pvr_mask   = 0x,
   .pvr_value  = 0x8023,
  
  
  This looks a bit strange -- e500v2 gets excluded if PPC_E500MC, but
  e500mc gets included in !PPC_E500MC?
 
 Right, my main purpose was to get rid of __setup_e500_ivors on PPC_E500MC
 which refers SPEUnavailable. I will add an #else to exclude e500mc.
 
 The generic E500 PPC default cpu advertises PPC_FEATURE_HAS_SPE_COMP.
 Do we want to excluded it if PPC_E500MC? Is this cpu useful without 
 cpu_setup()
 and if so why don't we also have a default for 64-bit?

I don't think that default will do anything useful.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers

2014-07-25 Thread Scott Wood
On Thu, 2014-07-24 at 04:16 -0500, Caraman Mihai Claudiu-B02008 wrote:
 Scott, Alex's request to define SPE handlers only for e500v2 implies changes
 in 32-bit FSL kernel to have exclusive configurations for e200/e500v2 and
 e500mc/e5500. We would probably need something like this, what's your take on 
 it?

That is already a compile-time decision.

 diff --git a/arch/powerpc/kernel/head_fsl_booke.S 
 b/arch/powerpc/kernel/head_fsl_booke.S
 index b497188..9d41015 100644
 --- a/arch/powerpc/kernel/head_fsl_booke.S
 +++ b/arch/powerpc/kernel/head_fsl_booke.S
 @@ -613,6 +613,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
 mfspr   r10, SPRN_SPRG_RSCRATCH0
 b   InstructionStorage
  
 +/* Define SPE handlers only for e500v2 and e200 */
 +#ifndef CONFIG_PPC_E500MC
  #ifdef CONFIG_SPE
 /* SPE Unavailable */
 START_EXCEPTION(SPEUnavailable)
 @@ -626,7 +628,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
 EXCEPTION(0x2020, SPE_ALTIVEC_UNAVAIL, SPEUnavailable, \
   unknown_exception, EXC_XFER_EE)
  #endif /* CONFIG_SPE */
 +#endif

This is redundant:

config SPE
bool SPE Support
depends on E200 || (E500  !PPC_E500MC)
default y 

 diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
 index c1faade..3ab65c2 100644
 --- a/arch/powerpc/kernel/cputable.c
 +++ b/arch/powerpc/kernel/cputable.c
 @@ -2030,6 +2030,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
  #endif /* CONFIG_PPC32 */
  #ifdef CONFIG_E500
  #ifdef CONFIG_PPC32
 +#ifndef CONFIG_PPC_E500MC
 {   /* e500 */
 .pvr_mask   = 0x,
 .pvr_value  = 0x8020,
 @@ -2069,6 +2070,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 .machine_check  = machine_check_e500,
 .platform   = ppc8548,
 },
 +#endif /* !CONFIG_PPC_E500MC */
 {   /* e500mc */
 .pvr_mask   = 0x,
 .pvr_value  = 0x8023,
 

This looks a bit strange -- e500v2 gets excluded if PPC_E500MC, but
e500mc gets included in !PPC_E500MC?

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] powerpc/e6500: Work around erratum A-008139

2014-07-25 Thread Scott Wood
Erratum A-008139 can cause duplicate TLB entries if an indirect
entry is overwritten using tlbwe while the other thread is using it to
do a lookup.  Work around this by using tlbilx to invalidate prior
to overwriting.

To avoid the need to save another register to hold MAS1 during the
workaround code, TID clearing has been moved from tlb_miss_kernel_e6500
until after the SMT section.

Signed-off-by: Scott Wood scottw...@freescale.com
---
 arch/powerpc/mm/tlb_low_64e.S | 68 +++
 1 file changed, 56 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
index 57c4d66..89bf95b 100644
--- a/arch/powerpc/mm/tlb_low_64e.S
+++ b/arch/powerpc/mm/tlb_low_64e.S
@@ -299,7 +299,9 @@ itlb_miss_fault_bolted:
  * r10 = crap (free to use)
  */
 tlb_miss_common_e6500:
-BEGIN_FTR_SECTION
+   crmove  cr2*4+2,cr0*4+2 /* cr2.eq != 0 if kernel address */
+
+BEGIN_FTR_SECTION  /* CPU_FTR_SMT */
/*
 * Search if we already have an indirect entry for that virtual
 * address, and if we do, bail out.
@@ -324,17 +326,62 @@ BEGIN_FTR_SECTION
b   1b
.previous
 
+   /*
+* Erratum A-008139 says that we can't use tlbwe to change
+* an indirect entry in any way (including replacing or
+* invalidating) if the other thread could be in the process
+* of a lookup.  The workaround is to invalidate the entry
+* with tlbilx before overwriting.
+*/
+
+   lbz r15,TCD_ESEL_NEXT(r11)
+   rlwinm  r10,r15,16,0xff
+   orisr10,r10,MAS0_TLBSEL(1)@h
+   mtspr   SPRN_MAS0,r10
+   isync
+   tlbre
mfspr   r15,SPRN_MAS1
-   mfspr   r10,SPRN_MAS2
+   andis.  r15,r15,MAS1_VALID@h
+   beq 5f
+
+BEGIN_FTR_SECTION_NESTED(532)
+   mfspr   r10,SPRN_MAS8
+   rlwinm  r10,r10,0,0x8fff  /* tgs,tlpid - sgs,slpid */
+   mtspr   SPRN_MAS5,r10
+END_FTR_SECTION_NESTED(CPU_FTR_EMB_HV,CPU_FTR_EMB_HV,532)
 
-   tlbsx   0,r16
-   mtspr   SPRN_MAS2,r10
mfspr   r10,SPRN_MAS1
-   mtspr   SPRN_MAS1,r15
+   rlwinm  r15,r10,0,0x3fff  /* tid - spid */
+   rlwimi  r15,r10,20,0x0003 /* ind,ts - sind,sas */
+   mfspr   r10,SPRN_MAS6
+   mtspr   SPRN_MAS6,r15
+
+   mfspr   r15,SPRN_MAS2
+   isync
+   tlbilxva 0,r15
+   isync
+
+   mtspr   SPRN_MAS6,r10
 
-   andis.  r10,r10,MAS1_VALID@h
+5:
+BEGIN_FTR_SECTION_NESTED(532)
+   li  r10,0
+   mtspr   SPRN_MAS8,r10
+   mtspr   SPRN_MAS5,r10
+END_FTR_SECTION_NESTED(CPU_FTR_EMB_HV,CPU_FTR_EMB_HV,532)
+
+   tlbsx   0,r16
+   mfspr   r10,SPRN_MAS1
+   andis.  r15,r10,MAS1_VALID@h
bne tlb_miss_done_e6500
-END_FTR_SECTION_IFSET(CPU_FTR_SMT)
+FTR_SECTION_ELSE
+   mfspr   r10,SPRN_MAS1
+ALT_FTR_SECTION_END_IFSET(CPU_FTR_SMT)
+
+   orisr10,r10,MAS1_VALID@h
+   beq cr2,4f
+   rlwinm  r10,r10,0,16,1  /* Clear TID */
+4: mtspr   SPRN_MAS1,r10
 
/* Now, we need to walk the page tables. First check if we are in
 * range.
@@ -410,12 +457,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_SMT)
rfi
 
 tlb_miss_kernel_e6500:
-   mfspr   r10,SPRN_MAS1
ld  r14,PACA_KERNELPGD(r13)
-   cmpldi  cr0,r15,8   /* Check for vmalloc region */
-   rlwinm  r10,r10,0,16,1  /* Clear TID */
-   mtspr   SPRN_MAS1,r10
-   beq+tlb_miss_common_e6500
+   cmpldi  cr1,r15,8   /* Check for vmalloc region */
+   beq+cr1,tlb_miss_common_e6500
 
 tlb_miss_fault_e6500:
tlb_unlock_e6500
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: ppc: booke: Restore SPRG3 when entering guest

2014-07-17 Thread Scott Wood
On Fri, 2014-07-18 at 02:33 +0200, Alexander Graf wrote:
 On 18.07.14 02:28, Scott Wood wrote:
  On Thu, 2014-07-17 at 18:29 +0200, Alexander Graf wrote:
  On 17.07.14 18:27, Alexander Graf wrote:
  On 17.07.14 18:24, bharat.bhus...@freescale.com wrote:
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Thursday, July 17, 2014 9:41 PM
  To: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org
  Cc: k...@vger.kernel.org; Wood Scott-B07421; Yoder Stuart-B08248
  Subject: Re: [PATCH] kvm: ppc: booke: Restore SPRG3 when entering guest
 
 
  On 16.07.14 08:02, Bharat Bhushan wrote:
  SPRG3 is guest accessible and SPRG3 can be clobbered by host or
  another guest, So this need to be restored when loading guest state.
  SPRG3 is not guest writeable.  We should be doing this so that guest
  reads of SPRG3 through the alternative read-only SPR work, not because
  SPRG3 can be clobbered by host or another guest.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  arch/powerpc/kvm/booke_interrupts.S | 2 ++
  1 file changed, 2 insertions(+)
 
  diff --git a/arch/powerpc/kvm/booke_interrupts.S
  b/arch/powerpc/kvm/booke_interrupts.S
  index 2c6deb5ef..0d3403f 100644
  --- a/arch/powerpc/kvm/booke_interrupts.S
  +++ b/arch/powerpc/kvm/booke_interrupts.S
  @@ -459,6 +459,8 @@ lightweight_exit:
   * written directly to the shared area, so we
   * need to reload them here with the guest's values.
   */
  +PPC_LD(r3, VCPU_SHARED_SPRG3, r5)
  +mtsprSPRN_SPRG3, r3
  We also need to restore it when resuming the host, no?
  I do not think host expect some meaningful value when returning from
  guest, same true for SPRG4-7.
  So there seems no reason to save host values and restore them.
  Linux no longer uses SPRG4-7 for itself.  That is not true of SPRG3, as
  Alex points out.
 
  Hmm - arch/powerpc/include/asm/reg.h says:
 
* All 32-bit:
*  - SPRG3 current thread_info pointer
*(virtual on BookE, physical on others)
 
  but I can indeed find no trace of usage anywhere. This at least needs
  to go into the patch description.
  Bah - it obviously is used. It's SPRN_SPRG_THREAD. And it's so
  incredibly important that I have no idea how we could possibly run
  without switching the host value back in very early. And even then our
  interrupt handlers wouldn't work anymore.
 
  This is more complicated :).
  To make this work we need to avoid SPRG3 as well, or at least avoid
  using it for something needed prior to DO_KVM.
 
  We also need to update the documentation in reg.h to reflect the fact
  that we don't use SPRG4-7 anymore on e500.
 
 I would personally prefer if we claim SPRG3R as unsupported on e500v2 
 until we find someone who actually uses it. There's a good chance we'd 
 start jumping through a lot of hoops and reduce overall performance for 
 no real-world gain today.

The same problem applies to e500mc.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: ppc: booke: Restore SPRG3 when entering guest

2014-07-17 Thread Scott Wood
On Fri, 2014-07-18 at 02:37 +0200, Alexander Graf wrote:
 On 18.07.14 02:36, Scott Wood wrote:
  On Fri, 2014-07-18 at 02:33 +0200, Alexander Graf wrote:
  On 18.07.14 02:28, Scott Wood wrote:
  On Thu, 2014-07-17 at 18:29 +0200, Alexander Graf wrote:
  On 17.07.14 18:27, Alexander Graf wrote:
  On 17.07.14 18:24, bharat.bhus...@freescale.com wrote:
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Thursday, July 17, 2014 9:41 PM
  To: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org
  Cc: k...@vger.kernel.org; Wood Scott-B07421; Yoder Stuart-B08248
  Subject: Re: [PATCH] kvm: ppc: booke: Restore SPRG3 when entering 
  guest
 
 
  On 16.07.14 08:02, Bharat Bhushan wrote:
  SPRG3 is guest accessible and SPRG3 can be clobbered by host or
  another guest, So this need to be restored when loading guest state.
  SPRG3 is not guest writeable.  We should be doing this so that guest
  reads of SPRG3 through the alternative read-only SPR work, not because
  SPRG3 can be clobbered by host or another guest.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
   arch/powerpc/kvm/booke_interrupts.S | 2 ++
   1 file changed, 2 insertions(+)
 
  diff --git a/arch/powerpc/kvm/booke_interrupts.S
  b/arch/powerpc/kvm/booke_interrupts.S
  index 2c6deb5ef..0d3403f 100644
  --- a/arch/powerpc/kvm/booke_interrupts.S
  +++ b/arch/powerpc/kvm/booke_interrupts.S
  @@ -459,6 +459,8 @@ lightweight_exit:
* written directly to the shared area, so we
* need to reload them here with the guest's values.
*/
  +PPC_LD(r3, VCPU_SHARED_SPRG3, r5)
  +mtsprSPRN_SPRG3, r3
  We also need to restore it when resuming the host, no?
  I do not think host expect some meaningful value when returning from
  guest, same true for SPRG4-7.
  So there seems no reason to save host values and restore them.
  Linux no longer uses SPRG4-7 for itself.  That is not true of SPRG3, as
  Alex points out.
 
  Hmm - arch/powerpc/include/asm/reg.h says:
 
 * All 32-bit:
 *  - SPRG3 current thread_info pointer
 *(virtual on BookE, physical on others)
 
  but I can indeed find no trace of usage anywhere. This at least needs
  to go into the patch description.
  Bah - it obviously is used. It's SPRN_SPRG_THREAD. And it's so
  incredibly important that I have no idea how we could possibly run
  without switching the host value back in very early. And even then our
  interrupt handlers wouldn't work anymore.
 
  This is more complicated :).
  To make this work we need to avoid SPRG3 as well, or at least avoid
  using it for something needed prior to DO_KVM.
 
  We also need to update the documentation in reg.h to reflect the fact
  that we don't use SPRG4-7 anymore on e500.
  I would personally prefer if we claim SPRG3R as unsupported on e500v2
  until we find someone who actually uses it. There's a good chance we'd
  start jumping through a lot of hoops and reduce overall performance for
  no real-world gain today.
  The same problem applies to e500mc.
 
 There we have SPRN_GSPRG3, no?

Oh, right.

Since it's only a problem for PR-mode, it can be fixed without needing
to avoid SPRG3 entirely, since PR-mode doesn't use DO_KVM.  We'd only
need to avoid using SPRG_THREAD in __KVM_HANDLER (i.e. revert commit
ffe129ecd79779221fdb03305049ec8b5a8beb0f).

And if we decide it's not worthwhile and don't revert that commit, we
should at least remove the comment that Under KVM, the host SPRG1 is
used to point to the current VCPU data structure...

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6 v2] kvm: ppc: bookehv: Added wrapper macros for shadow registers

2014-07-17 Thread Scott Wood
On Thu, 2014-07-17 at 17:01 +0530, Bharat Bhushan wrote:
 There are shadow registers like, GSPRG[0-3], GSRR0, GSRR1 etc on
 BOOKE-HV and these shadow registers are guest accessible.
 So these shadow registers needs to be updated on BOOKE-HV.
 This patch adds new macro for get/set helper of shadow register .
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
 v1-v2
  - Fix compilation for book3s (separate macro etc)
 
  arch/powerpc/include/asm/kvm_ppc.h | 44 
 +++---
  1 file changed, 36 insertions(+), 8 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
 b/arch/powerpc/include/asm/kvm_ppc.h
 index f3f7611..7646994 100644
 --- a/arch/powerpc/include/asm/kvm_ppc.h
 +++ b/arch/powerpc/include/asm/kvm_ppc.h
 @@ -475,8 +475,20 @@ static inline bool kvmppc_shared_big_endian(struct 
 kvm_vcpu *vcpu)
  #endif
  }
  
 +#define SPRNG_WRAPPER_GET(reg, e500hv_spr)   \
 +static inline ulong kvmppc_get_##reg(struct kvm_vcpu *vcpu)  \
 +{\
 + return mfspr(e500hv_spr);   \
 +}\
 +
 +#define SPRNG_WRAPPER_SET(reg, e500hv_spr)   \
 +static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, ulong val)
 \
 +{\
 + mtspr(e500hv_spr, val); \
 +}\

Why e500hv rather than bookehv?

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 2/4] KVM: PPC: Book3E: Handle LRAT error exception

2014-07-07 Thread Scott Wood
On Fri, 2014-07-04 at 10:15 +0200, Alexander Graf wrote:
 On 03.07.14 16:45, Mihai Caraman wrote:
  diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
  index a192975..ab1077f 100644
  --- a/arch/powerpc/kvm/booke.c
  +++ b/arch/powerpc/kvm/booke.c
  @@ -1286,6 +1286,46 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
  kvm_vcpu *vcpu,
  break;
  }

  +#ifdef CONFIG_KVM_BOOKE_HV
  +   case BOOKE_INTERRUPT_LRAT_ERROR:
  +   {
  +   gfn_t gfn;
  +
  +   /*
  +* Guest TLB management instructions (EPCR.DGTMI == 0) is not
  +* supported for now
  +*/
  +   if (!(vcpu-arch.fault_esr  ESR_PT)) {
  +   WARN(1, %s: Guest TLB management instructions not 
  supported!\n, __func__);
 
 Wouldn't this allow a guest to flood the host's kernel log?

It shouldn't be possible for this to happen, since the host will never
set EPCR[DGTMI] -- but yes, it should be WARN_ONCE or ratelimited.

  +{
  +   int this, next;
  +
  +   this = local_paca-tcd.lrat_next;
  +   next = (this + 1) % local_paca-tcd.lrat_max;
 
 Can we assume that lrat_max is always a power of 2? IIRC modulo 
 functions with variables can be quite expensive. So if we can instead do
 
next = (this + 1)  local_paca-tcd.lrat_mask;
 
 we should be faster and not rely on division helpers.

Architecturally we can't assume that, though it's true on the only
existing implementation.

Why not do something similar to what is done for tlb1:

unsigned int sesel = vcpu_e500-host_tlb1_nv++;

if (unlikely(vcpu_e500-host_tlb1_nv = tlb1_max_shadow_size()))
vcpu_e500-host_tlb1_nv = 0;

...and while we're at it, use local_paca-tcd for tlb1 as well (except
on 32-bit).

Also, please use get_paca() rather than local_paca so that the
preemption-disabled check is retained.

  +void write_host_lrate(int tsize, gfn_t gfn, unsigned long pfn, uint32_t 
  lpid,
  + int valid, int lrat_entry)
  +{
  +   struct kvm_book3e_206_tlb_entry stlbe;
  +   int esel = lrat_entry;
  +   unsigned long flags;
  +
  +   stlbe.mas1 = (valid ? MAS1_VALID : 0) | MAS1_TSIZE(tsize);
  +   stlbe.mas2 = ((u64)gfn  PAGE_SHIFT);
  +   stlbe.mas7_3 = ((u64)pfn  PAGE_SHIFT);
  +   stlbe.mas8 = MAS8_TGS | lpid;
  +
  +   local_irq_save(flags);
  +   /* book3e_tlb_lock(); */
 
 Hm?

Indeed.

  +
  +   if (esel == -1)
  +   esel = lrat_next();
  +   __write_host_tlbe(stlbe, MAS0_ATSEL | MAS0_ESEL(esel));

Where do you call this function with lrat_entry != -1?  Why rename it to
esel at function entry?

  +   down_read(current-mm-mmap_sem);
  +   vma = find_vma(current-mm, hva);
  +   if (vma  (hva = vma-vm_start)) {
  +   psize = vma_kernel_pagesize(vma);
  +   } else {
  +   pr_err_ratelimited(%s: couldn't find virtual memory address 
  for gfn %lx!\n, __func__, (long)gfn);

While output strings should not be linewrapped, the arguments that come
after the long string should be.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/4] KVM: PPC: e500: TLB emulation for IND entries

2014-07-07 Thread Scott Wood
On Thu, 2014-07-03 at 17:45 +0300, Mihai Caraman wrote:
 Handle indirect entries (IND) in TLB emulation code. Translation size of IND
 entries differ from the size of referred Page Tables (Linux guests now use IND
 of 2MB for 4KB PTs) and this require careful tweak of the existing logic.
 
 TLB search emulation requires additional search in HW TLB0 (since these 
 entries
 are directly added by HTW) and found entries shoud be presented to the guest 
 with
 RPN changed from PFN to GFN. There might be more GFNs pointing to the same 
 PFN so
 the only way to get the corresponding GFN is to search it in guest's PTE. If 
 IND
 entry for the corresponding PT is not available just invalidate guest's ea and
 report a tlbsx miss. This patch only implements the invalidation and let a 
 TODO
 note for searching HW TLB0.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 ---
  arch/powerpc/include/asm/mmu-book3e.h |  2 +
  arch/powerpc/kvm/e500.h   | 81 
 ---
  arch/powerpc/kvm/e500_mmu.c   | 78 +++--
  arch/powerpc/kvm/e500_mmu_host.c  | 31 --
  arch/powerpc/kvm/e500mc.c | 53 +--
  5 files changed, 211 insertions(+), 34 deletions(-)

Please look at erratum A-008139.  A patch to work around it for the main
Linux tablewalk code is forthcoming.  You need to make sure to never
overwrite an indirect entry with tlbwe -- always use tlbilx.

 @@ -286,6 +336,22 @@ void kvmppc_e500_tlbil_one(struct kvmppc_vcpu_e500 
 *vcpu_e500,
  void kvmppc_e500_tlbil_all(struct kvmppc_vcpu_e500 *vcpu_e500);
  
  #ifdef CONFIG_KVM_BOOKE_HV
 +void inval_tlb_on_host(struct kvm_vcpu *vcpu, int type, int pid);
 +
 +void inval_ea_on_host(struct kvm_vcpu *vcpu, gva_t ea, int pid, int sas,
 +   int sind);
 +#else
 +/* TLB is fully virtualized */
 +static inline void inval_tlb_on_host(struct kvm_vcpu *vcpu,
 +  int type, int pid)
 +{}
 +
 +static inline void inval_ea_on_host(struct kvm_vcpu *vcpu, gva_t ea, int pid,
 + int sas, int sind)
 +{}
 +#endif

Document exactly what these do, and explain why it's conceptually a
separate API from kvmppc_e500_tlbil_all/one(), and why TLB is fully
virtualized explains why we never need to do this on non-HV.

 @@ -290,21 +298,32 @@ static void tlbilx_all(struct kvmppc_vcpu_e500 
 *vcpu_e500, int tlbsel,
   kvmppc_e500_gtlbe_invalidate(vcpu_e500, tlbsel, esel);
   }
   }
 +
 + /* Invalidate enties added by HTW */
 + if (has_feature(vcpu_e500-vcpu, VCPU_FTR_MMU_V2)  (!sind))

Unnecessary parens.

 @@ -368,6 +388,23 @@ int kvmppc_e500_emul_tlbsx(struct kvm_vcpu *vcpu, gva_t 
 ea)
   } else {
   int victim;
  
 + if (has_feature(vcpu, VCPU_FTR_MMU_V2) 
 + get_cur_sind(vcpu) != 1) {
 + /*

Never align the continuation line of an if or loop with the indentation
of the if/loop body.

get_cur_sind(vcpu) == 0 would be more natural, and probably slightly
faster.

 +  * TLB0 entries are not cached in KVM being written
 +  * directly by HTW. TLB0 entry found in HW TLB0 needs
 +  * to be presented to the guest with RPN changed from
 +  * PFN to GFN. There might be more GFNs pointing to the
 +  * same PFN so the only way to get the corresponding GFN
 +  * is to search it in guest's PTE. If IND entry for the
 +  * corresponding PT is not available just invalidate
 +  * guest's ea and report a tlbsx miss.
 +  *
 +  * TODO: search ea in HW TLB0
 +  */
 + inval_ea_on_host(vcpu, ea, pid, as, 0);

What if the guest is in a loop where it does an access followed by a
tlbsx, looping back if the tlbsx reports no entry (e.g. an exception
handler that wants to emulate retrying the faulting instruction on a
tlbsx miss)?  The architecture allows hypervisors to invalidate at
arbitrary times, but we should avoid doing this in ways that can block
forward progress.

How likely is it really that we have multiple GFNs per PFN?  How bad is
it really if we return an arbitrary matching GFN in such a case?

 @@ -516,12 +527,17 @@ static inline int kvmppc_e500_shadow_map(struct 
 kvmppc_vcpu_e500 *vcpu_e500,
  
   tsize = (gtlbe-mas1  MAS1_TSIZE_MASK) 
   MAS1_TSIZE_SHIFT;
 + if (ind)
 + tsize -= BOOK3E_PAGESZ_4K + 7;
  
   /*
* e500 doesn't implement the lowest tsize bit,
* or 1K pages.
*/
 - tsize = max(BOOK3E_PAGESZ_4K, tsize  ~1);
 + if 

Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers

2014-07-03 Thread Scott Wood
On Thu, 2014-07-03 at 10:25 -0500, Caraman Mihai Claudiu-B02008 wrote:
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Thursday, July 03, 2014 3:21 PM
  To: Caraman Mihai Claudiu-B02008; kvm-ppc@vger.kernel.org
  Cc: k...@vger.kernel.org; linuxppc-...@lists.ozlabs.org
  Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for
  SPE/FP/AltiVec int numbers
  
  
  On 30.06.14 17:34, Mihai Caraman wrote:
   Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec
   which share the same interrupt numbers.
  
   Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
   ---
   v2:
 - remove outdated definitions
  
 arch/powerpc/include/asm/kvm_asm.h|  8 
 arch/powerpc/kvm/booke.c  | 17 +
 arch/powerpc/kvm/booke.h  |  4 ++--
 arch/powerpc/kvm/booke_interrupts.S   |  9 +
 arch/powerpc/kvm/bookehv_interrupts.S |  4 ++--
 arch/powerpc/kvm/e500.c   | 10 ++
 arch/powerpc/kvm/e500_emulate.c   | 10 ++
 7 files changed, 30 insertions(+), 32 deletions(-)
  
   diff --git a/arch/powerpc/include/asm/kvm_asm.h
  b/arch/powerpc/include/asm/kvm_asm.h
   index 9601741..c94fd33 100644
   --- a/arch/powerpc/include/asm/kvm_asm.h
   +++ b/arch/powerpc/include/asm/kvm_asm.h
   @@ -56,14 +56,6 @@
 /* E500 */
 #define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
 #define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
   -/*
   - * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same
  defines
   - */
   -#define BOOKE_INTERRUPT_SPE_UNAVAIL
  BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
   -#define BOOKE_INTERRUPT_SPE_FP_DATA
  BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
   -#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL
  BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
   -#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
   - BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
  
  I think I'd prefer to keep them separate.
 
 What is the reason from changing your mind from ver 1? Do you want to have
 different defines with same values (we specifically mapped them to the
 hardware interrupt numbers). We already upstreamed the necessary changes
 in the kernel. Scott, please share your opinion here.

I don't like hiding the fact that they're the same number, which could
lead to wrong code in the absence of ifdefs that strictly mutually
exclude SPE and Altivec code -- there was an instance of this with
MSR_VEC versus MSR_SPE in a previous patchset.
 
 #define BOOKE_INTERRUPT_SPE_FP_ROUND 34
 #define BOOKE_INTERRUPT_PERFORMANCE_MONITOR 35
 #define BOOKE_INTERRUPT_DOORBELL 36
   diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
   index ab62109..3c86d9b 100644
   --- a/arch/powerpc/kvm/booke.c
   +++ b/arch/powerpc/kvm/booke.c
   @@ -388,8 +388,8 @@ static int kvmppc_booke_irqprio_deliver(struct
  kvm_vcpu *vcpu,
 case BOOKE_IRQPRIO_ITLB_MISS:
 case BOOKE_IRQPRIO_SYSCALL:
 case BOOKE_IRQPRIO_FP_UNAVAIL:
   - case BOOKE_IRQPRIO_SPE_UNAVAIL:
   - case BOOKE_IRQPRIO_SPE_FP_DATA:
   + case BOOKE_IRQPRIO_SPE_ALTIVEC_UNAVAIL:
   + case BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST:
  
  #ifdef CONFIG_KVM_E500V2
 case ...SPE:
  #else
 case ..ALTIVEC:
  #endif

I don't think that's an improvement.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers

2014-07-03 Thread Scott Wood
On Thu, 2014-07-03 at 17:15 -0500, Scott Wood wrote:
 On Thu, 2014-07-03 at 10:25 -0500, Caraman Mihai Claudiu-B02008 wrote:
   -Original Message-
   From: Alexander Graf [mailto:ag...@suse.de]
   Sent: Thursday, July 03, 2014 3:21 PM
   To: Caraman Mihai Claudiu-B02008; kvm-ppc@vger.kernel.org
   Cc: k...@vger.kernel.org; linuxppc-...@lists.ozlabs.org
   Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for
   SPE/FP/AltiVec int numbers
   
   
   On 30.06.14 17:34, Mihai Caraman wrote:
Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec
which share the same interrupt numbers.
   
Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
v2:
  - remove outdated definitions
   
  arch/powerpc/include/asm/kvm_asm.h|  8 
  arch/powerpc/kvm/booke.c  | 17 +
  arch/powerpc/kvm/booke.h  |  4 ++--
  arch/powerpc/kvm/booke_interrupts.S   |  9 +
  arch/powerpc/kvm/bookehv_interrupts.S |  4 ++--
  arch/powerpc/kvm/e500.c   | 10 ++
  arch/powerpc/kvm/e500_emulate.c   | 10 ++
  7 files changed, 30 insertions(+), 32 deletions(-)
   
diff --git a/arch/powerpc/include/asm/kvm_asm.h
   b/arch/powerpc/include/asm/kvm_asm.h
index 9601741..c94fd33 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -56,14 +56,6 @@
  /* E500 */
  #define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
  #define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
-/*
- * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same
   defines
- */
-#define BOOKE_INTERRUPT_SPE_UNAVAIL
   BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
-#define BOOKE_INTERRUPT_SPE_FP_DATA
   BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
-#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL
   BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
-#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
-   
BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
   
   I think I'd prefer to keep them separate.
  
  What is the reason from changing your mind from ver 1? Do you want to have
  different defines with same values (we specifically mapped them to the
  hardware interrupt numbers). We already upstreamed the necessary changes
  in the kernel. Scott, please share your opinion here.
 
 I don't like hiding the fact that they're the same number, which could
 lead to wrong code in the absence of ifdefs that strictly mutually
 exclude SPE and Altivec code -- there was an instance of this with
 MSR_VEC versus MSR_SPE in a previous patchset. 

That said, if you want to enforce that mutual exclusion in a way that is
clear, I won't object too loudly -- but the code does look pretty
similar between the two (as well as between the two IVORs).

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers

2014-07-03 Thread Scott Wood
On Fri, 2014-07-04 at 00:35 +0200, Alexander Graf wrote:
 On 04.07.14 00:31, Scott Wood wrote:
  On Thu, 2014-07-03 at 17:15 -0500, Scott Wood wrote:
  On Thu, 2014-07-03 at 10:25 -0500, Caraman Mihai Claudiu-B02008 wrote:
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Thursday, July 03, 2014 3:21 PM
  To: Caraman Mihai Claudiu-B02008; kvm-ppc@vger.kernel.org
  Cc: k...@vger.kernel.org; linuxppc-...@lists.ozlabs.org
  Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for
  SPE/FP/AltiVec int numbers
 
 
  On 30.06.14 17:34, Mihai Caraman wrote:
  Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec
  which share the same interrupt numbers.
 
  Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
  ---
  v2:
 - remove outdated definitions
 
 arch/powerpc/include/asm/kvm_asm.h|  8 
 arch/powerpc/kvm/booke.c  | 17 +
 arch/powerpc/kvm/booke.h  |  4 ++--
 arch/powerpc/kvm/booke_interrupts.S   |  9 +
 arch/powerpc/kvm/bookehv_interrupts.S |  4 ++--
 arch/powerpc/kvm/e500.c   | 10 ++
 arch/powerpc/kvm/e500_emulate.c   | 10 ++
 7 files changed, 30 insertions(+), 32 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/kvm_asm.h
  b/arch/powerpc/include/asm/kvm_asm.h
  index 9601741..c94fd33 100644
  --- a/arch/powerpc/include/asm/kvm_asm.h
  +++ b/arch/powerpc/include/asm/kvm_asm.h
  @@ -56,14 +56,6 @@
 /* E500 */
 #define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
 #define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
  -/*
  - * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same
  defines
  - */
  -#define BOOKE_INTERRUPT_SPE_UNAVAIL
  BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
  -#define BOOKE_INTERRUPT_SPE_FP_DATA
  BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
  -#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL
  BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
  -#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
  -   
  BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
  I think I'd prefer to keep them separate.
  What is the reason from changing your mind from ver 1? Do you want to have
  different defines with same values (we specifically mapped them to the
  hardware interrupt numbers). We already upstreamed the necessary changes
  in the kernel. Scott, please share your opinion here.
  I don't like hiding the fact that they're the same number, which could
  lead to wrong code in the absence of ifdefs that strictly mutually
  exclude SPE and Altivec code -- there was an instance of this with
  MSR_VEC versus MSR_SPE in a previous patchset.
  That said, if you want to enforce that mutual exclusion in a way that is
  clear, I won't object too loudly -- but the code does look pretty
  similar between the two (as well as between the two IVORs).
 
 Yes, I want to make sure we have 2 separate code paths for SPE and 
 Altivec. No code sharing at all unless it's very generically possible.
 
 Also, which code does look pretty similar? The fact that we deflect 
 interrupts back into the guest? That's mostly boilerplate.

There's also the injection of a program check (or exiting to userspace)
when CONFIG_SPE/ALTIVEC is missing.  Not a big deal, but maybe it could
be factored into a helper function.  I like minimizing boilerplate.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6 v2] KVM: PPC: Book3E: Add AltiVec support

2014-07-03 Thread Scott Wood
On Mon, 2014-06-30 at 18:34 +0300, Mihai Caraman wrote:
 Add KVM Book3E AltiVec support. KVM Book3E FPU support gracefully reuse host
 infrastructure so follow the same approach for AltiVec.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 ---
 v2:
  - integrate Paul's FP/VMX/VSX changes
 
  arch/powerpc/kvm/booke.c | 67 
 ++--
  1 file changed, 65 insertions(+), 2 deletions(-)

I had to apply the whole patchset to get proper context for reviewing
this, and found some nits:

 case BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST:
 if (kvmppc_supports_spe() || kvmppc_supports_altivec()) {
 kvmppc_booke_queue_irqprio(vcpu,
 BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST);
 r = RESUME_GUEST;
 } else { 
 /*
  * These really should never happen without 
 CONFIG_SPE,
  * as we should never enable the real MSR[SPE] in the
  * guest.
  */

Besides the comment not being updated for Altivec, it's not true on HV,
where the guest can enable MSR[VEC] all by itself.  For HV, the reason
we shouldn't be able to get here is that we disable KVM on e6500 if
CONFIG_ALTIVEC is not enabled, and no other HV core supports either SPE
or Altivec.

 pr_crit(%s: unexpected SPE interrupt %u at %08lx\n,
 __func__, exit_nr, vcpu-arch.pc);

Error string will say SPE regardless of what sort of chip you're on.
Given that this is explicitly on the no support for Altivec or SPE
path, SPE/Altivec phrasing seems appropriate.  Of course we have
bigger problems than that if we ever reach this code. :-)

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM : powerpc/booke: Allow debug interrupt injection to guest

2014-07-01 Thread Scott Wood
On Tue, 2014-07-01 at 08:23 +0200, Alexander Graf wrote:
 
  Am 30.06.2014 um 22:25 schrieb Scott Wood scottw...@freescale.com:
  
  On Sun, 2014-06-29 at 23:38 -0500, Bhushan Bharat-R65777 wrote:
  
  -Original Message-
  From: Wood Scott-B07421
  Sent: Friday, June 27, 2014 11:53 PM
  To: Bhushan Bharat-R65777
  Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org
  Subject: Re: [PATCH 2/2] KVM : powerpc/booke: Allow debug interrupt 
  injection to
  guest
  
  On Fri, 2014-06-27 at 11:55 +0530, Bharat Bhushan wrote:
  -/* Force enable debug interrupts when user space wants to debug */
  -if (vcpu-guest_debug) {
  +/*
  + * Force enable debug interrupts when user space wants to debug
  + * and there is no debug interrupt pending for guest to handle.
  + */
  +if (vcpu-guest_debug  !kvmppc_core_pending_debug(vcpu)) {
  
  Are you trying to allow the guest to be simultaneously debugged by itself 
  and by
  host userspace?  How does this work?
  
  Not actually, Currently we are not partitioning debug resources between
  host userspace and guest. In fact we do not emulate debug registers for
  guest. But we want host userspace to pass the interrupt to guest if it
  is not able to handle.
  
  I don't understand the logic here.  A debug interrupt should be injected
  when the programming model in the guest says that a debug interrupt
  should happen.  How can that occur currently?  If the guest didn't set
  up the debug registers and QEMU still can't handle the debug interrupt,
  that's a bug in QEMU (or KVM, or the hardware...).  Injecting the
  interrupt into the guest just adds another bug on top of that.
 
 I don't think QEMU should be aware of these limitations.

OK, but we should at least have some idea of how the whole thing is
supposed to work, in order to determine if this is the correct behavior
for QEMU.  I thought the model was that debug resources are either owned
by QEMU or by the guest, and in the latter case, QEMU would never see
the debug exception to begin with.

  one reg?
  
  We are using SREGS but if required we can use one_reg.
  
  I thought we were preferring one reg over sregs for new functionality.
 
 I'm personally torn on this one. The problem here is that the sregs
 fields and values are already reserved. For anything we don't have an
 API for yet, yes, one_reg only. IIUC we have the API here, but were
 lacking the implementation.

From a QEMU perspective, are we going to want to update this at the same
time as everything else, or would jumping through sregs hoops be a
nuisance?  Is the long term goal to have one reg support for everything?

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM : powerpc/booke: Allow debug interrupt injection to guest

2014-07-01 Thread Scott Wood
On Tue, 2014-07-01 at 17:04 +0200, Alexander Graf wrote:
 On 01.07.14 16:58, Scott Wood wrote:
  On Tue, 2014-07-01 at 08:23 +0200, Alexander Graf wrote:
  I don't think QEMU should be aware of these limitations.
  OK, but we should at least have some idea of how the whole thing is
  supposed to work, in order to determine if this is the correct behavior
  for QEMU.  I thought the model was that debug resources are either owned
  by QEMU or by the guest, and in the latter case, QEMU would never see
  the debug exception to begin with.
 
 That's bad for a number of reasons. For starters it's different from how 
 x86 handles debug registers - and I hate to be different just for the 
 sake of being different.

How does it work on x86?

 So if we do want to declare that debug registers are owned by either
 QEMU or the guest, we should change the semantics for all
 architectures.

If we want to say that ownership of the registers is shared, we need a
plan for how that would actually work.

 The second problem I see is that we're disabling use cases for the sake 
 of correctness. When I use gdbstub, I usually don't want anything in the 
 way of debugging something - like if I want to debug the guest debugging 
 itself for example.

I don't have a problem with making it possible for userspace to forcibly
claim ownership of the debug registers -- but I don't want to advertise
that won't mess up the guest debugging that you're trying to debug,
without some reason to believe that.

 So overall, I think the x86 approach is a nice middle ground between 
 usability, performance and functionality.

You mean don't document anything other than the mechanism to get/set the
registers and hope everything works out? :-)

  one reg?
  We are using SREGS but if required we can use one_reg.
  I thought we were preferring one reg over sregs for new functionality.
  I'm personally torn on this one. The problem here is that the sregs
  fields and values are already reserved. For anything we don't have an
  API for yet, yes, one_reg only. IIUC we have the API here, but were
  lacking the implementation.
   From a QEMU perspective, are we going to want to update this at the same
  time as everything else, or would jumping through sregs hoops be a
  nuisance?  Is the long term goal to have one reg support for everything?
 
 Because we need to maintain backwards compatibility, I don't think we'll 
 actually have any benefit from one reg support for everything.

We don't need to maintain backwards compatibility with interfaces that
were never implemented, and even where we do need to maintain
compatibility, that doesn't mean the caller needs to use the old,
cumbersome interface (don't assume the caller is QEMU, or that all new
features of QEMU need to support old kernels).

 I think we're in a path that is slow enough already to not worry about 
 performance.

It's not just about performance, but simplicity of use, and consistency
of API.

Oh, and it looks like there already exist one reg definitions and
implementations for most of the debug registers.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM : powerpc/booke: Allow debug interrupt injection to guest

2014-07-01 Thread Scott Wood
On Tue, 2014-07-01 at 18:22 +0200, Alexander Graf wrote:
 On 01.07.14 17:35, Scott Wood wrote:
  On Tue, 2014-07-01 at 17:04 +0200, Alexander Graf wrote:
  On 01.07.14 16:58, Scott Wood wrote:
  On Tue, 2014-07-01 at 08:23 +0200, Alexander Graf wrote:
  I don't think QEMU should be aware of these limitations.
  OK, but we should at least have some idea of how the whole thing is
  supposed to work, in order to determine if this is the correct behavior
  for QEMU.  I thought the model was that debug resources are either owned
  by QEMU or by the guest, and in the latter case, QEMU would never see
  the debug exception to begin with.
  That's bad for a number of reasons. For starters it's different from how
  x86 handles debug registers - and I hate to be different just for the
  sake of being different.
  How does it work on x86?
 
 It overwrites more-or-less random breakpoints with its own ones, but 
 leaves the others intact ;).

Are you talking about software breakpoints or management of hardware
debug registers?

  So if we do want to declare that debug registers are owned by either
  QEMU or the guest, we should change the semantics for all
  architectures.
  If we want to say that ownership of the registers is shared, we need a
  plan for how that would actually work.
 
 I think you're overengineering here :). When do people actually use 
 gdbstub? Usually when they want to debug a broken guest. We can either
 
* overengineer heavily and reduce the number of registers available 
 to the guest to always have spares
* overengineer a bit and turn off guest debugging completely when we 
 use gdbstub
* just leave as much alive as we can, hoping that it helps with the 
 debugging
 
 Option 3 is what x86 does - and I think it's a reasonable approach. This 
 is not an interface that needs to be 100% consistent and bullet proof, 
 it's a best effort to enable you to debug as much as possible.

I'm not insisting on 100% -- just hoping for some explanation/discussion
about how it's intended to work for the cases where it can.

How will MSR[DE] and MSRP[DEP] be handled?

How would I go about telling QEMU/KVM that I don't want this shared
mode, because I don't want guest to interfere with the debugging I'm
trying to do from QEMU?

Will guest accesses to debug registers cause a userspace exit when
guest_debug is enabled?

  I think we're in a path that is slow enough already to not worry about
  performance.
  It's not just about performance, but simplicity of use, and consistency
  of API.
 
  Oh, and it looks like there already exist one reg definitions and
  implementations for most of the debug registers.
 
 For BookE? Where?

arch/powerpc/kvm/booke.c: KVM_REG_PPC_IACn, KVM_REG_PPC_DACn

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: e500: Fix default tlb for victim hint

2014-06-30 Thread Scott Wood
On Mon, 2014-06-30 at 15:54 +0300, Mihai Caraman wrote:
 Tlb search operation used for victim hint relies on the default tlb set by the
 host. When hardware tablewalk support is enabled in the host, the default tlb 
 is
 TLB1 which leads KVM to evict the bolted entry. Set and restore the default 
 tlb
 when searching for victim hint.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 ---
  arch/powerpc/include/asm/mmu-book3e.h | 5 -
  arch/powerpc/kvm/e500_mmu_host.c  | 4 
  2 files changed, 8 insertions(+), 1 deletion(-)
 
 diff --git a/arch/powerpc/include/asm/mmu-book3e.h 
 b/arch/powerpc/include/asm/mmu-book3e.h
 index 901dac6..5dad378 100644
 --- a/arch/powerpc/include/asm/mmu-book3e.h
 +++ b/arch/powerpc/include/asm/mmu-book3e.h
 @@ -40,7 +40,9 @@
  
  /* MAS registers bit definitions */
  
 -#define MAS0_TLBSEL(x)   (((x)  28)  0x3000)
 +#define MAS0_TLBSEL_MASK0x3000
 +#define MAS0_TLBSEL_SHIFT   28
 +#define MAS0_TLBSEL(x)  (((x)  MAS0_TLBSEL_SHIFT)  
 MAS0_TLBSEL_MASK)
  #define MAS0_ESEL_MASK   0x0FFF
  #define MAS0_ESEL_SHIFT  16
  #define MAS0_ESEL(x) (((x)  MAS0_ESEL_SHIFT)  MAS0_ESEL_MASK)
 @@ -86,6 +88,7 @@
  #define MAS3_SPSIZE  0x003e
  #define MAS3_SPSIZE_SHIFT1
  
 +#define MAS4_TLBSEL_MASK MAS0_TLBSEL_MASK
  #define MAS4_TLBSELD(x)  MAS0_TLBSEL(x)
  #define MAS4_INDD0x8000  /* Default IND */
  #define MAS4_TSIZED(x)   MAS1_TSIZE(x)
 diff --git a/arch/powerpc/kvm/e500_mmu_host.c 
 b/arch/powerpc/kvm/e500_mmu_host.c
 index dd2cc03..79677d7 100644
 --- a/arch/powerpc/kvm/e500_mmu_host.c
 +++ b/arch/powerpc/kvm/e500_mmu_host.c
 @@ -107,11 +107,15 @@ static u32 get_host_mas0(unsigned long eaddr)
  {
   unsigned long flags;
   u32 mas0;
 + u32 mas4;
  
   local_irq_save(flags);
   mtspr(SPRN_MAS6, 0);
 + mas4 = mfspr(SPRN_MAS4);
 + mtspr(SPRN_MAS4, mas4  ~MAS4_TLBSEL_MASK);
   asm volatile(tlbsx 0, %0 : : b (eaddr  ~CONFIG_PAGE_OFFSET));
   mas0 = mfspr(SPRN_MAS0);
 + mtspr(SPRN_MAS4, mas4);
   local_irq_restore(flags);
  
   return mas0;

Reviewed-by: Scott Wood scottw...@freescale.com

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: e500: Emulate power management control SPR

2014-06-30 Thread Scott Wood
On Mon, 2014-06-30 at 15:55 +0300, Mihai Caraman wrote:
 For FSL e6500 core the kernel uses power management SPR register (PWRMGTCR0)
 to enable idle power down for cores and devices by setting up the idle count
 period at boot time. With the host already controlling the power management
 configuration the guest could simply benefit from it, so emulate guest request
 as nop.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 ---
  arch/powerpc/kvm/e500_emulate.c | 8 
  1 file changed, 8 insertions(+)
 
 diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c
 index 002d517..98a22e5 100644
 --- a/arch/powerpc/kvm/e500_emulate.c
 +++ b/arch/powerpc/kvm/e500_emulate.c
 @@ -250,6 +250,10 @@ int kvmppc_core_emulate_mtspr_e500(struct kvm_vcpu 
 *vcpu, int sprn, ulong spr_va
   spr_val);
   break;
  
 + case SPRN_PWRMGTCR0:
 + /* Guest relies on host power management configurations */
 + break;
 +
   /* extra exceptions */
   case SPRN_IVOR32:
   vcpu-arch.ivor[BOOKE_IRQPRIO_SPE_UNAVAIL] = spr_val;
 @@ -355,6 +359,10 @@ int kvmppc_core_emulate_mfspr_e500(struct kvm_vcpu 
 *vcpu, int sprn, ulong *spr_v
   *spr_val = 0;
   break;
  
 + case SPRN_PWRMGTCR0:
 + *spr_val = 0;
 + break;
 +
   case SPRN_MMUCFG:
   *spr_val = vcpu-arch.mmucfg;
   break;

When reading, is it better to return zero, or the current host value, or
the value last written by the guest (even though it wasn't written to
hardware)?

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM : powerpc/booke: Allow debug interrupt injection to guest

2014-06-30 Thread Scott Wood
On Sun, 2014-06-29 at 23:38 -0500, Bhushan Bharat-R65777 wrote:
 
  -Original Message-
  From: Wood Scott-B07421
  Sent: Friday, June 27, 2014 11:53 PM
  To: Bhushan Bharat-R65777
  Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org
  Subject: Re: [PATCH 2/2] KVM : powerpc/booke: Allow debug interrupt 
  injection to
  guest
  
  On Fri, 2014-06-27 at 11:55 +0530, Bharat Bhushan wrote:
   - /* Force enable debug interrupts when user space wants to debug */
   - if (vcpu-guest_debug) {
   + /*
   +  * Force enable debug interrupts when user space wants to debug
   +  * and there is no debug interrupt pending for guest to handle.
   +  */
   + if (vcpu-guest_debug  !kvmppc_core_pending_debug(vcpu)) {
  
  Are you trying to allow the guest to be simultaneously debugged by itself 
  and by
  host userspace?  How does this work?
 
 Not actually, Currently we are not partitioning debug resources between
 host userspace and guest. In fact we do not emulate debug registers for
 guest. But we want host userspace to pass the interrupt to guest if it
 is not able to handle. 

I don't understand the logic here.  A debug interrupt should be injected
when the programming model in the guest says that a debug interrupt
should happen.  How can that occur currently?  If the guest didn't set
up the debug registers and QEMU still can't handle the debug interrupt,
that's a bug in QEMU (or KVM, or the hardware...).  Injecting the
interrupt into the guest just adds another bug on top of that.

#ifdef CONFIG_KVM_BOOKE_HV
 /*
  * Since there is no shadow MSR, sync MSR_DE into the guest @@
   -264,6 +272,16 @@ static void kvmppc_core_dequeue_watchdog(struct kvm_vcpu
  *vcpu)
 clear_bit(BOOKE_IRQPRIO_WATCHDOG, vcpu-arch.pending_exceptions);
   }
  
   +static void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu) {
   + kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DEBUG); }
   +
   +static void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu) {
   + clear_bit(BOOKE_IRQPRIO_DEBUG, vcpu-arch.pending_exceptions); }
  
  Is there currently no support for a guest debugging itself (i.e.
  guest_debug unset) on e500v2?
 
 Yes, It is not yet supported (IACx/DACx/DBCR/DBSR/DSRRx are not yet 
 emulated). 

How is it useful to inject a debug exception into the guest, until these
things are emulated?

   @@ static int set_sregs_base(struct kvm_vcpu *vcpu,
 if (sregs-u.e.update_special  KVM_SREGS_E_UPDATE_TSR)
 kvmppc_set_tsr(vcpu, sregs-u.e.tsr);
  
   + if (sregs-u.e.update_special  KVM_SREGS_E_UPDATE_DBSR) {
   + vcpu-arch.dbsr = sregs-u.e.dbsr;
   + if (vcpu-arch.dbsr)
   + kvmppc_core_queue_debug(vcpu);
   + else
   + kvmppc_core_dequeue_debug(vcpu);
   + }
   +
 return 0;
}
  
  one reg?
 
 We are using SREGS but if required we can use one_reg.

I thought we were preferring one reg over sregs for new functionality.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM : powerpc/booke: Allow debug interrupt injection to guest

2014-06-27 Thread Scott Wood
On Fri, 2014-06-27 at 11:55 +0530, Bharat Bhushan wrote:
 This patch allow userspace to inject debug interrupt to guest.
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com

Could you describe how userspace plans to make use of this, and go into
more detail about the changes you're making?

 ---
  arch/powerpc/kvm/booke.c  | 31 +--
  arch/powerpc/kvm/e500mc.c | 10 +-
  2 files changed, 38 insertions(+), 3 deletions(-)
 
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index bb25937..63ac38c 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -135,6 +135,11 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu)
  #endif
  }
  
 +static int kvmppc_core_pending_debug(struct kvm_vcpu *vcpu)
 +{
 + return test_bit(BOOKE_IRQPRIO_DEBUG, vcpu-arch.pending_exceptions);
 +}
 +
  static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu)
  {
   /* Synchronize guest's desire to get debug interrupts into shadow MSR */
 @@ -143,8 +148,11 @@ static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu)
   vcpu-arch.shadow_msr |= vcpu-arch.shared-msr  MSR_DE;
  #endif
  
 - /* Force enable debug interrupts when user space wants to debug */
 - if (vcpu-guest_debug) {
 + /*
 +  * Force enable debug interrupts when user space wants to debug
 +  * and there is no debug interrupt pending for guest to handle.
 +  */
 + if (vcpu-guest_debug  !kvmppc_core_pending_debug(vcpu)) {

Are you trying to allow the guest to be simultaneously debugged by
itself and by host userspace?  How does this work?

  #ifdef CONFIG_KVM_BOOKE_HV
   /*
* Since there is no shadow MSR, sync MSR_DE into the guest
 @@ -264,6 +272,16 @@ static void kvmppc_core_dequeue_watchdog(struct kvm_vcpu 
 *vcpu)
   clear_bit(BOOKE_IRQPRIO_WATCHDOG, vcpu-arch.pending_exceptions);
  }
  
 +static void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu)
 +{
 + kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DEBUG);
 +}
 +
 +static void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu)
 +{
 + clear_bit(BOOKE_IRQPRIO_DEBUG, vcpu-arch.pending_exceptions);
 +}

Is there currently no support for a guest debugging itself (i.e.
guest_debug unset) on e500v2?

  static void set_guest_srr(struct kvm_vcpu *vcpu, unsigned long srr0, u32 
 srr1)
  {
  #ifdef CONFIG_KVM_BOOKE_HV
 @@ -1332,6 +1350,7 @@ static void get_sregs_base(struct kvm_vcpu *vcpu,
   sregs-u.e.dec = kvmppc_get_dec(vcpu, tb);
   sregs-u.e.tb = tb;
   sregs-u.e.vrsave = vcpu-arch.vrsave;
 + sregs-u.e.dbsr = vcpu-arch.dbsr;
  }
  
  static int set_sregs_base(struct kvm_vcpu *vcpu,
 @@ -1356,6 +1375,14 @@ static int set_sregs_base(struct kvm_vcpu *vcpu,
   if (sregs-u.e.update_special  KVM_SREGS_E_UPDATE_TSR)
   kvmppc_set_tsr(vcpu, sregs-u.e.tsr);
  
 + if (sregs-u.e.update_special  KVM_SREGS_E_UPDATE_DBSR) {
 + vcpu-arch.dbsr = sregs-u.e.dbsr;
 + if (vcpu-arch.dbsr)
 + kvmppc_core_queue_debug(vcpu);
 + else
 + kvmppc_core_dequeue_debug(vcpu);
 + }
 +
   return 0;
  }

one reg?
 
 diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
 index 17e4562..ea724f2 100644
 --- a/arch/powerpc/kvm/e500mc.c
 +++ b/arch/powerpc/kvm/e500mc.c
 @@ -212,7 +212,7 @@ static int kvmppc_core_get_sregs_e500mc(struct kvm_vcpu 
 *vcpu,
   struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
  
   sregs-u.e.features |= KVM_SREGS_E_ARCH206_MMU | KVM_SREGS_E_PM |
 -KVM_SREGS_E_PC;
 +KVM_SREGS_E_PC | KVM_SREGS_E_ED;

   sregs-u.e.impl_id = KVM_SREGS_E_IMPL_FSL;
  
   sregs-u.e.impl.fsl.features = 0;
 @@ -220,6 +220,9 @@ static int kvmppc_core_get_sregs_e500mc(struct kvm_vcpu 
 *vcpu,
   sregs-u.e.impl.fsl.hid0 = vcpu_e500-hid0;
   sregs-u.e.impl.fsl.mcar = vcpu_e500-mcar;
  
 + sregs-u.e.dsrr0 = vcpu-arch.dsrr0;
 + sregs-u.e.dsrr1 = vcpu-arch.dsrr1;
 +
   kvmppc_get_sregs_e500_tlb(vcpu, sregs);
  
   sregs-u.e.ivor_high[3] =
 @@ -261,6 +264,11 @@ static int kvmppc_core_set_sregs_e500mc(struct kvm_vcpu 
 *vcpu,
   sregs-u.e.ivor_high[5];
   }
  
 + if (sregs-u.e.features  KVM_SREGS_E_ED) {
 + vcpu-arch.dsrr0 = sregs-u.e.dsrr0;
 + vcpu-arch.dsrr1 = sregs-u.e.dsrr1;
 + }

SPRG9?

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/33] KVM: PPC: Fix IRQ race in magic page code

2014-06-24 Thread Scott Wood
On Sun, 2014-06-22 at 23:23 +0200, Alexander Graf wrote:
 Howdy,
 
 Ben reminded me a while back that we have a nasty race in our KVM PV code.
 
 We replace a few instructions with longer streams of instructions to check
 whether it's necessary to trap out from it (like mtmsr, no need to trap if
 we only disable interrupts). During those replacement chunks we must not get
 any interrupts, because they might overwrite scratch space that we already
 used to save otherwise clobbered register state into.
 
 So we have a thing called critical sections which allows us to atomically
 get in and out of interrupt disabled modes without touching MSR. When we
 are supposed to deliver an interrupt into the guest while we are in a critical
 section, we just don't inject the interrupt yet, but leave it be until the
 next trap.
 
 However, we never really know when the next trap would be. For all we know it
 could be never. At this point we created a race that is a potential source
 for interrupt loss or at least deferral.
 
 This patch set aims at solving the race. Instead of merely deferring an
 interrupt when we see such a situation, we go into a special instruction
 interpretation mode. In this mode, we interpret all PPC assembler instructions
 that happen until we are out of the critical section again, at which point
 we can now inject the interrupt.
 
 This bug only affects KVM implementations that make use of the magic page, so
 e500v2, book3s_32 and book3s_64 PR KVM.

Would it be possible to single step through the critical section
instead?  Or set a high res timer to expire very quickly?

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/33] KVM: PPC: Fix IRQ race in magic page code

2014-06-24 Thread Scott Wood
On Wed, 2014-06-25 at 00:41 +0200, Alexander Graf wrote:
 On 24.06.14 20:53, Scott Wood wrote:
  On Sun, 2014-06-22 at 23:23 +0200, Alexander Graf wrote:
  Howdy,
 
  Ben reminded me a while back that we have a nasty race in our KVM PV code.
 
  We replace a few instructions with longer streams of instructions to check
  whether it's necessary to trap out from it (like mtmsr, no need to trap if
  we only disable interrupts). During those replacement chunks we must not 
  get
  any interrupts, because they might overwrite scratch space that we already
  used to save otherwise clobbered register state into.
 
  So we have a thing called critical sections which allows us to atomically
  get in and out of interrupt disabled modes without touching MSR. When we
  are supposed to deliver an interrupt into the guest while we are in a 
  critical
  section, we just don't inject the interrupt yet, but leave it be until the
  next trap.
 
  However, we never really know when the next trap would be. For all we know 
  it
  could be never. At this point we created a race that is a potential source
  for interrupt loss or at least deferral.
 
  This patch set aims at solving the race. Instead of merely deferring an
  interrupt when we see such a situation, we go into a special instruction
  interpretation mode. In this mode, we interpret all PPC assembler 
  instructions
  that happen until we are out of the critical section again, at which point
  we can now inject the interrupt.
 
  This bug only affects KVM implementations that make use of the magic page, 
  so
  e500v2, book3s_32 and book3s_64 PR KVM.
  Would it be possible to single step through the critical section
  instead?  Or set a high res timer to expire very quickly?
 
 There are a few other alternatives to this implementation:
 
1) Unmap the magic page, emulate all memory access to it while in 
 critical and irq pending
2) Trigger a timer that sends a request to the vcpu to wake it from 
 potential sleep and inject the irq
3) Single step until we're beyond the critical section
4) Probably more that I can't think of right now :)
 
 Each has their good and bad sides. Unmapping the magic page adds 
 complexity to the MMU mapping code, since we need to make sure we don't 
 map it back in on demand and treat faults to it specially.
 
 The timer interrupt works, but I'm not fully convinced that it's a good 
 idea for things like MC events which we also block during critical 
 sections on e500v2.

Are you concerned about the guest seeing machine checks that are (more)
asynchronous with the error condition?  e500v2 machine checks are always
asynchronous.  From the core manual:

Machine check interrupts are typically caused by a hardware or
memory subsystem failure or by an attempt to access an invalid
address. They may be caused indirectly by execution of an
instruction, but may not be recognized or reported until long
after the processor has executed past the instruction that
caused the machine check. As such, machine check interrupts are
not thought of as synchronous or asynchronous nor as precise or
imprecise.

I don't think the lag would be a problem, and certainly it's better than
the current situation.

 Single stepping is hard enough to get right on interaction between QEMU, 
 KVM and the guest. I didn't really want to make that stuff any more 
 complicated.

I'm not sure that it would add much complexity.  We'd just need to check
whether any source other than the magic page turned wants DCBR0_IC on,
to determine whether to exit to userspace or not.

 This approach is really just one out of many - and it's one that's 
 nicely self-contained and shouldn't have any impact at all on 
 implementations that don't care about it ;).

Nicely self-contained is not a phrase I'd associate with 33 patches,
including a bunch of new emulation that probably isn't getting great
test coverage.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/33] KVM: PPC: Fix IRQ race in magic page code

2014-06-24 Thread Scott Wood
On Wed, 2014-06-25 at 01:40 +0200, Alexander Graf wrote:
 On 25.06.14 01:15, Scott Wood wrote:
  On Wed, 2014-06-25 at 00:41 +0200, Alexander Graf wrote:
  On 24.06.14 20:53, Scott Wood wrote:
  The timer interrupt works, but I'm not fully convinced that it's a good
  idea for things like MC events which we also block during critical
  sections on e500v2.
  Are you concerned about the guest seeing machine checks that are (more)
  asynchronous with the error condition?  e500v2 machine checks are always
  asynchronous.  From the core manual:
 
   Machine check interrupts are typically caused by a hardware or
   memory subsystem failure or by an attempt to access an invalid
   address. They may be caused indirectly by execution of an
   instruction, but may not be recognized or reported until long
   after the processor has executed past the instruction that
   caused the machine check. As such, machine check interrupts are
   not thought of as synchronous or asynchronous nor as precise or
   imprecise.
 
  I don't think the lag would be a problem, and certainly it's better than
  the current situation.
 
 So what value would you set the timer to? If the value is too small, we 
 never finish the critical section. If it's too big, we add lots of jitter.

Maybe something like 100us?

Single stepping would be better, though.

  Single stepping is hard enough to get right on interaction between QEMU,
  KVM and the guest. I didn't really want to make that stuff any more
  complicated.
  I'm not sure that it would add much complexity.  We'd just need to check
  whether any source other than the magic page turned wants DCBR0_IC on,
  to determine whether to exit to userspace or not.
 
 What if the guest is single stepping itself? How do we determine when to 
 unset the bit again? When we get out of the critical section? How do we 
 know what the value was before we set it?

Keep track of each requester of single stepping separately, and only
ever set the real bit by ORing them.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule

2014-06-18 Thread Scott Wood
On Wed, 2014-06-18 at 10:15 +0300, Mihai Caraman wrote:
 On vcpu schedule, the condition checked for tlb pollution is too loose.
 The tlb entries of a vcpu become polluted (vs stale) only when a different
 vcpu within the same logical partition runs in-between. Optimize the tlb
 invalidation condition keeping last_vcpu per logical partition id.
 
 With the new invalidation condition, a guest shows 4% performance improvement
 on P5020DS while running a memory stress application with the cpu 
 oversubscribed,
 the other guest running a cpu intensive workload.
 
 Guest - old invalidation condition
   real 3.89
   user 3.87
   sys 0.01
 
 Guest - enhanced invalidation condition
   real 3.75
   user 3.73
   sys 0.01
 
 Host
   real 3.70
   user 1.85
   sys 0.00
 
 The memory stress application accesses 4KB pages backed by 75% of available
 TLB0 entries:
 
 char foo[ENTRIES][4096] __attribute__ ((aligned (4096)));
 
 int main()
 {
   char bar;
   int i, j;
 
   for (i = 0; i  ITERATIONS; i++)
   for (j = 0; j  ENTRIES; j++)
   bar = foo[j][0];
 
   return 0;
 }
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 Cc: Scott Wood scottw...@freescale.com
 ---
 v4:
  - rename last_vcpu_on_cpu to last_vcpu_of_lpid
  - use *[ syntax despite checkpatch error
  
 v3:
  - use existing logic while keeping last_cpu per lpid
 
  arch/powerpc/kvm/e500mc.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
 index 17e4562..690499d 100644
 --- a/arch/powerpc/kvm/e500mc.c
 +++ b/arch/powerpc/kvm/e500mc.c
 @@ -110,7 +110,7 @@ void kvmppc_mmu_msr_notify(struct kvm_vcpu *vcpu, u32 
 old_msr)
  {
  }
  
 -static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu_on_cpu);
 +static DEFINE_PER_CPU(struct kvm_vcpu *[KVMPPC_NR_LPIDS], last_vcpu_of_lpid);
  
  static void kvmppc_core_vcpu_load_e500mc(struct kvm_vcpu *vcpu, int cpu)
  {
 @@ -141,9 +141,9 @@ static void kvmppc_core_vcpu_load_e500mc(struct kvm_vcpu 
 *vcpu, int cpu)
   mtspr(SPRN_GESR, vcpu-arch.shared-esr);
  
   if (vcpu-arch.oldpir != mfspr(SPRN_PIR) ||
 - __get_cpu_var(last_vcpu_on_cpu) != vcpu) {
 + __get_cpu_var(last_vcpu_of_lpid)[vcpu-kvm-arch.lpid] != vcpu) {
   kvmppc_e500_tlbil_all(vcpu_e500);
 - __get_cpu_var(last_vcpu_on_cpu) = vcpu;
 + __get_cpu_var(last_vcpu_of_lpid)[vcpu-kvm-arch.lpid] = vcpu;
   }
  
   kvmppc_load_guest_fp(vcpu);

Reviewed-by: Scott Wood scottw...@freescale.com

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule

2014-06-17 Thread Scott Wood
On Tue, 2014-06-17 at 14:04 -0500, Caraman Mihai Claudiu-B02008 wrote:
  -Original Message-
  From: Wood Scott-B07421
  Sent: Tuesday, June 17, 2014 6:36 PM
  To: Caraman Mihai Claudiu-B02008
  Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc-
  d...@lists.ozlabs.org
  Subject: Re: [PATCH v2] KVM: PPC: e500mc: Enhance tlb invalidation
  condition on vcpu schedule
  
  On Tue, 2014-06-17 at 12:02 +0300, Mihai Caraman wrote:
   On vcpu schedule, the condition checked for tlb pollution is too loose.
   The tlb entries of a vcpu become polluted (vs stale) only when a
  different
   vcpu within the same logical partition runs in-between. Optimize the
  tlb
   invalidation condition taking into account the logical partition id.
  
   With the new invalidation condition, a guest shows 4% performance
  improvement
   on P5020DS while running a memory stress application with the cpu
  oversubscribed,
   the other guest running a cpu intensive workload.
  
  See
  https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-June/118547.html
 
 Thanks. The original code needs just a simple adjustment to benefit from
 this optimization, please review v3.

Where is v3?  Or is it forthcoming?

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule

2014-06-17 Thread Scott Wood
On Tue, 2014-06-17 at 22:09 +0300, Mihai Caraman wrote:
 On vcpu schedule, the condition checked for tlb pollution is too loose.
 The tlb entries of a vcpu become polluted (vs stale) only when a different
 vcpu within the same logical partition runs in-between. Optimize the tlb
 invalidation condition keeping last_vcpu_on_cpu per logical partition id.
 
 With the new invalidation condition, a guest shows 4% performance improvement
 on P5020DS while running a memory stress application with the cpu 
 oversubscribed,
 the other guest running a cpu intensive workload.
 
 Guest - old invalidation condition
   real 3.89
   user 3.87
   sys 0.01
 
 Guest - enhanced invalidation condition
   real 3.75
   user 3.73
   sys 0.01
 
 Host
   real 3.70
   user 1.85
   sys 0.00
 
 The memory stress application accesses 4KB pages backed by 75% of available
 TLB0 entries:
 
 char foo[ENTRIES][4096] __attribute__ ((aligned (4096)));
 
 int main()
 {
   char bar;
   int i, j;
 
   for (i = 0; i  ITERATIONS; i++)
   for (j = 0; j  ENTRIES; j++)
   bar = foo[j][0];
 
   return 0;
 }
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 Cc: Scott Wood scottw...@freescale.com
 ---
 v3:
  - use existing logic while keeping last_vcpu_per_cpu per lpid
  
 v2:
  - improve patch name and description
  - add performance results
 
 
  arch/powerpc/kvm/e500mc.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
 index 17e4562..95e33e3 100644
 --- a/arch/powerpc/kvm/e500mc.c
 +++ b/arch/powerpc/kvm/e500mc.c
 @@ -110,7 +110,7 @@ void kvmppc_mmu_msr_notify(struct kvm_vcpu *vcpu, u32 
 old_msr)
  {
  }
  
 -static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu_on_cpu);
 +static DEFINE_PER_CPU(struct kvm_vcpu * [KVMPPC_NR_LPIDS], last_vcpu_on_cpu);

Hmm, I didn't know you could express types like that.  Is this special
syntax that only works for typeof?

No space after *

Name should be adjusted to match, something like last_vcpu_of_lpid (with
the _on_cpu being implied by the fact that it's PER_CPU).

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule

2014-06-17 Thread Scott Wood
On Tue, 2014-06-17 at 14:42 -0500, Caraman Mihai Claudiu-B02008 wrote:
   -static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu_on_cpu);
   +static DEFINE_PER_CPU(struct kvm_vcpu * [KVMPPC_NR_LPIDS],
  last_vcpu_on_cpu);
  
  Hmm, I didn't know you could express types like that.  Is this special
  syntax that only works for typeof?
 
 Yes, AFAIK.
 
  No space after *
 
 Checkpatch complains about the missing space ;)

Checkpatch is wrong, which isn't surprising given that this is unusual
syntax.  We don't normally put a space after * when used to represent a
pointer.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule

2014-06-17 Thread Scott Wood
On Tue, 2014-06-17 at 15:02 -0500, Caraman Mihai Claudiu-B02008 wrote:
  -Original Message-
  From: Wood Scott-B07421
  Sent: Tuesday, June 17, 2014 10:48 PM
  To: Caraman Mihai Claudiu-B02008
  Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc-
  d...@lists.ozlabs.org
  Subject: Re: [PATCH v3] KVM: PPC: e500mc: Enhance tlb invalidation
  condition on vcpu schedule
  
  On Tue, 2014-06-17 at 14:42 -0500, Caraman Mihai Claudiu-B02008 wrote:
 -static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu_on_cpu);
 +static DEFINE_PER_CPU(struct kvm_vcpu * [KVMPPC_NR_LPIDS],
last_vcpu_on_cpu);
   
Hmm, I didn't know you could express types like that.  Is this
  special
syntax that only works for typeof?
  
   Yes, AFAIK.
  
No space after *
  
   Checkpatch complains about the missing space ;)
  
  Checkpatch is wrong, which isn't surprising given that this is unusual
  syntax.  We don't normally put a space after * when used to represent a
  pointer.
 
 This is not something new. See [PATCH 04/10] percpu: cleanup percpu array
 definitions:
 
   https://lkml.org/lkml/2009/6/24/26

I didn't say it was new, just unusual, and checkpatch doesn't recognize
it.  Checkpatch shouldn't be blindly and silently obeyed when it says
something strange.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule

2014-06-13 Thread Scott Wood
On Fri, 2014-06-13 at 16:55 +0200, Alexander Graf wrote:
 On 13.06.14 16:43, mihai.cara...@freescale.com wrote:
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Thursday, June 12, 2014 8:05 PM
  To: Caraman Mihai Claudiu-B02008
  Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc-
  d...@lists.ozlabs.org; Wood Scott-B07421
  Subject: Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition
  on vcpu schedule
 
  On 06/12/2014 04:00 PM, Mihai Caraman wrote:
  On vcpu schedule, the condition checked for tlb pollution is too tight.
  The tlb entries of one vcpu are polluted when a different vcpu from the
  same partition runs in-between. Relax the current tlb invalidation
  condition taking into account the lpid.

Can you quantify the performance improvement from this?  We've had bugs
in this area before, so let's make sure it's worth it before making this
more complicated.

  Signed-off-by: Mihai Caraman mihai.caraman at freescale.com
  Your mailer is broken? :)
  This really should be an @.
 
  I think this should work. Scott, please ack.
  Alex, you were right. I screwed up the patch description by inverting relax
  and tight terms :) It should have been more like this:
 
  KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
 
  On vcpu schedule, the condition checked for tlb pollution is too loose.
  The tlb entries of a vcpu are polluted (vs stale) only when a different vcpu
  within the same logical partition runs in-between. Optimize the tlb 
  invalidation
  condition taking into account the lpid.
 
 Can't we give every vcpu its own lpid? Or don't we trap on global 
 invalidates?

That would significantly increase the odds of exhausting LPIDs,
especially on large chips like t4240 with similarly large VMs.  If we
were to do that, the LPIDs would need to be dynamically assigned (like
PIDs), and should probably be a separate numberspace per physical core.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4 v3] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation

2014-06-13 Thread Scott Wood
On Thu, 2014-06-12 at 18:04 +0200, Alexander Graf wrote:
 On 06/02/2014 05:50 PM, Mihai Caraman wrote:
  On book3e, KVM uses load external pid (lwepx) dedicated instruction to read
  guest last instruction on the exit path. lwepx exceptions (DTLB_MISS, DSI
  and LRAT), generated by loading a guest address, needs to be handled by KVM.
  These exceptions are generated in a substituted guest translation context
  (EPLC[EGS] = 1) from host context (MSR[GS] = 0).
 
  Currently, KVM hooks only interrupts generated from guest context (MSR[GS] 
  = 1),
  doing minimal checks on the fast path to avoid host performance degradation.
  lwepx exceptions originate from host state (MSR[GS] = 0) which implies
  additional checks in DO_KVM macro (beside the current MSR[GS] = 1) by 
  looking
  at the Exception Syndrome Register (ESR[EPID]) and the External PID Load 
  Context
  Register (EPLC[EGS]). Doing this on each Data TLB miss exception is obvious
  too intrusive for the host.
 
  Read guest last instruction from kvmppc_load_last_inst() by searching for 
  the
  physical address and kmap it. This address the TODO for TLB eviction and
  execute-but-not-read entries, and allow us to get rid of lwepx until we are
  able to handle failures.
 
  A simple stress benchmark shows a 1% sys performance degradation compared 
  with
  previous approach (lwepx without failure handling):
 
  time for i in `seq 1 1`; do /bin/echo  /dev/null; done
 
  real0m 8.85s
  user0m 4.34s
  sys 0m 4.48s
 
  vs
 
  real0m 8.84s
  user0m 4.36s
  sys 0m 4.44s
 
  An alternative solution, to handle lwepx exceptions in KVM, is to temporary
  highjack the interrupt vector from host. Some cores share host IVOR 
  registers
  between hardware threads, which is the case of FSL e6500, which impose 
  additional
  synchronization logic for this solution to work. This optimized solution can
  be developed later on top of this patch.
 
  Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
  ---
  v3:
- reworked patch description
- use unaltered kmap addr for kunmap
- get last instruction before beeing preempted
 
  v2:
- reworked patch description
- used pr_* functions
- addressed cosmetic feedback
 
arch/powerpc/kvm/booke.c  | 32 
arch/powerpc/kvm/bookehv_interrupts.S | 37 --
arch/powerpc/kvm/e500_mmu_host.c  | 93 
  +++
3 files changed, 134 insertions(+), 28 deletions(-)
 
  diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
  index 34a42b9..4ef52a8 100644
  --- a/arch/powerpc/kvm/booke.c
  +++ b/arch/powerpc/kvm/booke.c
  @@ -880,6 +880,8 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
  kvm_vcpu *vcpu,
  int r = RESUME_HOST;
  int s;
  int idx;
  +   u32 last_inst = KVM_INST_FETCH_FAILED;
  +   enum emulation_result emulated = EMULATE_DONE;

  /* update before a new last_exit_type is rewritten */
  kvmppc_update_timing_stats(vcpu);
  @@ -887,6 +889,15 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
  kvm_vcpu *vcpu,
  /* restart interrupts if they were meant for the host */
  kvmppc_restart_interrupt(vcpu, exit_nr);

  +   /*
  +* get last instruction before beeing preempted
  +* TODO: for e6500 check also BOOKE_INTERRUPT_LRAT_ERROR  ESR_DATA
  +*/
  +   if (exit_nr == BOOKE_INTERRUPT_DATA_STORAGE ||
  +   exit_nr == BOOKE_INTERRUPT_DTLB_MISS ||
  +   exit_nr == BOOKE_INTERRUPT_HV_PRIV)
 
 Please make this a switch() - that's easier to read.
 
  +   emulated = kvmppc_get_last_inst(vcpu, false, last_inst);
  +
  local_irq_enable();

  trace_kvm_exit(exit_nr, vcpu);
  @@ -895,6 +906,26 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
  kvm_vcpu *vcpu,
  run-exit_reason = KVM_EXIT_UNKNOWN;
  run-ready_for_interrupt_injection = 1;

  +   switch (emulated) {
  +   case EMULATE_AGAIN:
  +   r = RESUME_GUEST;
  +   goto out;
  +
  +   case EMULATE_FAIL:
  +   pr_debug(%s: emulation at %lx failed (%08x)\n,
  +  __func__, vcpu-arch.pc, last_inst);
  +   /* For debugging, encode the failing instruction and
  +* report it to userspace. */
  +   run-hw.hardware_exit_reason = ~0ULL  32;
  +   run-hw.hardware_exit_reason |= last_inst;
  +   kvmppc_core_queue_program(vcpu, ESR_PIL);
  +   r = RESUME_HOST;
  +   goto out;
  +
  +   default:
  +   break;
  +   }
 
 I think you can just put this into a function.
 
 Scott, I think the patch overall looks quite good. Can you please check 
 as well and if you agree give it your reviewed-by? Mike, when Scott 
 gives you a reviewed-by, please include it for the next version.
 
 
 Alex
 
  +
  switch (exit_nr) {
  case BOOKE_INTERRUPT_MACHINE_CHECK:
  printk(MACHINE CHECK: %lx\n, mfspr(SPRN_MCSR));
  @@ -1184,6 +1215,7 @@ int 

Re: [PATCH v2 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation

2014-05-02 Thread Scott Wood
On Fri, 2014-05-02 at 13:10 +0200, Alexander Graf wrote:
 On 05/02/2014 12:12 PM, David Laight wrote:
  You also probably want the page mapped uncached - no point polluting the 
  data
  cache.

We can't do that without creating an architecturally illegal alias
between cacheable and non-cacheable mappings.

 Do e500 chips have a shared I/D cache somewhere?

Yes.  Only L1 is separate.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation

2014-04-01 Thread Scott Wood
On Tue, 2014-04-01 at 07:47 +0200, Alexander Graf wrote:
 
  Am 01.04.2014 um 01:03 schrieb Scott Wood scottw...@freescale.com:
  
  On Mon, 2014-03-31 at 15:41 +0200, Alexander Graf wrote:
  On 03/26/2014 10:17 PM, Scott Wood wrote:
  On Thu, 2014-02-20 at 18:30 +0200, Mihai Caraman wrote:
  +/*
  + * Another thread may rewrite the TLB entry in parallel, don't
  + * execute from the address if the execute permission is not set
  + */
  
  What happens when another thread rewrites the TLB entry in parallel? 
  Does tlbsx succeed? Does it fail? Do we see failure indicated somehow? 
  Are the contents of the MAS registers consistent at this point or 
  inconsistent?
  
  If another thread rewrites the TLB entry, then we use the new TLB entry,
  just as if it had raced in hardware.  This check ensures that we don't
  execute from the new TLB entry if it doesn't have execute permissions
  (just as hardware would).
  
  What happens if the new TLB entry is valid and executable, and the
  instruction pointed to is valid, but doesn't trap (and thus we don't
  have emulation for it)?
  
  There has to be a good way to detect such a race and deal with it, no?
  
  How would you detect it?  We don't get any information from the trap
  about what physical address the instruction was fetched from, or what
  the instruction was.
 
 Ah, this is a guest race where the guest modifies exactly the TLB in 
 question. I see.
 
 Why would this ever happen in practice (in a case where we're not executing 
 malicious code)?

I don't know.  It probably wouldn't.  But it seems wrong to not check
the exec bit.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation

2014-03-31 Thread Scott Wood
On Mon, 2014-03-31 at 15:41 +0200, Alexander Graf wrote:
 On 03/26/2014 10:17 PM, Scott Wood wrote:
  On Thu, 2014-02-20 at 18:30 +0200, Mihai Caraman wrote:
  +  /*
  +   * Another thread may rewrite the TLB entry in parallel, don't
  +   * execute from the address if the execute permission is not set
  +   */
 
 What happens when another thread rewrites the TLB entry in parallel? 
 Does tlbsx succeed? Does it fail? Do we see failure indicated somehow? 
 Are the contents of the MAS registers consistent at this point or 
 inconsistent?

If another thread rewrites the TLB entry, then we use the new TLB entry,
just as if it had raced in hardware.  This check ensures that we don't
execute from the new TLB entry if it doesn't have execute permissions
(just as hardware would).

What happens if the new TLB entry is valid and executable, and the
instruction pointed to is valid, but doesn't trap (and thus we don't
have emulation for it)?

 There has to be a good way to detect such a race and deal with it, no?

How would you detect it?  We don't get any information from the trap
about what physical address the instruction was fetched from, or what
the instruction was.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: PPC: Alow kvmppc_get_last_inst() to fail

2014-03-26 Thread Scott Wood
On Thu, 2014-02-20 at 18:30 +0200, Mihai Caraman wrote:
 diff --git a/arch/powerpc/kvm/book3s_paired_singles.c 
 b/arch/powerpc/kvm/book3s_paired_singles.c
 index a59a25a..80c533e 100644
 --- a/arch/powerpc/kvm/book3s_paired_singles.c
 +++ b/arch/powerpc/kvm/book3s_paired_singles.c
 @@ -640,19 +640,24 @@ static int kvmppc_ps_one_in(struct kvm_vcpu *vcpu, bool 
 rc,
  
  int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu)
  {
 - u32 inst = kvmppc_get_last_inst(vcpu);
 + u32 inst;
   enum emulation_result emulated = EMULATE_DONE;
 -
 - int ax_rd = inst_get_field(inst, 6, 10);
 - int ax_ra = inst_get_field(inst, 11, 15);
 - int ax_rb = inst_get_field(inst, 16, 20);
 - int ax_rc = inst_get_field(inst, 21, 25);
 - short full_d = inst_get_field(inst, 16, 31);
 -
 - u64 *fpr_d = vcpu-arch.fpr[ax_rd];
 - u64 *fpr_a = vcpu-arch.fpr[ax_ra];
 - u64 *fpr_b = vcpu-arch.fpr[ax_rb];
 - u64 *fpr_c = vcpu-arch.fpr[ax_rc];
 + int ax_rd, ax_ra, ax_rb, ax_rc;
 + short full_d;
 + u64 *fpr_d, *fpr_a, *fpr_b, *fpr_c;
 +
 + kvmppc_get_last_inst(vcpu, inst);

Should probably check for failure here and elsewhere -- even though it
can't currently fail on book3s, the interface now allows it.

 diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
 index 5b9e906..b0d884d 100644
 --- a/arch/powerpc/kvm/book3s_pr.c
 +++ b/arch/powerpc/kvm/book3s_pr.c
 @@ -624,9 +624,10 @@ void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr)
  static int kvmppc_read_inst(struct kvm_vcpu *vcpu)
  {
   ulong srr0 = kvmppc_get_pc(vcpu);
 - u32 last_inst = kvmppc_get_last_inst(vcpu);
 + u32 last_inst;
   int ret;
  
 + kvmppc_get_last_inst(vcpu, last_inst);
   ret = kvmppc_ld(vcpu, srr0, sizeof(u32), last_inst, false);

This isn't new, but this function looks odd to me -- calling
kvmppc_get_last_inst() but ignoring last_inst, then calling kvmppc_ld()
and ignoring anything but failure.  last_inst itself is never read.  And
no comments to explain the weirdness. :-)

I get that kvmppc_get_last_inst() is probably being used for the side
effect of filling in vcpu-arch.last_inst, but why store the return
value without using it?  Why pass the address of it to kvmppc_ld(),
which seems to be used only as an indirect way of determining whether
kvmppc_get_last_inst() failed?  And that whole mechanism becomes
stranger once it becomes possible for kvmppc_get_last_inst() to directly
return failure.

 diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
 index 09bfd9b..c7c60c2 100644
 --- a/arch/powerpc/kvm/booke.h
 +++ b/arch/powerpc/kvm/booke.h
 @@ -90,6 +90,9 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
  void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
  void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu);
  
 +void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu,
 +   ulong esr_flags);

Whitespace

 +
  enum int_class {
   INT_CLASS_NONCRIT,
   INT_CLASS_CRIT,
 @@ -123,6 +126,8 @@ extern int kvmppc_core_emulate_mtspr_e500(struct kvm_vcpu 
 *vcpu, int sprn,
  extern int kvmppc_core_emulate_mfspr_e500(struct kvm_vcpu *vcpu, int sprn,
 ulong *spr_val);
  
 +extern int kvmppc_ld_inst(struct kvm_vcpu *vcpu, u32 *instr) ;

Whitespace

 diff --git a/arch/powerpc/kvm/e500_mmu_host.c 
 b/arch/powerpc/kvm/e500_mmu_host.c
 index ecf2247..6025cb7 100644
 --- a/arch/powerpc/kvm/e500_mmu_host.c
 +++ b/arch/powerpc/kvm/e500_mmu_host.c
 @@ -34,6 +34,7 @@
  #include e500.h
  #include timing.h
  #include e500_mmu_host.h
 +#include booke.h
  
  #include trace_booke.h
  
 @@ -597,6 +598,10 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, 
 gpa_t gpaddr,
   }
  }
  
 +int kvmppc_ld_inst(struct kvm_vcpu *vcpu, u32 *instr) {
 + return EMULATE_FAIL;
 +};

Brace placement

  /* MMU Notifiers */
  
  int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
 diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
 index 2f9a087..24a8e50 100644
 --- a/arch/powerpc/kvm/emulate.c
 +++ b/arch/powerpc/kvm/emulate.c
 @@ -225,19 +225,26 @@ static int kvmppc_emulate_mfspr(struct kvm_vcpu *vcpu, 
 int sprn, int rt)
   * from opcode tables in the future. */
  int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
  {
 - u32 inst = kvmppc_get_last_inst(vcpu);
 - int ra = get_ra(inst);
 - int rs = get_rs(inst);
 - int rt = get_rt(inst);
 - int sprn = get_sprn(inst);
 - enum emulation_result emulated = EMULATE_DONE;
 + u32 inst;
 + int ra, rs, rt, sprn;
 + enum emulation_result emulated;
   int advance = 1;
  
   /* this default type might be overwritten by subcategories */
   kvmppc_set_exit_type(vcpu, EMULATED_INST_EXITS);
  
 + emulated = kvmppc_get_last_inst(vcpu, inst);
 + if (emulated != EMULATE_DONE) {
 + 

Re: [PATCH 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation

2014-03-26 Thread Scott Wood
On Thu, 2014-02-20 at 18:30 +0200, Mihai Caraman wrote:
 Load external pid (lwepx) instruction faults (when called from
 KVM with guest context) needs to be handled by KVM. This implies
 additional code in DO_KVM macro to identify the source of the
 exception (which oiginate from KVM host rather than the guest).
 The hook requires to check the Exception Syndrome Register
 ESR[EPID] and External PID Load Context Register EPLC[EGS] for
 some exceptions (DTLB_MISS, DSI and LRAT). Doing this on Data TLB
 miss exception is obvious intrusive for the host.
 
 Get rid of lwepx and acquire last instuction in kvmppc_get_last_inst()
 by searching for the physical address and kmap it. This fixes an
 infinite loop caused by lwepx's data TLB miss handled in the host
 and the TODO for TLB eviction and execute-but-not-read entries.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 ---
  arch/powerpc/kvm/bookehv_interrupts.S |   37 +++--
  arch/powerpc/kvm/e500_mmu_host.c  |   93 
 +
  2 files changed, 102 insertions(+), 28 deletions(-)
 
 diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
 b/arch/powerpc/kvm/bookehv_interrupts.S
 index 20c7a54..c50490c 100644
 --- a/arch/powerpc/kvm/bookehv_interrupts.S
 +++ b/arch/powerpc/kvm/bookehv_interrupts.S
 @@ -119,38 +119,14 @@
  1:
  
   .if \flags  NEED_EMU
 - /*
 -  * This assumes you have external PID support.
 -  * To support a bookehv CPU without external PID, you'll
 -  * need to look up the TLB entry and create a temporary mapping.
 -  *
 -  * FIXME: we don't currently handle if the lwepx faults.  PR-mode
 -  * booke doesn't handle it either.  Since Linux doesn't use
 -  * broadcast tlbivax anymore, the only way this should happen is
 -  * if the guest maps its memory execute-but-not-read, or if we
 -  * somehow take a TLB miss in the middle of this entry code and
 -  * evict the relevant entry.  On e500mc, all kernel lowmem is
 -  * bolted into TLB1 large page mappings, and we don't use
 -  * broadcast invalidates, so we should not take a TLB miss here.
 -  *
 -  * Later we'll need to deal with faults here.  Disallowing guest
 -  * mappings that are execute-but-not-read could be an option on
 -  * e500mc, but not on chips with an LRAT if it is used.
 -  */
 -
 - mfspr   r3, SPRN_EPLC   /* will already have correct ELPID and EGS */
   PPC_STL r15, VCPU_GPR(R15)(r4)
   PPC_STL r16, VCPU_GPR(R16)(r4)
   PPC_STL r17, VCPU_GPR(R17)(r4)
   PPC_STL r18, VCPU_GPR(R18)(r4)
   PPC_STL r19, VCPU_GPR(R19)(r4)
 - mr  r8, r3
   PPC_STL r20, VCPU_GPR(R20)(r4)
 - rlwimi  r8, r6, EPC_EAS_SHIFT - MSR_IR_LG, EPC_EAS
   PPC_STL r21, VCPU_GPR(R21)(r4)
 - rlwimi  r8, r6, EPC_EPR_SHIFT - MSR_PR_LG, EPC_EPR
   PPC_STL r22, VCPU_GPR(R22)(r4)
 - rlwimi  r8, r10, EPC_EPID_SHIFT, EPC_EPID
   PPC_STL r23, VCPU_GPR(R23)(r4)
   PPC_STL r24, VCPU_GPR(R24)(r4)
   PPC_STL r25, VCPU_GPR(R25)(r4)
 @@ -160,10 +136,15 @@
   PPC_STL r29, VCPU_GPR(R29)(r4)
   PPC_STL r30, VCPU_GPR(R30)(r4)
   PPC_STL r31, VCPU_GPR(R31)(r4)
 - mtspr   SPRN_EPLC, r8
 - isync
 - lwepx   r9, 0, r5
 - mtspr   SPRN_EPLC, r3
 +
 + /*
 +  * We don't use external PID support. lwepx faults would need to be
 +  * handled by KVM and this implies aditional code in DO_KVM (for
 +  * DTB_MISS, DSI and LRAT) to check ESR[EPID] and EPLC[EGS] which
 +  * is too intrusive for the host. Get last instuction in
 +  * kvmppc_get_last_inst().
 +  */
 + li  r9, KVM_INST_FETCH_FAILED
   stw r9, VCPU_LAST_INST(r4)
   .endif
  
 diff --git a/arch/powerpc/kvm/e500_mmu_host.c 
 b/arch/powerpc/kvm/e500_mmu_host.c
 index 6025cb7..1b4cb41 100644
 --- a/arch/powerpc/kvm/e500_mmu_host.c
 +++ b/arch/powerpc/kvm/e500_mmu_host.c
 @@ -598,9 +598,102 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, 
 gpa_t gpaddr,
   }
  }
  
 +#ifdef CONFIG_KVM_BOOKE_HV
 +int kvmppc_ld_inst(struct kvm_vcpu *vcpu, u32 *instr)

It'd be interesting to see what the performance impact of doing this on
non-HV would be -- it would eliminate divergent code, eliminate the
MSR_DS hack, and make exec-only mappings work.

 +{
 + gva_t geaddr;
 + hpa_t addr;
 + hfn_t pfn;
 + hva_t eaddr;
 + u32 mas0, mas1, mas2, mas3;
 + u64 mas7_mas3;
 + struct page *page;
 + unsigned int addr_space, psize_shift;
 + bool pr;
 + unsigned long flags;
 +
 + /* Search TLB for guest pc to get the real address */
 + geaddr = kvmppc_get_pc(vcpu);
 + addr_space = (vcpu-arch.shared-msr  MSR_IS)  MSR_IR_LG;
 +
 + local_irq_save(flags);
 + mtspr(SPRN_MAS6, (vcpu-arch.pid  MAS6_SPID_SHIFT) | addr_space);
 + mtspr(SPRN_MAS5, MAS5_SGS | vcpu-kvm-arch.lpid);
 + isync();
 + asm volatile(tlbsx 0, %[geaddr]\n : : [geaddr] r (geaddr));

We can probably get away without 

Re: [PATCH 0/8] PPC Book 3S HV-mode KVM updates for 3.15

2014-03-24 Thread Scott Wood
On Tue, 2014-03-25 at 10:47 +1100, Paul Mackerras wrote:
 This series of patches fixes some bugs in HV-mode KVM for PowerPC Book
 3S and finishes off adding the support for POWER8.  Patches 2 and 3
 are the two patches from the series I posted in January that Alex Graf
 didn't apply at that stage.  I have updated them according to his
 review comments.  The last patch is also POWER8-related, adding code
 to save and restore more of the host state of the PMU.  (We
 context-switch the PMU between host and guest since the guest can
 access the PMU directly.)  The remaining patches fix bugs that have
 been found over the last few months of testing.
 
 This patch series is based on the merge of the queue branch of the
 kvm tree with the kvm-ppc-queue branch of Alex Graf's tree, though I
 expect they would apply cleanly against the kvm tree queue branch
 also.
 
 I would like these to go into 3.15.  Scott, please ack.
 
 Paul.
 
 ---
 [PATCH 1/8] KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n
 [PATCH 2/8] KVM: PPC: Book3S HV: Add transactional memory support
 [PATCH 3/8] KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state
 [PATCH 4/8] KVM: PPC: Book3S: Trim top 4 bits of physical address in
 [PATCH 5/8] KVM: PPC: Book3S HV: Return ENODEV error rather than EIO
 [PATCH 6/8] KVM: PPC: Book3S HV: Don't use kvm_memslots() in real
 [PATCH 7/8] KVM: PPC: Book3S HV: Fix decrementer timeouts with
 [PATCH 8/8] KVM: PPC: Book3S HV: Save/restore host PMU registers that

Acked-by: Scott Wood scottw...@freescale.com

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Fixes for HV KVM on PPC for 3.14

2014-03-13 Thread Scott Wood
On Thu, 2014-03-13 at 20:01 +1100, Paul Mackerras wrote:
 These two patches fix two things in hypervisor-mode KVM for the IBM
 POWER server processors.  The first patch removes a hunk of extraneous
 code that got in as a result of a mistake I made in cleaning up after
 rebasing a patch.  The second fixes a bug that causes host memory
 corruption.  Both patches fix things that cause host crashes, so I'd
 like them in 3.14 if possible.  The two patches only touch one file,
 arch/powerpc/kvm/book3s_hv_rmhandlers.S, so they can't possibly cause
 any problems for other architectures or other PPC platforms.
 
 The patches are against the master branch of the kvm tree but should
 apply equally on Linus' current master branch.
 
 Scott, please ack.
 
 Paolo, I meant to get these out earlier, but a personal emergency
 arose this week and delayed me.

Acked-by: Scott Wood scottw...@freescale.com

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 05/10] powerpc/booke64: Use SPRG7 for VDSO

2014-03-13 Thread Scott Wood
Previously SPRG3 was marked for use by both VDSO and critical
interrupts (though critical interrupts were not fully implemented).

In commit 8b64a9dfb091f1eca8b7e58da82f1e7d1d5fe0ad (powerpc/booke64:
Use SPRG0/3 scratch for bolted TLB miss  crit int), Mihai Caraman
made an attempt to resolve this conflict by restoring the VDSO value
early in the critical interrupt, but this has some issues:

 - It's incompatible with EXCEPTION_COMMON which restores r13 from the
   by-then-overwritten scratch (this cost me some debugging time).
 - It forces critical exceptions to be a special case handled
   differently from even machine check and debug level exceptions.
 - It didn't occur to me that it was possible to make this work at all
   (by doing a final ld r13, PACA_EXCRIT+EX_R13(r13)) until after
   I made (most of) this patch. :-)

It might be worth investigating using a load rather than SPRG on return
from all exceptions (except TLB misses where the scratch never leaves
the SPRG) -- it could save a few cycles.  Until then, let's stick with
SPRG for all exceptions.

Since we cannot use SPRG4-7 for scratch without corrupting the state of
a KVM guest, move VDSO to SPRG7 on book3e.  Since neither SPRG4-7 nor
critical interrupts exist on book3s, SPRG3 is still used for VDSO
there.

Signed-off-by: Scott Wood scottw...@freescale.com
Cc: Mihai Caraman mihai.cara...@freescale.com
Cc: Anton Blanchard an...@samba.org
Cc: Paul Mackerras pau...@samba.org
Cc: kvm-ppc@vger.kernel.org
---
 arch/powerpc/include/asm/exception-64e.h|  5 ++---
 arch/powerpc/include/asm/kvm_booke_hv_asm.h |  9 +
 arch/powerpc/include/asm/paca.h |  2 +-
 arch/powerpc/include/asm/reg.h  | 13 ++---
 arch/powerpc/kernel/asm-offsets.c   |  2 +-
 arch/powerpc/kernel/exceptions-64e.S| 19 ++-
 arch/powerpc/kernel/vdso.c  |  8 
 arch/powerpc/kernel/vdso32/getcpu.S |  2 +-
 arch/powerpc/kernel/vdso64/getcpu.S |  2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  4 ++--
 arch/powerpc/kvm/book3s_interrupts.S|  4 ++--
 arch/powerpc/kvm/bookehv_interrupts.S   | 10 ++
 12 files changed, 33 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64e.h 
b/arch/powerpc/include/asm/exception-64e.h
index 51fa43e..e73452f 100644
--- a/arch/powerpc/include/asm/exception-64e.h
+++ b/arch/powerpc/include/asm/exception-64e.h
@@ -46,9 +46,8 @@
 #define EX_CR  (1 * 8)
 #define EX_R10 (2 * 8)
 #define EX_R11 (3 * 8)
-#define EX_R13 (4 * 8)
-#define EX_R14 (5 * 8)
-#define EX_R15 (6 * 8)
+#define EX_R14 (4 * 8)
+#define EX_R15 (5 * 8)
 
 /*
  * The TLB miss exception uses different slots.
diff --git a/arch/powerpc/include/asm/kvm_booke_hv_asm.h 
b/arch/powerpc/include/asm/kvm_booke_hv_asm.h
index 3a79f53..c3e3fd5 100644
--- a/arch/powerpc/include/asm/kvm_booke_hv_asm.h
+++ b/arch/powerpc/include/asm/kvm_booke_hv_asm.h
@@ -36,20 +36,13 @@
  *   *(r8 + GPR11) = saved r11
  *
  * 64-bit host
- * Expected inputs (GEN/GDBELL/DBG/MC exception types):
+ * Expected inputs (GEN/GDBELL/DBG/CRIT/MC exception types):
  *  r10 = saved CR
  *  r13 = PACA_POINTER
  *  *(r13 + PACA_EX##type + EX_R10) = saved r10
  *  *(r13 + PACA_EX##type + EX_R11) = saved r11
  *  SPRN_SPRG_##type##_SCRATCH = saved r13
  *
-  * Expected inputs (CRIT exception type):
- *  r10 = saved CR
- *  r13 = PACA_POINTER
- *  *(r13 + PACA_EX##type + EX_R10) = saved r10
- *  *(r13 + PACA_EX##type + EX_R11) = saved r11
- *  *(r13 + PACA_EX##type + EX_R13) = saved r13
- *
  * Expected inputs (TLB exception type):
  *  r10 = saved CR
  *  r13 = PACA_POINTER
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 9c5dbc3..948f01a 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -146,7 +146,7 @@ struct paca_struct {
u8 io_sync; /* writel() needs spin_unlock sync */
u8 irq_work_pending;/* IRQ_WORK interrupt while 
soft-disable */
u8 nap_state_lost;  /* NV GPR values lost in power7_idle */
-   u64 sprg3;  /* Saved user-visible sprg */
+   u64 sprg_vdso;  /* Saved user-visible sprg */
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
u64 tm_scratch; /* TM scratch area for reclaim */
 #endif
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 90c06ec..b9ac329 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -577,9 +577,13 @@
 #define SPRN_SPRG3 0x113   /* Special Purpose Register General 3 */
 #define SPRN_USPRG30x103   /* SPRG3 userspace read */
 #define SPRN_SPRG4 0x114   /* Special Purpose Register General 4 */
+#define SPRN_USPRG40x104   /* SPRG4 userspace read */
 #define SPRN_SPRG5 0x115   /* Special Purpose Register General 5

[PATCH 06/10] powerpc/booke64: Use SPRG_TLB_EXFRAME on bolted handlers

2014-03-13 Thread Scott Wood
While bolted handlers (including e6500) do not need to deal with a TLB
miss recursively causing another TLB miss, nested TLB misses can still
happen with crit/mc/debug exceptions -- so we still need to honor
SPRG_TLB_EXFRAME.

We don't need to spend time modifying it in the TLB miss fastpath,
though -- the special level exception will handle that.

Signed-off-by: Scott Wood scottw...@freescale.com
Cc: Mihai Caraman mihai.cara...@freescale.com
Cc: kvm-ppc@vger.kernel.org
---
 arch/powerpc/include/asm/exception-64e.h| 10 ---
 arch/powerpc/include/asm/kvm_booke_hv_asm.h |  8 --
 arch/powerpc/kvm/bookehv_interrupts.S   | 14 +++--
 arch/powerpc/mm/tlb_low_64e.S   | 44 ++---
 4 files changed, 45 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64e.h 
b/arch/powerpc/include/asm/exception-64e.h
index e73452f..a563d9af 100644
--- a/arch/powerpc/include/asm/exception-64e.h
+++ b/arch/powerpc/include/asm/exception-64e.h
@@ -172,16 +172,6 @@ exc_##label##_book3e:
ld  r9,EX_TLB_R9(r12);  \
ld  r8,EX_TLB_R8(r12);  \
mtlrr16;
-#define TLB_MISS_PROLOG_STATS_BOLTED   
\
-   mflrr10;\
-   std r8,PACA_EXTLB+EX_TLB_R8(r13);   \
-   std r9,PACA_EXTLB+EX_TLB_R9(r13);   \
-   std r10,PACA_EXTLB+EX_TLB_LR(r13);
-#define TLB_MISS_RESTORE_STATS_BOLTED  
\
-   ld  r16,PACA_EXTLB+EX_TLB_LR(r13);  \
-   ld  r9,PACA_EXTLB+EX_TLB_R9(r13);   \
-   ld  r8,PACA_EXTLB+EX_TLB_R8(r13);   \
-   mtlrr16;
 #define TLB_MISS_STATS_D(name) \
addir9,r13,MMSTAT_DSTATS+name;  \
bl  .tlb_stat_inc;
diff --git a/arch/powerpc/include/asm/kvm_booke_hv_asm.h 
b/arch/powerpc/include/asm/kvm_booke_hv_asm.h
index c3e3fd5..e5f048b 100644
--- a/arch/powerpc/include/asm/kvm_booke_hv_asm.h
+++ b/arch/powerpc/include/asm/kvm_booke_hv_asm.h
@@ -45,10 +45,12 @@
  *
  * Expected inputs (TLB exception type):
  *  r10 = saved CR
+ *  r12 = extlb pointer
  *  r13 = PACA_POINTER
- *  *(r13 + PACA_EX##type + EX_TLB_R10) = saved r10
- *  *(r13 + PACA_EX##type + EX_TLB_R11) = saved r11
- *  SPRN_SPRG_GEN_SCRATCH = saved r13
+ *  *(r12 + EX_TLB_R10) = saved r10
+ *  *(r12 + EX_TLB_R11) = saved r11
+ *  *(r12 + EX_TLB_R13) = saved r13
+ *  SPRN_SPRG_GEN_SCRATCH = saved r12
  *
  * Only the bolted version of TLB miss exception handlers is supported now.
  */
diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
b/arch/powerpc/kvm/bookehv_interrupts.S
index 99635a3..890e338 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -229,13 +229,20 @@
stw r10, VCPU_CR(r4)
PPC_STL r11, VCPU_GPR(R4)(r4)
PPC_STL r5, VCPU_GPR(R5)(r4)
-   mfspr   r5, \scratch
PPC_STL r6, VCPU_GPR(R6)(r4)
PPC_STL r8, VCPU_GPR(R8)(r4)
PPC_STL r9, VCPU_GPR(R9)(r4)
-   PPC_STL r5, VCPU_GPR(R13)(r4)
+   .if \type == EX_TLB
+   PPC_LL  r5, EX_TLB_R13(r12)
+   PPC_LL  r6, EX_TLB_R10(r12)
+   PPC_LL  r8, EX_TLB_R11(r12)
+   mfspr   r12, \scratch
+   .else
+   mfspr   r5, \scratch
PPC_LL  r6, (\paca_ex + \ex_r10)(r13)
PPC_LL  r8, (\paca_ex + \ex_r11)(r13)
+   .endif
+   PPC_STL r5, VCPU_GPR(R13)(r4)
PPC_STL r3, VCPU_GPR(R3)(r4)
PPC_STL r7, VCPU_GPR(R7)(r4)
PPC_STL r12, VCPU_GPR(R12)(r4)
@@ -444,6 +451,9 @@ _GLOBAL(kvmppc_resume_host)
PPC_STD(r8, VCPU_SHARED_SPRG6, r11)
mfxer   r3
PPC_STD(r9, VCPU_SHARED_SPRG7, r11)
+#ifdef CONFIG_64BIT
+   mtspr   SPRN_SPRG_VDSO_WRITE, r3
+#endif
 
/* save guest MAS registers and restore host mas4  mas6 */
mfspr   r5, SPRN_MAS0
diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
index 1e50249..356e8b4 100644
--- a/arch/powerpc/mm/tlb_low_64e.S
+++ b/arch/powerpc/mm/tlb_low_64e.S
@@ -39,37 +39,49 @@
  **
  **/
 
+/*
+ * Note that, unlike non-bolted handlers, TLB_EXFRAME is not
+ * modified by the TLB miss handlers themselves, since the TLB miss
+ * handler code will not itself cause a recursive TLB miss.
+ *
+ * TLB_EXFRAME will be modified when crit/mc/debug exceptions are
+ * entered/exited.
+ */
 .macro tlb_prolog_bolted intnum addr
-   mtspr   SPRN_SPRG_GEN_SCRATCH,r13
+   mtspr   SPRN_SPRG_GEN_SCRATCH,r12
+   mfspr   r12,SPRN_SPRG_TLB_EXFRAME

Re: [PATCH v2] KVM: Specify byte order for KVM_EXIT_MMIO

2014-01-24 Thread Scott Wood
On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
 The KVM API documentation is not clear about the semantics of the data
 field on the mmio struct on the kvm_run struct.
 
 This has become problematic when supporting ARM guests on big-endian
 host systems with guests of both endianness types, because it is unclear
 how the data should be exported to user space.
 
 This should not break with existing implementations as all supported
 existing implementations of known user space applications (QEMU and
 kvmtools for virtio) only support default endianness of the
 architectures on the host side.
 
 Cc: Marc Zyngier marc.zyng...@arm.com
 Cc: Peter Maydell peter.mayd...@linaro.org
 Cc: Alexander Graf ag...@suse.de
 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
 ---
 Changes [v1 - v2]:
  - s/host kernel should/host user space should/
 
  Documentation/virtual/kvm/api.txt | 5 +
  1 file changed, 5 insertions(+)
 
 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 366bf4b..6dbd68c 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could 
 not be satisfied
  by kvm.  The 'data' member contains the written data if 'is_write' is
  true, and should be filled by application code otherwise.
  
 +The 'data' member byte order is host kernel native endianness, regardless of
 +the endianness of the guest, and represents the the value as it would go on 
 the
 +bus in real hardware.  The host user space should always be able to do:
 +type val = *((type *)mmio.data).

Host userspace should be able to do that with what results?  It would
only produce a directly usable value if host endianness is the same as
the emulated device's endianness.

I'm not sure that host kernel native endianness is an accurate way of
describing what currently happens.  Regardless of host or guest
endianness, the guest should be swapping the value as necessary to
ensure that the value that goes on the (real or emulated) bus is the
same.

Plus, if it were really as it would go on the bus the value wouldn't
necessarily be left justified within data[], depending on how the bus 
works.

How about a wording like this:

  The 'data' member contains, in its first 'len' bytes, the value as it
  would appear if the guest had accessed memory rather than I/O.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM: Specify byte order for KVM_EXIT_MMIO

2014-01-24 Thread Scott Wood
On Sat, 2014-01-25 at 00:24 +, Peter Maydell wrote:
 On 24 January 2014 23:51, Scott Wood scottw...@freescale.com wrote:
  On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
  diff --git a/Documentation/virtual/kvm/api.txt 
  b/Documentation/virtual/kvm/api.txt
  index 366bf4b..6dbd68c 100644
  --- a/Documentation/virtual/kvm/api.txt
  +++ b/Documentation/virtual/kvm/api.txt
  @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which 
  could not be satisfied
   by kvm.  The 'data' member contains the written data if 'is_write' is
   true, and should be filled by application code otherwise.
 
  +The 'data' member byte order is host kernel native endianness, regardless 
  of
  +the endianness of the guest, and represents the the value as it would go 
  on the
  +bus in real hardware.  The host user space should always be able to do:
  +type val = *((type *)mmio.data).
 
  Host userspace should be able to do that with what results?  It would
  only produce a directly usable value if host endianness is the same as
  the emulated device's endianness.
 
 With the result that it gets the value the CPU has sent out on
 the bus as the memory transaction.

Doesn't that assume the host kernel endianness is the same as the bus
(or rather, that the host CPU would not swap such an access before it
hits the bus)?

If you take the same hardware and boot a little endian host kernel one
day, and a big endian host kernel the next, the bus doesn't change, and
neither should the bytewise (assuming address invariance) contents of
data[].  How data[] would look when read as a larger integer would of
course change -- but that's due to how you're reading it.

It's clear to say that a value in memory has been stored there in host
endianness when the value is as you would want to see it in a CPU
register, but it's less clear when you talk about it relative to values
on a bus.  It's harder to correlate that to something that is software
visible.

I don't think there's any actual technical difference between your
wording and mine when each wording is properly interpreted, but I
suspect my wording is less likely to be misinterpreted (I could be
wrong).

 Obviously if what userspace
 is emulating is a bus which has a byteswapping bridge or if it's
 being helpful to device emulation by providing here's the value
 even though you think you're wired up backwards then it needs
 to byteswap.

Whether the emulated bus has a byteswapping bridge doesn't sound like
something that depends on the endianness that the host CPU is currently
running in.

  How about a wording like this:
 
The 'data' member contains, in its first 'len' bytes, the value as it
would appear if the guest had accessed memory rather than I/O.
 
 I think this is confusing, because now userspace authors have
 to figure out how to get back to value X of size Y at address Z
 by interpreting this text... Can you write out the equivalent of
 Christoffer's text here's how you get the memory transaction
 value for what you want?

Userspace swaps the value if and only if userspace's endianness differs
from the endianness with which the device interprets the data
(regardless of whether said interpretation is considered natural or
swapped relative to the way the bus is documented).  It's similar to how
userspace would handle emulating DMA.

KVM swaps the value if and only if the endianness of the guest access
differs from that of the host, i.e. if it would have done swapping when
emulating an ordinary memory access.

 (Also, value as it would appear to who?)

As it would appear to anyone.  It works because data[] actually is
memory.  Any difference in how data appears based on the reader's
context would already be reflected when the reader performs the load.

 I think your wording implies that the order of bytes in data[] depend
 on the guest CPU usual byte order, ie the order which the CPU
 does not do a byte-lane-swap for (LE for ARM, BE for PPC),
 and it would mean it would come out differently from
 my/Alex/Christoffer's proposal if the host kernel was the opposite
 endianness from that usual order.

It doesn't depend on usual anything.  The only thing it implicitly
says about guest byte order is that it's KVM's job to implement any
swapping if the endianness of the guest access is different from the
endianness of the host kernel access (whether it's due to the guest's
mode, the way a page is mapped, the instruction used, etc).

 Finally, I think it's a bit confusing in that as if the guest had
 accessed memory is assigning implicit semantics to memory
 in the emulated system, when memory is actually kind of outside
 KVM's purview because it's not part of the CPU.

That's sort of the point.  It defines it in a way that is independent of
the CPU, and thus independent of what endianness the CPU operates in.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More

[PATCH] kvm: Provide kvm_vcpu_eligible_for_directed_yield() stub

2014-01-09 Thread Scott Wood
Commit 7940876e1330671708186ac3386aa521ffb5c182 (kvm: make local
functions static) broke KVM PPC builds due to removing (rather than
moving) the stub version of kvm_vcpu_eligible_for_directed_yield().

This patch reintroduces it.

Signed-off-by: Scott Wood scottw...@freescale.com
Cc: Stephen Hemminger step...@networkplumber.org
Cc: Alexander Graf ag...@suse.de
---
 virt/kvm/kvm_main.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4733fa1..f896665 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1771,6 +1771,11 @@ static bool kvm_vcpu_eligible_for_directed_yield(struct 
kvm_vcpu *vcpu)
 
return eligible;
 }
+#else
+static bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
+{
+   return true;
+}
 #endif
 
 void kvm_vcpu_on_spin(struct kvm_vcpu *me)
-- 
1.8.3.2

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] kvm/ppc: IRQ disabling cleanup

2014-01-02 Thread Scott Wood
On Sun, 2013-12-29 at 16:43 +0100, Alexander Graf wrote:
 On 11.07.2013, at 00:47, Scott Wood scottw...@freescale.com wrote:
  diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
  index ddfaf56..c13caa0 100644
  --- a/arch/powerpc/kvm/book3s_pr.c
  +++ b/arch/powerpc/kvm/book3s_pr.c
  @@ -884,14 +884,11 @@ program_interrupt:
   * and if we really did time things so badly, then we just exit
   * again due to a host external interrupt.
   */
  -   local_irq_disable();
  s = kvmppc_prepare_to_enter(vcpu);
  -   if (s = 0) {
  -   local_irq_enable();
  +   if (s = 0)
  r = s;
  -   } else {
  +   else
  kvmppc_fix_ee_before_entry();
  -   }
  }
 
 Please add a comment here stating that interrupts are hard disabled at this 
 point.

OK.

  diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
  index 4e05f8c..2f7a221 100644
  --- a/arch/powerpc/kvm/powerpc.c
  +++ b/arch/powerpc/kvm/powerpc.c
  @@ -64,12 +64,14 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
  {
  int r = 1;
  
  -   WARN_ON_ONCE(!irqs_disabled());
  +   WARN_ON(irqs_disabled());
  +   hard_irq_disable();
  +
  while (true) {
  if (need_resched()) {
  local_irq_enable();
 
 One thing I find reasonably tricky in this approach is that we run
 local_irq_enable, but hard_irq_disable. I did dig through the code and
 it should work just fine, but I think we should make sure to note that
 this is intended and doesn't just work by accident.
 
 Just add a comment above the first call to hard_irq_disable() that
 explains that local_irq_enable() will properly unroll hard_irq_disable.
 That way the next person reading this doesn't have to dig as deeply.

There is no hard_irq_enable() -- only __hard_irq_enable() that doesn't
update local_paca-irq_happened.

This is normal use of the API.  If there does need to be a comment, it
should go in hw_irq.h. :-)

  #ifdef CONFIG_PPC64
  -   /* lazy EE magic */
  -   hard_irq_disable();
  -   if (lazy_irq_pending()) {
  -   /* Got an interrupt in between, try again */
  -   local_irq_enable();
  -   local_irq_disable();
  -   kvm_guest_exit();
  -   continue;
  -   }
  +   WARN_ON(lazy_irq_pending());
  #endif
  +   /* Can't use irqs_disabled() because we want hard irq state */
  +   WARN_ON(mfmsr()  MSR_EE);
 
 The reason for lazy EE is that mfmsr() is supposed to be slow.

Not just mtmsr?

And when I complained about wrtee not being all that slow on our
hardware, Ben said it was also for perf coverage. :-)

 Couldn't we check for the internal hard disable flag instead? Just create a 
 new
 helper in hw_irq.h that tells us
 
   local_paca-irq_happened  PACA_IRQ_HARD_DIS
 
 on PPC64 and
 
   !(mfmsr()  MSR_EE)
 
 on PPC32. We can then just use that helper here and not run expensive 
 mfmsr() calls.

 I'm not sure whether it's actually measurable overhead in the context
 of the whole world switch, but why diverge from the rest of the system?
 If you think a new helper is overkill, I'd be fine with a simple #ifdef
 here just as well.

I'd hope a mfmsr wouldn't be so slow on any hardware as to be a big deal
here, though it'd also be nice if there were a Linux-wide way of
specifying whether a particular WARN should be always checked, or only
if the kernel is built with some debug option...

The rest of the system is irqs_disabled() and there's already a
comment explaining why we diverge from that.

Maybe we should just remove that check; we'd still at least have the
64-bit check in kvmppc_fix_ee_before_entry.

  kvm_guest_enter();
  -   break;
  +   return r;
 
 This must be 0 at this point, right?

No, it'll be 1.

 Either way, I prefer to keep the code easily understandable. How about
 you keep the break and just add an if (r) around the local_irq_enable()
 below? Then add a comment stating that the return value tells us
 whether entry is ok to proceed and interrupts are disabled (0) or
 something didn't work out and interrupts are enabled (!0).

How is it more understandable to overload what looks like a single code
path with divergent cases that have little to nothing in common?  Would
it help to add a comment on the return-to-host code path indicating that
it is only used for returning to the host?

I'd be fine with replacing return r with return 1 (and getting rid
of the initialization of r at the top of the function, unless GCC
complains inappropriately).

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] powerpc/kvm/booke: Fix build break due to stack frame size warning

2013-12-09 Thread Scott Wood
On Mon, 2013-11-25 at 04:26 -0600, Bharat Bhushan wrote:
 
  -Original Message-
  From: Wood Scott-B07421
  Sent: Saturday, November 23, 2013 3:22 AM
  To: Alexander Graf
  Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Wood Scott-B07421; 
  Bhushan
  Bharat-R65777
  Subject: [PATCH] powerpc/kvm/booke: Fix build break due to stack frame size
  warning
 
  Commit ce11e48b7fdd256ec68b932a89b397a790566031 (KVM: PPC: E500: Add
  userspace debug stub support) added struct thread_struct to the
  stack of kvmppc_vcpu_run().  thread_struct is 1152 bytes on my build,
  compared to 48 bytes for the recently-introduced struct debug_reg.
  Use the latter instead.
 
  This fixes the following error:
 
  cc1: warnings being treated as errors
  arch/powerpc/kvm/booke.c: In function 'kvmppc_vcpu_run':
  arch/powerpc/kvm/booke.c:760:1: error: the frame size of 1424 bytes is 
  larger
  than 1024 bytes
  make[2]: *** [arch/powerpc/kvm/booke.o] Error 1
  make[1]: *** [arch/powerpc/kvm] Error 2
  make[1]: *** Waiting for unfinished jobs
 
  Signed-off-by: Scott Wood scottw...@freescale.com
  Cc: Bharat Bhushan r65...@freescale.com
  ---
  Build tested only.  Bharat, please test.
 
 Tested with qemu debug stub; It works fine
 
 -Bharat

Alex, are you going to take this through your tree?

-Scott



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v5][PATCH] KVM: PPC: Book3E HV: call RECONCILE_IRQ_STATE to sync the software state

2013-11-19 Thread Scott Wood
On Tue, 2013-11-19 at 10:51 +0100, Alexander Graf wrote:
 
 Am 19.11.2013 um 00:49 schrieb Scott Wood scottw...@freescale.com:
 
  On Mon, 2013-11-18 at 16:09 -0500, Alexander Graf wrote:
  On 18.11.2013, at 03:34, “tiejun.chen” tiejun.c...@windriver.com wrote:
  
  On 10/23/2013 09:26 AM, Tiejun Chen wrote:
  We enter with interrupts disabled in hardware, but we need to
  call RECONCILE_IRQ_STATE anyway to ensure that the software state
  is kept in sync instead of calling hard_irq_disable() directly.
  
  Why didn't this happen before? What is this patch fixing?
  
  It's cleanup, not a fix.  It makes things more consistent with other
  64-bit kernel entry code.
 
 Could we please note this in the comit message so that whoever stumbles over 
 the patch later knows that this is effectively a no-op (and just prepones the 
 lazy sync)?
 
 Also, I'm still wary of lazy breakage in the pr code path, but I guess since 
 it's completely untested today already it's ok to ignore.

PR doesn't support 64-bit at all (e.g. it's all stw/lwz rather than some
word-size abstraction).  In the unlikely event that lazy EE is ever
extended to 32-bit, then fixing up booke_interrupts.S would be just one
part of the task of updating all the 32-bit asm (e.g. you'll find
RECONCILE_IRQ_STATE in entry_64.S but not entry_32.S).

-Scott



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4 v3] kvm: powerpc: use cache attributes from linux pte

2013-11-18 Thread Scott Wood
On Fri, 2013-11-15 at 11:01 +0530, Bharat Bhushan wrote:
 From: Bharat Bhushan bharat.bhus...@freescale.com
 
 v2-v3
   - Returning pte pointer rather than pte as suggested by Scott
   - Also removed PAGE_SPLITTING as this need to be handled by caller 
 
 v1-v2
  - Removed _PAGE_BUSY loop as suggested by PaulS.
  - Added check for PAGE_SPLITTING
 
 kvm: powerpc: use cache attributes from linux pte
  - 1st Patch fixes a bug in booke (detail in patch)
  - 2nd patch is renaming the linux_pte_lookup_function() just for clarity.
There is not functional change.
  - 3nd Patch adds a Linux pte lookup function.
  - 4th Patch uses the above defined function and setup TLB.wimg accordingly
 
 Bharat Bhushan (4):
   kvm: booke: clear host tlb reference flag on guest tlb invalidation
   kvm: book3s: rename lookup_linux_pte() to
 lookup_linux_pte_and_update()
   kvm: powerpc: define a linux pte lookup function
   kvm: powerpc: use caching attributes as per linux pte
 
  arch/powerpc/include/asm/kvm_host.h |2 +-
  arch/powerpc/include/asm/pgtable.h  |   21 +
  arch/powerpc/kvm/book3s_hv_rm_mmu.c |8 +++--
  arch/powerpc/kvm/booke.c|1 +
  arch/powerpc/kvm/e500.h |8 +++--
  arch/powerpc/kvm/e500_mmu_host.c|   55 +++---
  6 files changed, 64 insertions(+), 31 deletions(-)

Reviewed-by: Scott Wood scottw...@freescale.com with patch 4/4
replaced by v4.

-Scott



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v5][PATCH] KVM: PPC: Book3E HV: call RECONCILE_IRQ_STATE to sync the software state

2013-11-18 Thread Scott Wood
On Mon, 2013-11-18 at 16:09 -0500, Alexander Graf wrote:
 On 18.11.2013, at 03:34, “tiejun.chen” tiejun.c...@windriver.com wrote:
 
  On 10/23/2013 09:26 AM, Tiejun Chen wrote:
  We enter with interrupts disabled in hardware, but we need to
  call RECONCILE_IRQ_STATE anyway to ensure that the software state
  is kept in sync instead of calling hard_irq_disable() directly.
 
 Why didn't this happen before? What is this patch fixing?

It's cleanup, not a fix.  It makes things more consistent with other
64-bit kernel entry code.

-Scott



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4 v3] kvm: powerpc: use caching attributes as per linux pte

2013-11-15 Thread Scott Wood
On Fri, 2013-11-15 at 11:01 +0530, Bharat Bhushan wrote:
 @@ -440,9 +437,9 @@ static inline int kvmppc_e500_shadow_map(struct 
 kvmppc_vcpu_e500 *vcpu_e500,
   }
  
   if (likely(!pfnmap)) {
 - unsigned long tsize_pages = 1  (tsize + 10 - PAGE_SHIFT);
 + tsize_pages = 1  (tsize + 10 - PAGE_SHIFT);
   pfn = gfn_to_pfn_memslot(slot, gfn);
 - if (is_error_noslot_pfn(pfn)) {
 + if (is_error_noslot_pfn(pfn)  printk_ratelimit()) {
   printk(KERN_ERR Couldn't get real page for gfn %lx!\n,
   (long)gfn);
   return -EINVAL;

The return -EINVAL shouldn't be affected by printk_ratelimit().

 @@ -453,7 +450,17 @@ static inline int kvmppc_e500_shadow_map(struct 
 kvmppc_vcpu_e500 *vcpu_e500,
   gvaddr = ~((tsize_pages  PAGE_SHIFT) - 1);
   }
  
 - kvmppc_e500_ref_setup(ref, gtlbe, pfn);
 +
 + pgdir = vcpu_e500-vcpu.arch.pgdir;
 + ptep = lookup_linux_ptep(pgdir, hva, tsize_pages);
 + if (pte_present(*ptep))
 + wimg = (*ptep  PTE_WIMGE_SHIFT)  MAS2_WIMGE_MASK;
 + else if (printk_ratelimit()) {
 + printk(KERN_ERR %s: pte not present: gfn %lx, pfn %lx\n,
 + __func__, (long)gfn, pfn);
 + return -EINVAL;
 + }

Likewise.

-Scott



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PULL 34/51] powerpc: move debug registers in a structure

2013-11-03 Thread Scott Wood
On Mon, 2013-11-04 at 07:56 +1100, Benjamin Herrenschmidt wrote:
 On Sun, 2013-11-03 at 16:30 +0200, Gleb Natapov wrote:
  On Thu, Oct 31, 2013 at 10:18:19PM +0100, Alexander Graf wrote:
   From: Bharat Bhushan r65...@freescale.com
   
   This way we can use same data type struct with KVM and
   also help in using other debug related function.
   
   Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
   Signed-off-by: Alexander Graf ag...@suse.de
  It would be nice to have PPC maintainers (CCed) ACKs here. This patch
  also has merging conflicts with cbc9565e (should be easy to resolve)
 
 Is it easier if I put it in powerpc -next and resolve the conflicts ?
 
 I was ok with the patch but yes, I should probably have put it in a
 topic branch in the first place and merge it both in my tree and Alex.

It's already in your -next via my tree (sha1
51ae8d4a2b9e4aa9a502061b9c39168e08829b94).  I didn't realize Alex was
planning on applying it.

-Scott



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v4][PATCH] KVM: PPC: Book3E HV: call RECONCILE_IRQ_STATE to sync the software state

2013-10-22 Thread Scott Wood
On Mon, 2013-10-21 at 17:57 +0800, Tiejun Chen wrote:
 +#ifdef CONFIG_64BIT
 + /*
 +  * We enter with interrupts disabled in hardware, but
 +  * we need to call SOFT_DISABLE_INTS anyway to ensure
 +  * that the software state is kept in sync.
 +  */
 + RECONCILE_IRQ_STATE(r3,r5)
 +#endif

s/SOFT_DISABLE_INTS/RECONCILE_IRQ_STATE/ in the comment.

-Scott



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] kvm: powerpc: define a linux pte lookup function

2013-10-09 Thread Scott Wood
On Wed, 2013-10-09 at 03:48 -0500, Bhushan Bharat-R65777 wrote:
 
  -Original Message-
  From: Wood Scott-B07421
  Sent: Wednesday, October 09, 2013 3:07 AM
  To: Bhushan Bharat-R65777
  Cc: ag...@suse.de; Yoder Stuart-B08248; k...@vger.kernel.org; kvm-
  p...@vger.kernel.org; pau...@samba.org; Bhushan Bharat-R65777
  Subject: Re: [PATCH 3/4] kvm: powerpc: define a linux pte lookup function
  
  On Tue, 2013-10-08 at 11:33 +0530, Bharat Bhushan wrote:
   We need to search linux pte to get pte attributes for setting TLB
   in KVM.
   This patch defines a linux_pte_lookup() function for same.
  
   Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
   ---
arch/powerpc/include/asm/pgtable.h |   35 
   +++
1 files changed, 35 insertions(+), 0 deletions(-)
  
   diff --git a/arch/powerpc/include/asm/pgtable.h
   b/arch/powerpc/include/asm/pgtable.h
   index 7d6eacf..fd26c04 100644
   --- a/arch/powerpc/include/asm/pgtable.h
   +++ b/arch/powerpc/include/asm/pgtable.h
   @@ -223,6 +223,41 @@ extern int gup_hugepte(pte_t *ptep, unsigned long
   sz, unsigned long addr,  #endif  pte_t
   *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
  unsigned *shift);
   +
   +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
   +  unsigned long *pte_sizep)
   +{
   + pte_t *ptep;
   + pte_t pte;
   + unsigned long ps = *pte_sizep;
   + unsigned int shift;
   +
   + ptep = find_linux_pte_or_hugepte(pgdir, hva, shift);
   + if (!ptep)
   + return __pte(0);
   + if (shift)
   + *pte_sizep = 1ul  shift;
   + else
   + *pte_sizep = PAGE_SIZE;
   +
   + if (ps  *pte_sizep)
   + return __pte(0);
   +
   + /* wait until _PAGE_BUSY is clear */
   + while (1) {
   + pte = pte_val(*ptep);
   + if (unlikely(pte  _PAGE_BUSY)) {
   + cpu_relax();
   + continue;
   + }
   + }
   +
   + /* If pte is not present return None */
   + if (unlikely(!(pte  _PAGE_PRESENT)))
   + return __pte(0);
   +
   + return pte;
   +}
  
  Can lookup_linux_pte_and_update() call lookup_linux_pte()?
 
 What lookup_linux_pte_and_update() does:-
  - find_linux_pte_or_hugepte()
  - does size and some other trivial checks
  - Then atomically update the pte:-
= while()
= wait till _PAGE_BUSY is clear
= atomically update the pte
= if not updated then go back to while() above else break
 
 
 While what lookup_linux_pte() does:-
  - find_linux_pte_or_hugepte()
  - does size and some other trivial checks
  - wait till _PAGE_BUSY is clear
  - return pte
 
 I am finding it difficult to call lookup_linux_pte() from 
 lookup_linux_pte_and_update().

You could factor out a common lookup_linux_ptep().

-Scott



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] kvm: powerpc: define a linux pte lookup function

2013-10-08 Thread Scott Wood
On Tue, 2013-10-08 at 11:33 +0530, Bharat Bhushan wrote:
 We need to search linux pte to get pte attributes for
 setting TLB in KVM.
 This patch defines a linux_pte_lookup() function for same.
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
  arch/powerpc/include/asm/pgtable.h |   35 +++
  1 files changed, 35 insertions(+), 0 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/pgtable.h 
 b/arch/powerpc/include/asm/pgtable.h
 index 7d6eacf..fd26c04 100644
 --- a/arch/powerpc/include/asm/pgtable.h
 +++ b/arch/powerpc/include/asm/pgtable.h
 @@ -223,6 +223,41 @@ extern int gup_hugepte(pte_t *ptep, unsigned long sz, 
 unsigned long addr,
  #endif
  pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
unsigned *shift);
 +
 +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
 +  unsigned long *pte_sizep)
 +{
 + pte_t *ptep;
 + pte_t pte;
 + unsigned long ps = *pte_sizep;
 + unsigned int shift;
 +
 + ptep = find_linux_pte_or_hugepte(pgdir, hva, shift);
 + if (!ptep)
 + return __pte(0);
 + if (shift)
 + *pte_sizep = 1ul  shift;
 + else
 + *pte_sizep = PAGE_SIZE;
 +
 + if (ps  *pte_sizep)
 + return __pte(0);
 +
 + /* wait until _PAGE_BUSY is clear */
 + while (1) {
 + pte = pte_val(*ptep);
 + if (unlikely(pte  _PAGE_BUSY)) {
 + cpu_relax();
 + continue;
 + }
 + }
 +
 + /* If pte is not present return None */
 + if (unlikely(!(pte  _PAGE_PRESENT)))
 + return __pte(0);
 +
 + return pte;
 +}

Can lookup_linux_pte_and_update() call lookup_linux_pte()?

-Scott



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2 v2] kvm/powerpc: rename kvm_hypercall() to epapr_hypercall()

2013-10-07 Thread Scott Wood
On Mon, 2013-10-07 at 22:23 +0530, Bharat Bhushan wrote:
 kvm_hypercall() have nothing KVM specific, so renamed to epapr_hypercall().
 Also this in moved to arch/powerpc/include/asm/epapr_hcalls.h
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
 v1-v2
  - epapr_hypercall() is always defined and returns EV_UNIMPLEMENTED
when CONFIG_KVM_GUEST or CONFIG_EPAPR_PARAVIRT not defined.
 
  arch/powerpc/include/asm/epapr_hcalls.h |   46 
 +++
  arch/powerpc/include/asm/kvm_para.h |   23 ---
  arch/powerpc/kernel/kvm.c   |   41 +--
  3 files changed, 54 insertions(+), 56 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/epapr_hcalls.h 
 b/arch/powerpc/include/asm/epapr_hcalls.h
 index d3d6342..6b8e007 100644
 --- a/arch/powerpc/include/asm/epapr_hcalls.h
 +++ b/arch/powerpc/include/asm/epapr_hcalls.h
 @@ -454,5 +454,51 @@ static inline unsigned int ev_idle(void)
  
   return r3;
  }
 +
 +#if defined(CONFIG_KVM_GUEST) || defined(CONFIG_EPAPR_PARAVIRT)

CONFIG_KVM_GUEST implies CONFIG_EPAPR_PARAVIRT, so you only need to
check for the latter.

-Scott



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] kvm/ppc: IRQ disabling cleanup

2013-10-04 Thread Scott Wood
On Fri, 2013-09-06 at 01:06 +0200, Alexander Graf wrote:
 On 06.09.2013, at 00:09, Scott Wood wrote:
 
  On Thu, 2013-07-11 at 01:09 +0200, Alexander Graf wrote:
  On 11.07.2013, at 01:08, Scott Wood wrote:
  
  On 07/10/2013 06:04:53 PM, Alexander Graf wrote:
  On 11.07.2013, at 01:01, Benjamin Herrenschmidt wrote:
  On Thu, 2013-07-11 at 00:57 +0200, Alexander Graf wrote:
  #ifdef CONFIG_PPC64
  + /*
  +  * To avoid races, the caller must have gone directly from 
  having
  +  * interrupts fully-enabled to hard-disabled.
  +  */
  + WARN_ON(local_paca-irq_happened != PACA_IRQ_HARD_DIS);
  
  WARN_ON(lazy_irq_pending()); ?
  
  Different semantics. What you propose will not catch irq_happened == 0 
  :-)
  Right, but we only ever reach here after hard_irq_disable() I think.
  
  And the WARN_ON helps us ensure that it stays that way.
  
  Heh - ok :). Works for me.
  
  What's the status on this patch?
 
 IIUC it was ok. Ben, could you please verify?

ping

-Scott



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] kvm/powerpc: rename kvm_hypercall() to epapr_hypercall()

2013-10-02 Thread Scott Wood
On Wed, 2013-10-02 at 16:19 +0200, Alexander Graf wrote:
 On 23.09.2013, at 07:23, Bharat Bhushan wrote:
  static inline long kvm_hypercall0_1(unsigned int nr, unsigned long *r2)
  @@ -65,7 +54,7 @@ static inline long kvm_hypercall0_1(unsigned int nr, 
  unsigned long *r2)
  unsigned long out[8];
  unsigned long r;
  
  -   r = kvm_hypercall(in, out, KVM_HCALL_TOKEN(nr));
  +   r = epapr_hypercall(in, out, KVM_HCALL_TOKEN(nr));
 
 Won't this break when CONFIG_EPAPR_PARAVIRT=n? We wouldn't have 
 epapr_hcalls.S compiled into the code base then and the bl above would 
 reference an unknown function.

KVM_GUEST selects EPAPR_PARAVIRT.

-Scott



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] kvm/powerpc: rename kvm_hypercall() to epapr_hypercall()

2013-10-02 Thread Scott Wood
On Wed, 2013-10-02 at 18:53 +0200, Alexander Graf wrote:
 On 02.10.2013, at 18:40, Scott Wood wrote:
 
  On Wed, 2013-10-02 at 16:19 +0200, Alexander Graf wrote:
  On 23.09.2013, at 07:23, Bharat Bhushan wrote:
  static inline long kvm_hypercall0_1(unsigned int nr, unsigned long *r2)
  @@ -65,7 +54,7 @@ static inline long kvm_hypercall0_1(unsigned int nr, 
  unsigned long *r2)
unsigned long out[8];
unsigned long r;
  
  - r = kvm_hypercall(in, out, KVM_HCALL_TOKEN(nr));
  + r = epapr_hypercall(in, out, KVM_HCALL_TOKEN(nr));
  
  Won't this break when CONFIG_EPAPR_PARAVIRT=n? We wouldn't have 
  epapr_hcalls.S compiled into the code base then and the bl above would 
  reference an unknown function.
  
  KVM_GUEST selects EPAPR_PARAVIRT.
 
 But you can not select KVM_GUEST and still call these inline functions, no?

No.

  Like kvm_arch_para_features().

Where does that get called without KVM_GUEST?

How would that work currently, with the call to kvm_hypercall() in
arch/powerpc/kernel/kvm.c (which calls epapr_hypercall, BTW)?

-Scott



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   4   5   6   7   >