Re: powerpc/64: Include KVM guest test in all interrupt vectors

2015-12-07 Thread Michael Ellerman
On Thu, 2015-12-11 at 05:44:42 UTC, Paul Mackerras wrote:
> Currently, if HV KVM is configured but PR KVM isn't, we don't include
> a test to see whether we were interrupted in KVM guest context for the
> set of interrupts which get delivered directly to the guest by hardware
> if they occur in the guest.  This includes things like program
> interrupts.
> 
> However, the recent bug where userspace could set the MSR for a VCPU
> to have an illegal value in the TS field, and thus cause a TM Bad Thing
> type of program interrupt on the hrfid that enters the guest, showed that
> we can never be completely sure that these interrupts can never occur
> in the guest entry/exit code.  If one of these interrupts does happen
> and we have HV KVM configured but not PR KVM, then we end up trying to
> run the handler in the host with the MMU set to the guest MMU context,
> which generally ends badly.
> 
> Thus, for robustness it is better to have the test in every interrupt
> vector, so that if some way is found to trigger some interrupt in the
> guest entry/exit path, we can handle it without immediately crashing
> the host.
> 
> This means that the distinction between KVMTEST and KVMTEST_PR goes
> away.  Thus we delete KVMTEST_PR and associated macros and use KVMTEST
> everywhere that we previously used either KVMTEST_PR or KVMTEST.  It
> also means that SOFTEN_TEST_HV_201 becomes the same as SOFTEN_TEST_PR,
> so we deleted SOFTEN_TEST_HV_201 and use SOFTEN_TEST_PR instead.
> 
> Signed-off-by: Paul Mackerras 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/31a40e2b052c0f2b80df7b56

cheers
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8

2015-12-07 Thread Thomas Huth
On 20/11/15 09:11, Thomas Huth wrote:
> In the old DABR register, the BT (Breakpoint Translation) bit
> is bit number 61. In the new DAWRX register, the WT (Watchpoint
> Translation) bit is bit number 59. So to move the DABR-BT bit
> into the position of the DAWRX-WT bit, it has to be shifted by
> two, not only by one. This fixes hardware watchpoints in gdb of
> older guests that only use the H_SET_DABR/X interface instead
> of the new H_SET_MODE interface.
> 
> Signed-off-by: Thomas Huth 
> ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index b98889e..3983b87 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -2143,7 +2143,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  
>   /* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */
>  2:   rlwimi  r5, r4, 5, DAWRX_DR | DAWRX_DW
> - rlwimi  r5, r4, 1, DAWRX_WT
> + rlwimi  r5, r4, 2, DAWRX_WT
>   clrrdi  r4, r4, 3
>   std r4, VCPU_DAWR(r3)
>   std r5, VCPU_DAWRX(r3)

Ping?

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kernel 3/9] KVM: PPC: Rework H_PUT_TCE/H_GET_TCE handlers

2015-12-07 Thread David Gibson
On Tue, Sep 15, 2015 at 08:49:33PM +1000, Alexey Kardashevskiy wrote:
> This reworks the existing H_PUT_TCE/H_GET_TCE handlers to have one
> exit path. This allows next patch to add locks nicely.

I don't see a problem with the actual code, but it doesn't seem to
match this description: I still see multiple return statements for
h_put_tce at least.

> This moves the ioba boundaries check to a helper and adds a check for
> least bits which have to be zeros.
> 
> The patch is pretty mechanical (only check for least ioba bits is added)
> so no change in behaviour is expected.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 102 
> +++-
>  1 file changed, 66 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
> b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 89e96b3..8ae12ac 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -35,71 +35,101 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define TCES_PER_PAGE(PAGE_SIZE / sizeof(u64))
>  
> +/*
> + * Finds a TCE table descriptor by LIOBN.
> + *
> + * WARNING: This will be called in real or virtual mode on HV KVM and virtual
> + *  mode on PR KVM
> + */
> +static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu 
> *vcpu,
> + unsigned long liobn)
> +{
> + struct kvm *kvm = vcpu->kvm;
> + struct kvmppc_spapr_tce_table *stt;
> +
> + list_for_each_entry_rcu_notrace(stt, >arch.spapr_tce_tables, list)
> + if (stt->liobn == liobn)
> + return stt;
> +
> + return NULL;
> +}
> +
> +/*
> + * Validates IO address.
> + *
> + * WARNING: This will be called in real-mode on HV KVM and virtual
> + *  mode on PR KVM
> + */
> +static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
> + unsigned long ioba, unsigned long npages)
> +{
> + unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
> + unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
> + unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K;
> +
> + if ((ioba & mask) || (size + npages <= idx))
> + return H_PARAMETER;

Not sure if it's worth a check for overflow in (size+npages) there.

> +
> + return H_SUCCESS;
> +}
> +
>  /* WARNING: This will be called in real-mode on HV KVM and virtual
>   *  mode on PR KVM
>   */
>  long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> unsigned long ioba, unsigned long tce)
>  {
> - struct kvm *kvm = vcpu->kvm;
> - struct kvmppc_spapr_tce_table *stt;
> + struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
> + long ret = H_TOO_HARD;
> + unsigned long idx;
> + struct page *page;
> + u64 *tbl;
>  
>   /* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
>   /*  liobn, ioba, tce); */
>  
> - list_for_each_entry(stt, >arch.spapr_tce_tables, list) {
> - if (stt->liobn == liobn) {
> - unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
> - struct page *page;
> - u64 *tbl;
> + if (!stt)
> + return ret;
>  
> - /* udbg_printf("H_PUT_TCE: liobn 0x%lx => stt=%p  
> window_size=0x%x\n", */
> - /*  liobn, stt, stt->window_size); */
> - if (ioba >= stt->window_size)
> - return H_PARAMETER;
> + ret = kvmppc_ioba_validate(stt, ioba, 1);
> + if (ret)
> + return ret;
>  
> - page = stt->pages[idx / TCES_PER_PAGE];
> - tbl = (u64 *)page_address(page);
> + idx = ioba >> SPAPR_TCE_SHIFT;
> + page = stt->pages[idx / TCES_PER_PAGE];
> + tbl = (u64 *)page_address(page);
>  
> - /* FIXME: Need to validate the TCE itself */
> - /* udbg_printf("tce @ %p\n", [idx % 
> TCES_PER_PAGE]); */
> - tbl[idx % TCES_PER_PAGE] = tce;
> - return H_SUCCESS;
> - }
> - }
> + /* FIXME: Need to validate the TCE itself */
> + /* udbg_printf("tce @ %p\n", [idx % TCES_PER_PAGE]); */
> + tbl[idx % TCES_PER_PAGE] = tce;
>  
> - /* Didn't find the liobn, punt it to userspace */
> - return H_TOO_HARD;
> + return ret;

So, this relies on the fact that kvmppc_ioba_validate() would have
returned H_SUCCESS some distance above.  This seems rather fragile if
you insert anything else which alters ret in between.  Since this is
the success path, I think it's clearer to explicitly return H_SUCCESS.

>  }
>  EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
>  
>  long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> unsigned long ioba)
>  {
> - struct kvm *kvm = vcpu->kvm;
> - struct 

Re: [PATCH kernel 1/9] rcu: Define notrace version of list_for_each_entry_rcu

2015-12-07 Thread David Gibson
On Tue, Sep 15, 2015 at 08:49:31PM +1000, Alexey Kardashevskiy wrote:
> This defines list_for_each_entry_rcu_notrace and list_entry_rcu_notrace
> which use rcu_dereference_raw_notrace instead of rcu_dereference_raw.
> This allows using list_for_each_entry_rcu_notrace in real mode (MMU is off).
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
>  include/linux/rculist.h | 38 ++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index 17c6b1f..439c4d7 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -253,6 +253,25 @@ static inline void list_splice_init_rcu(struct list_head 
> *list,
>  })
>  
>  /**
> + * list_entry_rcu_notrace - get the struct for this entry
> + * @ptr:the  list_head pointer.
> + * @type:   the type of the struct this is embedded in.
> + * @member: the name of the list_struct within the struct.
> + *
> + * This primitive may safely run concurrently with the _rcu list-mutation
> + * primitives such as list_add_rcu() as long as it's guarded by 
> rcu_read_lock().
> + *
> + * This is the same as list_entry_rcu() except that it does
> + * not do any RCU debugging or tracing.
> + */
> +#define list_entry_rcu_notrace(ptr, type, member) \
> +({ \
> + typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
> + container_of((typeof(ptr))rcu_dereference_raw_notrace(__ptr), \
> + type, member); \
> +})
> +
> +/**
>   * Where are list_empty_rcu() and list_first_entry_rcu()?
>   *
>   * Implementing those functions following their counterparts list_empty() and
> @@ -308,6 +327,25 @@ static inline void list_splice_init_rcu(struct list_head 
> *list,
>   pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>  
>  /**
> + * list_for_each_entry_rcu_notrace - iterate over rcu list of given type
> + * @pos: the type * to use as a loop cursor.
> + * @head:the head for your list.
> + * @member:  the name of the list_struct within the struct.
> + *
> + * This list-traversal primitive may safely run concurrently with
> + * the _rcu list-mutation primitives such as list_add_rcu()
> + * as long as the traversal is guarded by rcu_read_lock().
> + *
> + * This is the same as list_for_each_entry_rcu() except that it does
> + * not do any RCU debugging or tracing.
> + */
> +#define list_for_each_entry_rcu_notrace(pos, head, member) \
> + for (pos = list_entry_rcu_notrace((head)->next, typeof(*pos), member); \
> + >member != (head); \
> + pos = list_entry_rcu_notrace(pos->member.next, typeof(*pos), \
> + member))
> +
> +/**
>   * list_for_each_entry_continue_rcu - continue iteration over list of given 
> type
>   * @pos: the type * to use as a loop cursor.
>   * @head:the head for your list.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH kernel 4/9] KVM: PPC: Use RCU for arch.spapr_tce_tables

2015-12-07 Thread David Gibson
On Tue, Sep 15, 2015 at 08:49:34PM +1000, Alexey Kardashevskiy wrote:
> At the moment spapr_tce_tables is not protected against races. This makes
> use of RCU-variants of list helpers. As some bits are executed in real
> mode, this makes use of just introduced list_for_each_entry_rcu_notrace().
> 
> This converts release_spapr_tce_table() to a RCU scheduled handler.
> 
> Signed-off-by: Alexey Kardashevskiy 

Looks correct on my limited knowledge of RCU

Reviewed-by: David Gibson 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH kernel 2/9] KVM: PPC: Make real_vmalloc_addr() public

2015-12-07 Thread David Gibson
On Tue, Sep 15, 2015 at 08:49:32PM +1000, Alexey Kardashevskiy wrote:
> This helper translates vmalloc'd addresses to linear addresses.
> It is only used by the KVM MMU code now and resides in the HV KVM code.
> We will need it further in the TCE code and the DMA memory preregistration
> code called in real mode.
> 
> This makes real_vmalloc_addr() public and moves it to the powerpc code as
> it does not do anything special for KVM.
> 
> Signed-off-by: Alexey Kardashevskiy 

Hmm, I have a couple of small concerns.

First, I'm not sure if the name is clear enough for a public function.

Second, I'm not sure if mmu-hash64.h is the right place for it.  This
is still a function with very specific and limited usage, I wonder if
we should have somewhere else for such special real mode helpers.

Paulus, thoughts?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature