Re: powerpc/64: Include KVM guest test in all interrupt vectors
On Thu, 2015-12-11 at 05:44:42 UTC, Paul Mackerras wrote: > Currently, if HV KVM is configured but PR KVM isn't, we don't include > a test to see whether we were interrupted in KVM guest context for the > set of interrupts which get delivered directly to the guest by hardware > if they occur in the guest. This includes things like program > interrupts. > > However, the recent bug where userspace could set the MSR for a VCPU > to have an illegal value in the TS field, and thus cause a TM Bad Thing > type of program interrupt on the hrfid that enters the guest, showed that > we can never be completely sure that these interrupts can never occur > in the guest entry/exit code. If one of these interrupts does happen > and we have HV KVM configured but not PR KVM, then we end up trying to > run the handler in the host with the MMU set to the guest MMU context, > which generally ends badly. > > Thus, for robustness it is better to have the test in every interrupt > vector, so that if some way is found to trigger some interrupt in the > guest entry/exit path, we can handle it without immediately crashing > the host. > > This means that the distinction between KVMTEST and KVMTEST_PR goes > away. Thus we delete KVMTEST_PR and associated macros and use KVMTEST > everywhere that we previously used either KVMTEST_PR or KVMTEST. It > also means that SOFTEN_TEST_HV_201 becomes the same as SOFTEN_TEST_PR, > so we deleted SOFTEN_TEST_HV_201 and use SOFTEN_TEST_PR instead. > > Signed-off-by: Paul MackerrasApplied to powerpc next, thanks. https://git.kernel.org/powerpc/c/31a40e2b052c0f2b80df7b56 cheers -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8
On 20/11/15 09:11, Thomas Huth wrote: > In the old DABR register, the BT (Breakpoint Translation) bit > is bit number 61. In the new DAWRX register, the WT (Watchpoint > Translation) bit is bit number 59. So to move the DABR-BT bit > into the position of the DAWRX-WT bit, it has to be shifted by > two, not only by one. This fixes hardware watchpoints in gdb of > older guests that only use the H_SET_DABR/X interface instead > of the new H_SET_MODE interface. > > Signed-off-by: Thomas Huth> --- > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > index b98889e..3983b87 100644 > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > @@ -2143,7 +2143,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) > > /* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */ > 2: rlwimi r5, r4, 5, DAWRX_DR | DAWRX_DW > - rlwimi r5, r4, 1, DAWRX_WT > + rlwimi r5, r4, 2, DAWRX_WT > clrrdi r4, r4, 3 > std r4, VCPU_DAWR(r3) > std r5, VCPU_DAWRX(r3) Ping? -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH kernel 3/9] KVM: PPC: Rework H_PUT_TCE/H_GET_TCE handlers
On Tue, Sep 15, 2015 at 08:49:33PM +1000, Alexey Kardashevskiy wrote: > This reworks the existing H_PUT_TCE/H_GET_TCE handlers to have one > exit path. This allows next patch to add locks nicely. I don't see a problem with the actual code, but it doesn't seem to match this description: I still see multiple return statements for h_put_tce at least. > This moves the ioba boundaries check to a helper and adds a check for > least bits which have to be zeros. > > The patch is pretty mechanical (only check for least ioba bits is added) > so no change in behaviour is expected. > > Signed-off-by: Alexey Kardashevskiy> --- > arch/powerpc/kvm/book3s_64_vio_hv.c | 102 > +++- > 1 file changed, 66 insertions(+), 36 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c > b/arch/powerpc/kvm/book3s_64_vio_hv.c > index 89e96b3..8ae12ac 100644 > --- a/arch/powerpc/kvm/book3s_64_vio_hv.c > +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c > @@ -35,71 +35,101 @@ > #include > #include > #include > +#include > > #define TCES_PER_PAGE(PAGE_SIZE / sizeof(u64)) > > +/* > + * Finds a TCE table descriptor by LIOBN. > + * > + * WARNING: This will be called in real or virtual mode on HV KVM and virtual > + * mode on PR KVM > + */ > +static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu > *vcpu, > + unsigned long liobn) > +{ > + struct kvm *kvm = vcpu->kvm; > + struct kvmppc_spapr_tce_table *stt; > + > + list_for_each_entry_rcu_notrace(stt, >arch.spapr_tce_tables, list) > + if (stt->liobn == liobn) > + return stt; > + > + return NULL; > +} > + > +/* > + * Validates IO address. > + * > + * WARNING: This will be called in real-mode on HV KVM and virtual > + * mode on PR KVM > + */ > +static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt, > + unsigned long ioba, unsigned long npages) > +{ > + unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1; > + unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K; > + unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K; > + > + if ((ioba & mask) || (size + npages <= idx)) > + return H_PARAMETER; Not sure if it's worth a check for overflow in (size+npages) there. > + > + return H_SUCCESS; > +} > + > /* WARNING: This will be called in real-mode on HV KVM and virtual > * mode on PR KVM > */ > long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, > unsigned long ioba, unsigned long tce) > { > - struct kvm *kvm = vcpu->kvm; > - struct kvmppc_spapr_tce_table *stt; > + struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn); > + long ret = H_TOO_HARD; > + unsigned long idx; > + struct page *page; > + u64 *tbl; > > /* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */ > /* liobn, ioba, tce); */ > > - list_for_each_entry(stt, >arch.spapr_tce_tables, list) { > - if (stt->liobn == liobn) { > - unsigned long idx = ioba >> SPAPR_TCE_SHIFT; > - struct page *page; > - u64 *tbl; > + if (!stt) > + return ret; > > - /* udbg_printf("H_PUT_TCE: liobn 0x%lx => stt=%p > window_size=0x%x\n", */ > - /* liobn, stt, stt->window_size); */ > - if (ioba >= stt->window_size) > - return H_PARAMETER; > + ret = kvmppc_ioba_validate(stt, ioba, 1); > + if (ret) > + return ret; > > - page = stt->pages[idx / TCES_PER_PAGE]; > - tbl = (u64 *)page_address(page); > + idx = ioba >> SPAPR_TCE_SHIFT; > + page = stt->pages[idx / TCES_PER_PAGE]; > + tbl = (u64 *)page_address(page); > > - /* FIXME: Need to validate the TCE itself */ > - /* udbg_printf("tce @ %p\n", [idx % > TCES_PER_PAGE]); */ > - tbl[idx % TCES_PER_PAGE] = tce; > - return H_SUCCESS; > - } > - } > + /* FIXME: Need to validate the TCE itself */ > + /* udbg_printf("tce @ %p\n", [idx % TCES_PER_PAGE]); */ > + tbl[idx % TCES_PER_PAGE] = tce; > > - /* Didn't find the liobn, punt it to userspace */ > - return H_TOO_HARD; > + return ret; So, this relies on the fact that kvmppc_ioba_validate() would have returned H_SUCCESS some distance above. This seems rather fragile if you insert anything else which alters ret in between. Since this is the success path, I think it's clearer to explicitly return H_SUCCESS. > } > EXPORT_SYMBOL_GPL(kvmppc_h_put_tce); > > long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn, > unsigned long ioba) > { > - struct kvm *kvm = vcpu->kvm; > - struct
Re: [PATCH kernel 1/9] rcu: Define notrace version of list_for_each_entry_rcu
On Tue, Sep 15, 2015 at 08:49:31PM +1000, Alexey Kardashevskiy wrote: > This defines list_for_each_entry_rcu_notrace and list_entry_rcu_notrace > which use rcu_dereference_raw_notrace instead of rcu_dereference_raw. > This allows using list_for_each_entry_rcu_notrace in real mode (MMU is off). > > Signed-off-by: Alexey KardashevskiyReviewed-by: David Gibson > --- > include/linux/rculist.h | 38 ++ > 1 file changed, 38 insertions(+) > > diff --git a/include/linux/rculist.h b/include/linux/rculist.h > index 17c6b1f..439c4d7 100644 > --- a/include/linux/rculist.h > +++ b/include/linux/rculist.h > @@ -253,6 +253,25 @@ static inline void list_splice_init_rcu(struct list_head > *list, > }) > > /** > + * list_entry_rcu_notrace - get the struct for this entry > + * @ptr:the list_head pointer. > + * @type: the type of the struct this is embedded in. > + * @member: the name of the list_struct within the struct. > + * > + * This primitive may safely run concurrently with the _rcu list-mutation > + * primitives such as list_add_rcu() as long as it's guarded by > rcu_read_lock(). > + * > + * This is the same as list_entry_rcu() except that it does > + * not do any RCU debugging or tracing. > + */ > +#define list_entry_rcu_notrace(ptr, type, member) \ > +({ \ > + typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \ > + container_of((typeof(ptr))rcu_dereference_raw_notrace(__ptr), \ > + type, member); \ > +}) > + > +/** > * Where are list_empty_rcu() and list_first_entry_rcu()? > * > * Implementing those functions following their counterparts list_empty() and > @@ -308,6 +327,25 @@ static inline void list_splice_init_rcu(struct list_head > *list, > pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) > > /** > + * list_for_each_entry_rcu_notrace - iterate over rcu list of given type > + * @pos: the type * to use as a loop cursor. > + * @head:the head for your list. > + * @member: the name of the list_struct within the struct. > + * > + * This list-traversal primitive may safely run concurrently with > + * the _rcu list-mutation primitives such as list_add_rcu() > + * as long as the traversal is guarded by rcu_read_lock(). > + * > + * This is the same as list_for_each_entry_rcu() except that it does > + * not do any RCU debugging or tracing. > + */ > +#define list_for_each_entry_rcu_notrace(pos, head, member) \ > + for (pos = list_entry_rcu_notrace((head)->next, typeof(*pos), member); \ > + >member != (head); \ > + pos = list_entry_rcu_notrace(pos->member.next, typeof(*pos), \ > + member)) > + > +/** > * list_for_each_entry_continue_rcu - continue iteration over list of given > type > * @pos: the type * to use as a loop cursor. > * @head:the head for your list. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [PATCH kernel 4/9] KVM: PPC: Use RCU for arch.spapr_tce_tables
On Tue, Sep 15, 2015 at 08:49:34PM +1000, Alexey Kardashevskiy wrote: > At the moment spapr_tce_tables is not protected against races. This makes > use of RCU-variants of list helpers. As some bits are executed in real > mode, this makes use of just introduced list_for_each_entry_rcu_notrace(). > > This converts release_spapr_tce_table() to a RCU scheduled handler. > > Signed-off-by: Alexey KardashevskiyLooks correct on my limited knowledge of RCU Reviewed-by: David Gibson -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [PATCH kernel 2/9] KVM: PPC: Make real_vmalloc_addr() public
On Tue, Sep 15, 2015 at 08:49:32PM +1000, Alexey Kardashevskiy wrote: > This helper translates vmalloc'd addresses to linear addresses. > It is only used by the KVM MMU code now and resides in the HV KVM code. > We will need it further in the TCE code and the DMA memory preregistration > code called in real mode. > > This makes real_vmalloc_addr() public and moves it to the powerpc code as > it does not do anything special for KVM. > > Signed-off-by: Alexey KardashevskiyHmm, I have a couple of small concerns. First, I'm not sure if the name is clear enough for a public function. Second, I'm not sure if mmu-hash64.h is the right place for it. This is still a function with very specific and limited usage, I wonder if we should have somewhere else for such special real mode helpers. Paulus, thoughts? -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature