Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case
On 18/07/2013 21:57, Gleb Natapov wrote:
> On Thu, Jul 18, 2013 at 02:08:51PM +0200, Paolo Bonzini wrote:
>> On 18/07/2013 13:06, Gleb Natapov wrote:
>>> On Thu, Jul 18, 2013 at 12:47:46PM +0200, Paolo Bonzini wrote:
>> and for a testsuite I'd prefer the latter---which means I'd still favor
>> setjmp/longjmp.
>>
>> Now, here is the long explanation.
>>
>> I must admit that the code looks nice. There are some nits I'd like to
>> see done differently (such as putting vmx_return at the beginning of the
>> while (1), and the vmresume asm at the end), but it is indeed nice.
>
> Why do you prefer setjmp/longjmp then?

Because it is still deceiving, and I dislike the deceit more than I like
the linear code flow.

>>> What is deceiving about it? Of course for someone who has no idea how
>>> vmx works the code will not be obvious, but why should we care. For
>>> someone who knows what is deceiving about returning into the same
>>> function guest was launched from by using VMX mechanism
>>
>> The way the code is written is deceiving. If I see
>>
>>   asm("vmlaunch; seta %0")
>>   while (ret)
>>
>> I expect HOST_RIP to point at the seta or somewhere near, not at a
>> completely different label somewhere else.
>>
> Why would you expect that assuming you know what vmlaunch is?

Because this is written in C, and I know trying to fool the compiler is a
losing game. So my reaction is "okay, HOST_RIP must be set so that code
will not jump around". If I see

  asm("vmlaunch")
  exit(-1)

the reaction is the opposite: "hmm, anything that jumps around would have
a hard time with the compiler, there must be some assembly trampoline
somewhere; let's check what HOST_RIP is".

instead of longjmp()?

>>
>> Look again at the setjmp/longjmp version. longjmp is not used to handle
>> vmexit. It is used to jump back from the vmexit handler to main, which
>> is exactly what setjmp/longjmp is meant for.
>>
> That's because simple return will not work in that version, this is
> artifact of how vmexit was done.

I think it can be made to work without setjmp/longjmp, but the code would
be ugly.

the compiler, and you rely on the compiler not changing %rsp between the
vmlaunch and the vmx_return label. Minor nit, you cannot anymore print

>>> HOST_RSP should be loaded on each guest entry.
>>
>> Right, but my point is: without a big asm blob, you don't know the right
>> value to load. It depends on the generated code. And the big asm blob
>> limits a lot the "code looks nice" value of this approach.
>>
> I said it number of times already, this is not about "code looking nice",
> "code looks like KVM" or "use less assembler as possible", this is about
> linear data flow. It is not fun chasing code execution path. Yes, you
> can argue that vmlaunch/vmresume are inherently non linear, but there is a
> difference between skipping one instruction and remaining in the same
> function and on the same stack, or jumping completely to a different
> context.

I don't see anything bad in jumping completely to a different context.
The guest and host are sort of like two coroutines, they hardly have any
connection with the code that called vmlaunch.

> The actual differences in asm instructions between both
> versions will not be bigger than a couple of lines (if we will not take
> setjmp/longjmp implementation into account :)),

I was waiting for this parenthetical remark to appear. ;)

> but I do not even see
> why we discuss this argument since minimizing asm instructions here is
> not the point. We should not use more than needed to achieve the goal of
> course, but design should not be around number of assembly lines.

I agree, I only mentioned it because you talked about asm C asm C and
this is not what the setjmp/longjmp code looks like---using inlines for
asm as if they were builtin functions is good, interspersing asm and C in
the same function is bad.
Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/10 v6] KVM: PPC: IOMMU in-kernel handling
On 07/16/2013 10:53 AM, Alexey Kardashevskiy wrote:
> The changes are:
> 1. rebased on v3.11-rc1 so the capability numbers changed again
> 2. fixed multiple comments from maintainers
> 3. "KVM: PPC: Add support for IOMMU in-kernel handling" is split into
>    2 patches, the new one is "powerpc/iommu: rework to support realmode".
> 4. IOMMU_API is now always enabled for KVM_BOOK3S_64.
>
> More details in the individual patch comments.
>
> Depends on "hashtable: add hash_for_each_possible_rcu_notrace()",
> posted a while ago.
>
> Alexey Kardashevskiy (10):
>   KVM: PPC: reserve a capability number for multitce support
>   KVM: PPC: reserve a capability and ioctl numbers for realmode VFIO

Alex, could you please pull these 2 patches or tell what is wrong with
them? Having them sooner in the kernel would let me ask for a headers
update for QEMU, and then I would try pushing multiple TCE and VFIO
support in QEMU.

Thanks.

--
Alexey
[PATCH] perf, kvm: Support the in_tx/in_tx_cp modifiers in KVM arch perfmon emulation v5
From: Andi Kleen

[KVM maintainers: The underlying support for this is in perf/core now.
So please merge this patch into the KVM tree.]

This is not arch perfmon, but older CPUs will just ignore it. This makes
it possible to do at least some TSX measurements from a KVM guest.

v2: Various fixes to address review feedback
v3: Ignore the bits when no CPUID. No #GP. Force raw events with TSX bits.
v4: Use reserved bits for #GP
v5: Remove obsolete argument

Acked-by: Gleb Natapov
Signed-off-by: Andi Kleen
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/pmu.c              | 25 ++++++++++++++++++++-----
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index af9c552..01493a1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -315,6 +315,7 @@ struct kvm_pmu {
 	u64 global_ovf_ctrl;
 	u64 counter_bitmask[2];
 	u64 global_ctrl_mask;
+	u64 reserved_bits;
 	u8 version;
 	struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
 	struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index c53e797..5c4f631 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -160,7 +160,7 @@ static void stop_counter(struct kvm_pmc *pmc)
 
 static void reprogram_counter(struct kvm_pmc *pmc, u32 type,
 		unsigned config, bool exclude_user, bool exclude_kernel,
-		bool intr)
+		bool intr, bool in_tx, bool in_tx_cp)
 {
 	struct perf_event *event;
 	struct perf_event_attr attr = {
@@ -173,6 +173,10 @@ static void reprogram_counter(struct kvm_pmc *pmc, u32 type,
 		.exclude_kernel = exclude_kernel,
 		.config = config,
 	};
+	if (in_tx)
+		attr.config |= HSW_IN_TX;
+	if (in_tx_cp)
+		attr.config |= HSW_IN_TX_CHECKPOINTED;
 
 	attr.sample_period = (-pmc->counter) & pmc_bitmask(pmc);
 
@@ -226,7 +230,9 @@ static void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 
 	if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
 			  ARCH_PERFMON_EVENTSEL_INV |
-			  ARCH_PERFMON_EVENTSEL_CMASK))) {
+			  ARCH_PERFMON_EVENTSEL_CMASK |
+			  HSW_IN_TX |
+			  HSW_IN_TX_CHECKPOINTED))) {
 		config = find_arch_event(&pmc->vcpu->arch.pmu, event_select,
 				unit_mask);
 		if (config != PERF_COUNT_HW_MAX)
@@ -239,7 +245,9 @@ static void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 	reprogram_counter(pmc, type, config,
 			!(eventsel & ARCH_PERFMON_EVENTSEL_USR),
 			!(eventsel & ARCH_PERFMON_EVENTSEL_OS),
-			eventsel & ARCH_PERFMON_EVENTSEL_INT);
+			eventsel & ARCH_PERFMON_EVENTSEL_INT,
+			(eventsel & HSW_IN_TX),
+			(eventsel & HSW_IN_TX_CHECKPOINTED));
 }
 
 static void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 en_pmi, int idx)
@@ -256,7 +264,7 @@ static void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 en_pmi, int idx)
 			arch_events[fixed_pmc_events[idx]].event_type,
 			!(en & 0x2), /* exclude user */
 			!(en & 0x1), /* exclude kernel */
-			pmi);
+			pmi, false, false);
 }
 
 static inline u8 fixed_en_pmi(u64 ctrl, int idx)
@@ -408,7 +416,7 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	} else if ((pmc = get_gp_pmc(pmu, index, MSR_P6_EVNTSEL0))) {
 		if (data == pmc->eventsel)
 			return 0;
-		if (!(data & 0xffffffff00200000ull)) {
+		if (!(data & pmu->reserved_bits)) {
 			reprogram_gp_counter(pmc, data);
 			return 0;
 		}
@@ -450,6 +458,7 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu)
 	pmu->counter_bitmask[KVM_PMC_GP] = 0;
 	pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
 	pmu->version = 0;
+	pmu->reserved_bits = 0xffffffff00200000ull;
 
 	entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
 	if (!entry)
@@ -478,6 +487,12 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu)
 	pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
 		(((1ull << pmu->nr_arch_fixed_counters) - 1) << INTEL_PMC_IDX_FIXED);
 	pmu->global_ctrl_mask = ~pmu->global_ctrl;
+
+	entry = kvm_find_cpuid_entry(vcpu, 7, 0);
+	if (entry &&
+	    (boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM)) &&
+	    (entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM)))
+		pmu->reserved_bits ^= HSW_IN_TX|HSW_IN_TX_CHECKPOINTED;
Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case
On Thu, Jul 18, 2013 at 02:08:51PM +0200, Paolo Bonzini wrote:
> On 18/07/2013 13:06, Gleb Natapov wrote:
> > On Thu, Jul 18, 2013 at 12:47:46PM +0200, Paolo Bonzini wrote:
> and for a testsuite I'd prefer the latter---which means I'd still favor
> setjmp/longjmp.
>
> Now, here is the long explanation.
>
> I must admit that the code looks nice. There are some nits I'd like to
> see done differently (such as putting vmx_return at the beginning of the
> while (1), and the vmresume asm at the end), but it is indeed nice.
> >>>
> >>> Why do you prefer setjmp/longjmp then?
> >>
> >> Because it is still deceiving, and I dislike the deceit more than I like
> >> the linear code flow.
> >>
> > What is deceiving about it? Of course for someone who has no idea how
> > vmx works the code will not be obvious, but why should we care. For
> > someone who knows what is deceiving about returning into the same
> > function guest was launched from by using VMX mechanism
>
> The way the code is written is deceiving. If I see
>
>   asm("vmlaunch; seta %0")
>   while (ret)
>
> I expect HOST_RIP to point at the seta or somewhere near, not at a
> completely different label somewhere else.
>
Why would you expect that assuming you know what vmlaunch is? So it is
OK when HOST_RIP points to somewhere outside a function, but "deceiving"
if it points 3 lines down in the same function? I have a hard time
following this logic.

> >> instead of longjmp()?
>
> Look again at the setjmp/longjmp version. longjmp is not used to handle
> vmexit. It is used to jump back from the vmexit handler to main, which
> is exactly what setjmp/longjmp is meant for.
>
That's because simple return will not work in that version, this is an
artifact of how vmexit was done.

> >> It is still somewhat magic: the "while (ret)" is only there to please
> >> the compiler
> >
> > No, it is there to catch vmlaunch/vmresume errors.
>
> You could change it to "while (0)" and it would still work.
> That's what I mean by "only there to please the compiler".

But while (1) will not, so the code is executed; it is not there just for
the compiler, but it is executed only if vmlaunch/vmresume fails.

> >> the compiler, and you rely on the compiler not changing %rsp between the
> >> vmlaunch and the vmx_return label. Minor nit, you cannot anymore print
> > HOST_RSP should be loaded on each guest entry.
>
> Right, but my point is: without a big asm blob, you don't know the right
> value to load. It depends on the generated code. And the big asm blob
> limits a lot the "code looks nice" value of this approach.
>
I said it a number of times already, this is not about "code looking nice",
"code looks like KVM" or "use less assembler as possible", this is about
linear data flow. It is not fun chasing code execution path. Yes, you
can argue that vmlaunch/vmresume are inherently non linear, but there is a
difference between skipping one instruction and remaining in the same
function and on the same stack, or jumping completely to a different
context.

Speaking about KVM: the guest enter/exit assembly blob is around ~50 lines
of assembly code and more than half of that is saving/restoring context.
And the code plays some tricks there for optimization that we do not need
to do here, so I expect the code to be even smaller, not much more than
10 lines of assembly excluding state save/restore.

> >> different error messages for vmlaunch vs. vmresume failure.
> >
> > Just because the same variable is used to store error values :)
> > Make vmlaunch use err1 and vmresume err2 and do "while (!err1 & !err2)"
>
> Yeah. :)
>
> >> In the end the choice is between "all in asm" and "small asm trampoline"
> >> (which also happens to use setjmp/longjmp, but perhaps Arthur can
> >> propagate return codes without using setjmp/longjmp, too).
> >>
> >>> Rewriting the whole guest entry exit path in asm like you suggest below
> >>> will eliminate the magic too.
> >>
> >>> I much prefer one big asm statement than many small asm statements
> >>> scattered among two or three C lines.
> >>
> >> It's not many asm statements, it's just a dozen lines:
> >>
> > I am not talking about the overall file, but how vmx_run() is written:
> >
> >   asm()
> >   C code
> >   asm()
> >   C code
> >   ...
> >
> > I much prefer:
> >
> >   C code
> >
> >   big asm() blob
> >
> >   C code.
>
> Me too. But the trampoline version is neither, it is
>
>   static inline void vmlaunch() { asm("vmlaunch") }
>   static inline void vmresume() { asm("vmresume") }
>   small asm() blob
>
It is small only because context save/restore is hidden behind a macro and
4 asm instructions (vmlaunch/seta/vmresume/seta) are hidden somewhere
else. The actual differences in asm instructions between both versions
will not be bigger than a couple of lines (if we will not take
setjmp/longjmp implementation into account :)), but I do not even see why
we discuss this argument since minimizing asm instructions here is not the
point. We should not use more than needed to achieve the goal of course,
but design should not be around number of assembly lines.
Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 12:32:18 PM, Alexander Graf wrote:
> On 18.07.2013, at 19:17, Scott Wood wrote:
> > On 07/18/2013 08:19:03 AM, Bharat Bhushan wrote:
> > Likewise, we want to make sure this matches the host entry.
> > Unfortunately, this is a bit of a mess already. 64-bit booke appears
> > to always set MAS2_M for TLB0 mappings. The initial KERNELBASE
> > mapping on boot uses M_IF_SMP, and the settlbcam() that (IIRC)
> > replaces it uses _PAGE_COHERENT. 32-bit always uses _PAGE_COHERENT,
> > except that initial KERNELBASE mapping. _PAGE_COHERENT appears to be
> > set based on CONFIG_SMP || CONFIG_PPC_STD_MMU (the latter config
> > clears _PAGE_COHERENT in the non-CPU_FTR_NEED_COHERENT case).
> >
> > As for what we actually want to happen, there are cases when we want
> > M to be set for non-SMP. One such case is AMP, where CPUs may be
> > sharing memory even if the Linux instance only runs on one CPU (this
> > is not hypothetical, BTW). It's also possible that we encounter a
> > hardware bug that requires MAS2_M, similar to what some of our
> > non-booke chips require.
>
> How about we always set M then for RAM?

M is like I in that bad things happen if you mix them. So we really want
to match exactly what the rest of the kernel is doing. Plus, the
performance penalty on some single-core chips can be pretty bad.

-Scott
Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages
On 18.07.2013, at 19:17, Scott Wood wrote:

> On 07/18/2013 08:19:03 AM, Bharat Bhushan wrote:
>> If there is a struct page for the requested mapping then it's
>> normal RAM and the mapping is set to "M" bit (coherent, cacheable)
>> otherwise this is treated as I/O and we set "I + G" (cache inhibited,
>> guarded)
>>
>> This helps setting proper TLB mapping for direct assigned device
>>
>> Signed-off-by: Bharat Bhushan
>> ---
>> v2: some cleanup and added comment
>>  arch/powerpc/kvm/e500_mmu_host.c | 23 ++-
>>  1 files changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>> b/arch/powerpc/kvm/e500_mmu_host.c
>> index 1c6a9d7..02eb973 100644
>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>> @@ -64,13 +64,26 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
>>  	return mas3;
>>  }
>>
>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>  {
>> +	u32 mas2_attr;
>> +
>> +	mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>> +
>> +	/*
>> +	 * RAM is always mappable on e500 systems, so this is identical
>> +	 * to kvm_is_mmio_pfn(), just without its overhead.
>> +	 */
>> +	if (!pfn_valid(pfn)) {
>
> Please use page_is_ram(), which is what gets used when setting the WIMG
> for the host userspace mapping. We want to make sure the two are
> consistent.
>
>> +		/* Pages not managed by Kernel are treated as I/O, set I + G */
>> +		mas2_attr |= MAS2_I | MAS2_G;
>>  #ifdef CONFIG_SMP
>> -	return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
>> -#else
>> -	return mas2 & MAS2_ATTRIB_MASK;
>> +	} else {
>> +		/* Kernel managed pages are actually RAM so set "M" */
>> +		mas2_attr |= MAS2_M;
>>  #endif
>
> Likewise, we want to make sure this matches the host entry.
> Unfortunately, this is a bit of a mess already. 64-bit booke appears to
> always set MAS2_M for TLB0 mappings. The initial KERNELBASE mapping on
> boot uses M_IF_SMP, and the settlbcam() that (IIRC) replaces it uses
> _PAGE_COHERENT. 32-bit always uses _PAGE_COHERENT, except that initial
> KERNELBASE mapping. _PAGE_COHERENT appears to be set based on
> CONFIG_SMP || CONFIG_PPC_STD_MMU (the latter config clears
> _PAGE_COHERENT in the non-CPU_FTR_NEED_COHERENT case).
>
> As for what we actually want to happen, there are cases when we want M
> to be set for non-SMP. One such case is AMP, where CPUs may be sharing
> memory even if the Linux instance only runs on one CPU (this is not
> hypothetical, BTW). It's also possible that we encounter a hardware bug
> that requires MAS2_M, similar to what some of our non-booke chips
> require.

How about we always set M then for RAM?

Alex
Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 08:19:03 AM, Bharat Bhushan wrote:
> If there is a struct page for the requested mapping then it's
> normal RAM and the mapping is set to "M" bit (coherent, cacheable)
> otherwise this is treated as I/O and we set "I + G" (cache inhibited,
> guarded)
>
> This helps setting proper TLB mapping for direct assigned device
>
> Signed-off-by: Bharat Bhushan
> ---
> v2: some cleanup and added comment
>  arch/powerpc/kvm/e500_mmu_host.c | 23 ++-
>  1 files changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
> b/arch/powerpc/kvm/e500_mmu_host.c
> index 1c6a9d7..02eb973 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -64,13 +64,26 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
> 	return mas3;
> }
>
> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
> {
> +	u32 mas2_attr;
> +
> +	mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> +
> +	/*
> +	 * RAM is always mappable on e500 systems, so this is identical
> +	 * to kvm_is_mmio_pfn(), just without its overhead.
> +	 */
> +	if (!pfn_valid(pfn)) {

Please use page_is_ram(), which is what gets used when setting the WIMG
for the host userspace mapping. We want to make sure the two are
consistent.

> +		/* Pages not managed by Kernel are treated as I/O, set I + G */
> +		mas2_attr |= MAS2_I | MAS2_G;
> #ifdef CONFIG_SMP
> -	return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
> -#else
> -	return mas2 & MAS2_ATTRIB_MASK;
> +	} else {
> +		/* Kernel managed pages are actually RAM so set "M" */
> +		mas2_attr |= MAS2_M;
> #endif

Likewise, we want to make sure this matches the host entry.
Unfortunately, this is a bit of a mess already. 64-bit booke appears to
always set MAS2_M for TLB0 mappings. The initial KERNELBASE mapping on
boot uses M_IF_SMP, and the settlbcam() that (IIRC) replaces it uses
_PAGE_COHERENT. 32-bit always uses _PAGE_COHERENT, except that initial
KERNELBASE mapping. _PAGE_COHERENT appears to be set based on
CONFIG_SMP || CONFIG_PPC_STD_MMU (the latter config clears
_PAGE_COHERENT in the non-CPU_FTR_NEED_COHERENT case).

As for what we actually want to happen, there are cases when we want M to
be set for non-SMP. One such case is AMP, where CPUs may be sharing
memory even if the Linux instance only runs on one CPU (this is not
hypothetical, BTW). It's also possible that we encounter a hardware bug
that requires MAS2_M, similar to what some of our non-booke chips
require.

-Scott
Re: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2
On 07/18/2013 10:12:30 AM, Bhushan Bharat-R65777 wrote:
> -----Original Message-----
> From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org]
> On Behalf Of Alexander Graf
> Sent: Thursday, July 18, 2013 8:18 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421;
> Bhushan Bharat-R65777
> Subject: Re: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2
>
> This needs a description. Why shouldn't we ignore E?

What I understood is that "there is no reason to stop guest setting "E",
so allow him."

FWIW, it'd probably be safe to let the guest control the G bit as well.

-Scott
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 05:00:42 AM, Alexander Graf wrote:
Now why is setting invalid flags a problem? If I understand Scott
correctly, it can break the host if you access certain host devices with
caching enabled. But to be sure I'd say we ask him directly :).

The architecture makes it illegal to mix cacheable and cache-inhibited
mappings to the same physical page. Mixing W or M bits is generally bad
as well. I've seen it cause machine checks, error interrupts, etc. --
not just corrupting the page in question.

-Scott
RE: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2
> -----Original Message-----
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, July 18, 2013 8:50 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421
> Subject: Re: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2
>
> On 18.07.2013, at 17:12, Bhushan Bharat-R65777 wrote:
>
> >> -----Original Message-----
> >> From: kvm-ppc-ow...@vger.kernel.org
> >> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Alexander Graf
> >> Sent: Thursday, July 18, 2013 8:18 PM
> >> To: Bhushan Bharat-R65777
> >> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421;
> >> Bhushan Bharat-R65777
> >> Subject: Re: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute
> >> in mas2
> >>
> >> This needs a description. Why shouldn't we ignore E?
> >
> > What I understood is that "there is no reason to stop guest setting "E", so
> > allow him."
>
> Please add that to the patch description. Also explain what the bit means.

Ok :)

-Bharat

> Alex
Re: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2
On 18.07.2013, at 17:12, Bhushan Bharat-R65777 wrote:

>> -----Original Message-----
>> From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org]
>> On Behalf Of Alexander Graf
>> Sent: Thursday, July 18, 2013 8:18 PM
>> To: Bhushan Bharat-R65777
>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421;
>> Bhushan Bharat-R65777
>> Subject: Re: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2
>>
>> This needs a description. Why shouldn't we ignore E?
>
> What I understood is that "there is no reason to stop guest setting "E", so
> allow him."

Please add that to the patch description. Also explain what the bit means.

Alex
Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages
On 18.07.2013, at 17:15, Bhushan Bharat-R65777 wrote:

>> -----Original Message-----
>> From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org]
>> On Behalf Of Alexander Graf
>> Sent: Thursday, July 18, 2013 8:23 PM
>> To: Bhushan Bharat-R65777
>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421;
>> Bhushan Bharat-R65777
>> Subject: Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for
>> kernel managed pages
>>
>> On 18.07.2013, at 15:19, Bharat Bhushan wrote:
>>
>>> If there is a struct page for the requested mapping then it's normal
>>> RAM and the mapping is set to "M" bit (coherent, cacheable) otherwise
>>> this is treated as I/O and we set "I + G" (cache inhibited, guarded)
>>>
>>> This helps setting proper TLB mapping for direct assigned device
>>>
>>> Signed-off-by: Bharat Bhushan
>>> ---
>>> v2: some cleanup and added comment
>>>  arch/powerpc/kvm/e500_mmu_host.c | 23 ++-
>>>  1 files changed, 18 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>> index 1c6a9d7..02eb973 100644
>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>> @@ -64,13 +64,26 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
>>> 	return mas3;
>>> }
>>>
>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>> {
>>> +	u32 mas2_attr;
>>> +
>>> +	mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>> +
>>> +	/*
>>> +	 * RAM is always mappable on e500 systems, so this is identical
>>> +	 * to kvm_is_mmio_pfn(), just without its overhead.
>>> +	 */
>>> +	if (!pfn_valid(pfn)) {
>>> +		/* Pages not managed by Kernel are treated as I/O, set I + G */
>>
>> Please also document the intermediate thought that I/O should be mapped
>> non-cached.
>
> I did not get what you mean to document?

Pages not managed by the kernel are treated as I/O. Map it with disabled
cache.

>>> +		mas2_attr |= MAS2_I | MAS2_G;
>>> #ifdef CONFIG_SMP
>>
>> Please separate the SMP case out of the branch.
>
> Really :) this was looking simple to me.

Two branches intertwined never look simple :).

>>> -	return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
>>> -#else
>>> -	return mas2 & MAS2_ATTRIB_MASK;
>>> +	} else {
>>> +		/* Kernel managed pages are actually RAM so set "M" */
>>
>> This comment doesn't tell me why M can be set ;).
>
> RAM in SMP, so setting coherent, is not that obvious?

No.

Alex
RE: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages
> -----Original Message-----
> From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org]
> On Behalf Of Alexander Graf
> Sent: Thursday, July 18, 2013 8:23 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421;
> Bhushan Bharat-R65777
> Subject: Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for
> kernel managed pages
>
> On 18.07.2013, at 15:19, Bharat Bhushan wrote:
>
> > If there is a struct page for the requested mapping then it's normal
> > RAM and the mapping is set to "M" bit (coherent, cacheable) otherwise
> > this is treated as I/O and we set "I + G" (cache inhibited, guarded)
> >
> > This helps setting proper TLB mapping for direct assigned device
> >
> > Signed-off-by: Bharat Bhushan
> > ---
> > v2: some cleanup and added comment
> >  arch/powerpc/kvm/e500_mmu_host.c | 23 ++-
> >  1 files changed, 18 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/e500_mmu_host.c
> > b/arch/powerpc/kvm/e500_mmu_host.c
> > index 1c6a9d7..02eb973 100644
> > --- a/arch/powerpc/kvm/e500_mmu_host.c
> > +++ b/arch/powerpc/kvm/e500_mmu_host.c
> > @@ -64,13 +64,26 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
> > 	return mas3;
> > }
> >
> > -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
> > +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
> > {
> > +	u32 mas2_attr;
> > +
> > +	mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> > +
> > +	/*
> > +	 * RAM is always mappable on e500 systems, so this is identical
> > +	 * to kvm_is_mmio_pfn(), just without its overhead.
> > +	 */
> > +	if (!pfn_valid(pfn)) {
> > +		/* Pages not managed by Kernel are treated as I/O, set I + G */
>
> Please also document the intermediate thought that I/O should be mapped
> non-cached.

I did not get what you mean to document?

> > +		mas2_attr |= MAS2_I | MAS2_G;
> > #ifdef CONFIG_SMP
>
> Please separate the SMP case out of the branch.

Really :) this was looking simple to me.

> > -	return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
> > -#else
> > -	return mas2 & MAS2_ATTRIB_MASK;
> > +	} else {
> > +		/* Kernel managed pages are actually RAM so set "M" */
>
> This comment doesn't tell me why M can be set ;).

RAM in SMP, so setting coherent, is not that obvious?

-Bharat

> Alex
>
> > +		mas2_attr |= MAS2_M;
> > #endif
> > +	}
> > +	return mas2_attr;
> > }
> >
> > /*
> > @@ -313,7 +326,7 @@ static void kvmppc_e500_setup_stlbe(
> > 	/* Force IPROT=0 for all guest mappings. */
> > 	stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
> > 	stlbe->mas2 = (gvaddr & MAS2_EPN) |
> > -		e500_shadow_mas2_attrib(gtlbe->mas2, pr);
> > +		e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
> > 	stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
> > 		e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
> >
> > --
> > 1.7.0.4
RE: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2
> -----Original Message-----
> From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org]
> On Behalf Of Alexander Graf
> Sent: Thursday, July 18, 2013 8:18 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421;
> Bhushan Bharat-R65777
> Subject: Re: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2
>
> This needs a description. Why shouldn't we ignore E?

What I understood is that "there is no reason to stop guest setting "E", so
allow him."

-Bharat

> Alex
>
> On 18.07.2013, at 15:19, Bharat Bhushan wrote:
>
> > Signed-off-by: Bharat Bhushan
> > ---
> > v2:
> >  - No change
> >  arch/powerpc/kvm/e500.h | 2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
> > index c2e5e98..277cb18 100644
> > --- a/arch/powerpc/kvm/e500.h
> > +++ b/arch/powerpc/kvm/e500.h
> > @@ -117,7 +117,7 @@ static inline struct kvmppc_vcpu_e500 *to_e500(struct kvm_vcpu *vcpu)
> > #define E500_TLB_USER_PERM_MASK (MAS3_UX|MAS3_UR|MAS3_UW)
> > #define E500_TLB_SUPER_PERM_MASK (MAS3_SX|MAS3_SR|MAS3_SW)
> > #define MAS2_ATTRIB_MASK \
> > -	(MAS2_X0 | MAS2_X1)
> > +	(MAS2_X0 | MAS2_X1 | MAS2_E)
> > #define MAS3_ATTRIB_MASK \
> > 	(MAS3_U0 | MAS3_U1 | MAS3_U2 | MAS3_U3 \
> > 	 | E500_TLB_USER_PERM_MASK | E500_TLB_SUPER_PERM_MASK)
> > --
> > 1.7.0.4
Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages
On 18.07.2013, at 15:19, Bharat Bhushan wrote: > If there is a struct page for the requested mapping then it's > normal RAM and the mapping is set to "M" bit (coherent, cacheable) > otherwise this is treated as I/O and we set "I + G" (cache inhibited, > guarded) > > This helps setting proper TLB mapping for direct assigned device > > Signed-off-by: Bharat Bhushan > --- > v2: some cleanup and added comment > - > arch/powerpc/kvm/e500_mmu_host.c | 23 ++- > 1 files changed, 18 insertions(+), 5 deletions(-) > > diff --git a/arch/powerpc/kvm/e500_mmu_host.c > b/arch/powerpc/kvm/e500_mmu_host.c > index 1c6a9d7..02eb973 100644 > --- a/arch/powerpc/kvm/e500_mmu_host.c > +++ b/arch/powerpc/kvm/e500_mmu_host.c > @@ -64,13 +64,26 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int > usermode) > return mas3; > } > > -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) > +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) > { > + u32 mas2_attr; > + > + mas2_attr = mas2 & MAS2_ATTRIB_MASK; > + > + /* > + * RAM is always mappable on e500 systems, so this is identical > + * to kvm_is_mmio_pfn(), just without its overhead. > + */ > + if (!pfn_valid(pfn)) { > + /* Pages not managed by Kernel are treated as I/O, set I + G */ Please also document the intermediate thought that I/O should be mapped non-cached. > + mas2_attr |= MAS2_I | MAS2_G; > #ifdef CONFIG_SMP Please separate the SMP case out of the branch. > - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M; > -#else > - return mas2 & MAS2_ATTRIB_MASK; > + } else { > + /* Kernel managed pages are actually RAM so set "M" */ This comment doesn't tell me why M can be set ;). Alex > + mas2_attr |= MAS2_M; > #endif > + } > + return mas2_attr; > } > > /* > @@ -313,7 +326,7 @@ static void kvmppc_e500_setup_stlbe( > /* Force IPROT=0 for all guest mappings. 
*/ > stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID; > stlbe->mas2 = (gvaddr & MAS2_EPN) | > - e500_shadow_mas2_attrib(gtlbe->mas2, pr); > + e500_shadow_mas2_attrib(gtlbe->mas2, pfn); > stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) | > e500_shadow_mas3_attrib(gtlbe->mas7_3, pr); > > -- > 1.7.0.4 > > > -- > To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
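For what it's worth, the review comments above (document why I/O is mapped non-cached, pull the CONFIG_SMP case out of the if/else, say why "M" is set) could be folded in roughly as follows. This is a standalone sketch, not the kernel patch: the MAS2 bit values are stand-ins and pfn_valid() is stubbed so the function can be compiled and exercised on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values for the kernel's MAS2 bits; illustrative only. */
#define MAS2_G 0x02u
#define MAS2_M 0x04u
#define MAS2_I 0x08u
#define MAS2_ATTRIB_MASK 0x61u   /* MAS2_X0 | MAS2_X1 | MAS2_E */

#define CONFIG_SMP 1             /* pretend an SMP kernel config */

/* Stub: treat pfns below an arbitrary limit as kernel-managed RAM. */
static bool pfn_valid(uint64_t pfn)
{
    return pfn < 0x100000;
}

static uint32_t e500_shadow_mas2_attrib(uint32_t mas2, uint64_t pfn)
{
    uint32_t mas2_attr = mas2 & MAS2_ATTRIB_MASK;

    if (!pfn_valid(pfn)) {
        /* No struct page: treat as I/O, which must be mapped
         * non-cached, i.e. cache-inhibited ("I") + guarded ("G"). */
        return mas2_attr | MAS2_I | MAS2_G;
    }

    /* Kernel-managed page, i.e. RAM: on SMP the shadow mapping must
     * be coherent ("M") so all cores snoop each other's caches. */
#ifdef CONFIG_SMP
    mas2_attr |= MAS2_M;
#endif
    return mas2_attr;
}
```

The early return keeps the #ifdef out of the if/else, which is what the "please separate the SMP case out of the branch" comment asks for.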
Re: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2
This needs a description. Why shouldn't we ignore E? Alex On 18.07.2013, at 15:19, Bharat Bhushan wrote: > Signed-off-by: Bharat Bhushan > --- > v2: > - No change > arch/powerpc/kvm/e500.h |2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h > index c2e5e98..277cb18 100644 > --- a/arch/powerpc/kvm/e500.h > +++ b/arch/powerpc/kvm/e500.h > @@ -117,7 +117,7 @@ static inline struct kvmppc_vcpu_e500 *to_e500(struct > kvm_vcpu *vcpu) > #define E500_TLB_USER_PERM_MASK (MAS3_UX|MAS3_UR|MAS3_UW) > #define E500_TLB_SUPER_PERM_MASK (MAS3_SX|MAS3_SR|MAS3_SW) > #define MAS2_ATTRIB_MASK \ > - (MAS2_X0 | MAS2_X1) > + (MAS2_X0 | MAS2_X1 | MAS2_E) > #define MAS3_ATTRIB_MASK \ > (MAS3_U0 | MAS3_U1 | MAS3_U2 | MAS3_U3 \ > | E500_TLB_USER_PERM_MASK | E500_TLB_SUPER_PERM_MASK) > -- > 1.7.0.4 > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] PF: Async page fault support on s390
On 18/07/13 15:57, Paolo Bonzini wrote: > Il 11/07/2013 12:41, Christian Borntraeger ha scritto: >> On 11/07/13 11:04, Gleb Natapov wrote: >>> On Wed, Jul 10, 2013 at 02:59:55PM +0200, Dominik Dingel wrote: This patch enables async page faults for s390 kvm guests. It provides the userspace API to enable, disable or get the status of this feature. Also it includes the diagnose code, called by the guest to enable async page faults. The async page faults will use an already existing guest interface for this purpose, as described in "CP Programming Services (SC24-6084)". Signed-off-by: Dominik Dingel >>> Christian, looks good now? >> >> Looks good, but I just had a discussion with Dominik about several other >> cases >> (guest driven reboot, qemu driven reboot, life migration). This patch should >> allow all these cases (independent from this patch we need an ioctl to flush >> the >> list of pending interrupts to do so, but reboot is currently broken in that >> regard anyway - patch is currently being looked at) >> >> We are currently discussion if we should get rid of the APF_STATUS and let >> the kernel wait for outstanding page faults before returning from KVM_RUN >> or if we go with this patch and let userspace wait for completion. >> >> Will discuss this with Dominik, Conny and Alex. So lets defer that till next >> week, ok? > > Let us know if we should wait for a v5. :) Yes, there will be a v5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case
On Thu, Jul 18, 2013 at 8:08 PM, Paolo Bonzini wrote: > Il 18/07/2013 13:06, Gleb Natapov ha scritto: >> On Thu, Jul 18, 2013 at 12:47:46PM +0200, Paolo Bonzini wrote: > and for a testsuite I'd prefer the latter---which means I'd still favor > setjmp/longjmp. > > Now, here is the long explanation. > > I must admit that the code looks nice. There are some nits I'd like to > see done differently (such as putting vmx_return at the beginning of the > while (1), and the vmresume asm at the end), but it is indeed nice. Why do you prefer setjmp/longjmp then? >>> >>> Because it is still deceiving, and I dislike the deceit more than I like >>> the linear code flow. >>> >> What is deceiving about it? Of course for someone who has no idea how >> vmx works the code will not be obvious, but why should we care. For >> someone who knows what is deceiving about returning into the same >> function guest was launched from by using VMX mechanism > > The way the code is written is deceiving. If I see > > asm("vmlaunch; seta %0") > while (ret) > > I expect HOST_RIP to point at the seta or somewhere near, not at a > completely different label somewhere else. Here for me, I prefer a separate asm blob of entry_vmx instead of one "hidden" in a C function, which may make it much harder to trace for someone not familiar with vmx in KVM. > >>> instead of longjmp()? > > Look again at the setjmp/longjmp version. longjmp is not used to handle > vmexit. It is used to jump back from the vmexit handler to main, which > is exactly what setjmp/longjmp is meant for. > >>> It is still somewhat magic: the "while (ret)" is only there to please >>> the compiler >> >> No, it it there to catch vmlaunch/vmresume errors. > > You could change it to "while (0)" and it would still work. That's what > I mean by "only there to please the compiler". > >>> the compiler, and you rely on the compiler not changing %rsp between the >>> vmlaunch and the vmx_return label. 
Minor nit, you cannot anymore print >> HOST_RSP should be loaded on each guest entry. > > Right, but my point is: without a big asm blob, you don't know the right > value to load. It depends on the generated code. And the big asm blob > limits a lot the "code looks nice" value of this approach. > >>> different error messages for vmlaunch vs. vmresume failure. >> Just because the same variable is used to store error values :) >> Make vmlaunch use err1 and vmresume err2 and do "while (!err1 & !err2)" > > Yeah. :) > >>> In the end the choice is between "all in asm" and "small asm trampoline" >>> (which also happens to use setjmp/longjmp, but perhaps Arthur can >>> propagate return codes without using setjmp/longjmp, too). >>> Rewriting the whole guest entry exit path in asm like you suggest bellow will eliminate the magic too. >>> I much prefer one big asm statement than many small asm statement scattered among two or three C lines. >>> >>> It's not many asm statements, it's just a dozen lines: >>> >> I am not talking about overall file, but the how vmx_run() is written: >> asm() >> C code >> asm() >> C code >> ... >> >> I much prefer: >> C code >> >> big asm() blob >> >> C code. > > Me too. But the trampoline version is neither, it is > > static inline void vmlaunch() { asm("vmlaunch") } > static inline void vmresume() { asm("vmresume") } > small asm() blob > C code So this is a style problem on the basis of right code generation indeed. When I write codes of this version, it occurs that the compiler optimized some of my codes and something goes wrong. If we use C style, we need setjmp/longjmp, otherwise we need big asm blob to forbid compiler optimization. I prefer to Paolo indeed as to my writing the two versions. Writing asm in C is sometimes uncomfortable (e.g. %rax and %%rax, and considering the C codes before and after the asm blob). 
Actually, we can hide setjmp in vmx_on and longjmp in the asm codes after executing exit_handler, thus we just need to call vmx_on to enter VM and return a designed value (e.g. -1) in exit_handler to exit VM. Then any following codes don't need to care about setjmp/longjmp. Arthur > > Paolo > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
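The "hide setjmp in vmx_on, longjmp after the exit handler" idea can be modeled in plain C. No real VMX here: the guest entry and the hardware vmexit are replaced by ordinary function calls, and all names are made up for illustration, but the control flow (enter via one call, pop back out of the handler with a verdict) is the one being proposed.

```c
#include <setjmp.h>

/* Saved context that vmx_on() "enters the guest" from; a vmexit
 * handler longjmps back here instead of returning normally. */
static jmp_buf vmexit_env;

/* Stand-in for the exit handler: instead of unwinding the (guest)
 * stack, jump straight back to vmx_on() with a nonzero verdict. */
static void exit_handler(int verdict)
{
    longjmp(vmexit_env, verdict);
}

/* Stand-in for guest code; calling exit_handler() here plays the
 * role of a hardware vmexit. */
static void simulated_guest(void)
{
    exit_handler(42);
}

/* Enter the "guest"; returns the verdict delivered by the handler. */
static int vmx_on(void)
{
    int verdict = setjmp(vmexit_env);
    if (verdict == 0) {
        simulated_guest();   /* stands in for vmlaunch */
        return -1;           /* not reached in this sketch */
    }
    return verdict;          /* back from the exit handler */
}
```

With this shape, code after the vmx_on() call never needs to know that setjmp/longjmp was involved, which is the point of Arthur's suggestion.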
Re: [PATCH 4/4] PF: Async page fault support on s390
Il 11/07/2013 12:41, Christian Borntraeger ha scritto: > On 11/07/13 11:04, Gleb Natapov wrote: >> On Wed, Jul 10, 2013 at 02:59:55PM +0200, Dominik Dingel wrote: >>> This patch enables async page faults for s390 kvm guests. >>> It provides the userspace API to enable, disable or get the status of this >>> feature. Also it includes the diagnose code, called by the guest to enable >>> async page faults. >>> >>> The async page faults will use an already existing guest interface for this >>> purpose, as described in "CP Programming Services (SC24-6084)". >>> >>> Signed-off-by: Dominik Dingel >> Christian, looks good now? > > Looks good, but I just had a discussion with Dominik about several other > cases > (guest-driven reboot, qemu-driven reboot, live migration). This patch should > allow all these cases (independent of this patch we need an ioctl to flush > the > list of pending interrupts to do so, but reboot is currently broken in that > regard anyway - a patch is currently being looked at) > > We are currently discussing whether we should get rid of the APF_STATUS and let > the kernel wait for outstanding page faults before returning from KVM_RUN, > or if we go with this patch and let userspace wait for completion. > > Will discuss this with Dominik, Conny and Alex. So let's defer that till next > week, ok? Let us know if we should wait for a v5. :) Paolo
[PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages
If there is a struct page for the requested mapping then it's normal RAM and the mapping is set to "M" bit (coherent, cacheable) otherwise this is treated as I/O and we set "I + G" (cache inhibited, guarded) This helps setting proper TLB mapping for direct assigned device Signed-off-by: Bharat Bhushan --- v2: some cleanup and added comment - arch/powerpc/kvm/e500_mmu_host.c | 23 ++- 1 files changed, 18 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index 1c6a9d7..02eb973 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -64,13 +64,26 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode) return mas3; } -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) { + u32 mas2_attr; + + mas2_attr = mas2 & MAS2_ATTRIB_MASK; + + /* +* RAM is always mappable on e500 systems, so this is identical +* to kvm_is_mmio_pfn(), just without its overhead. +*/ + if (!pfn_valid(pfn)) { + /* Pages not managed by Kernel are treated as I/O, set I + G */ + mas2_attr |= MAS2_I | MAS2_G; #ifdef CONFIG_SMP - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M; -#else - return mas2 & MAS2_ATTRIB_MASK; + } else { + /* Kernel managed pages are actually RAM so set "M" */ + mas2_attr |= MAS2_M; #endif + } + return mas2_attr; } /* @@ -313,7 +326,7 @@ static void kvmppc_e500_setup_stlbe( /* Force IPROT=0 for all guest mappings. */ stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID; stlbe->mas2 = (gvaddr & MAS2_EPN) | - e500_shadow_mas2_attrib(gtlbe->mas2, pr); + e500_shadow_mas2_attrib(gtlbe->mas2, pfn); stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) | e500_shadow_mas3_attrib(gtlbe->mas7_3, pr); -- 1.7.0.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2
Signed-off-by: Bharat Bhushan --- v2: - No change arch/powerpc/kvm/e500.h |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h index c2e5e98..277cb18 100644 --- a/arch/powerpc/kvm/e500.h +++ b/arch/powerpc/kvm/e500.h @@ -117,7 +117,7 @@ static inline struct kvmppc_vcpu_e500 *to_e500(struct kvm_vcpu *vcpu) #define E500_TLB_USER_PERM_MASK (MAS3_UX|MAS3_UR|MAS3_UW) #define E500_TLB_SUPER_PERM_MASK (MAS3_SX|MAS3_SR|MAS3_SW) #define MAS2_ATTRIB_MASK \ - (MAS2_X0 | MAS2_X1) + (MAS2_X0 | MAS2_X1 | MAS2_E) #define MAS3_ATTRIB_MASK \ (MAS3_U0 | MAS3_U1 | MAS3_U2 | MAS3_U3 \ | E500_TLB_USER_PERM_MASK | E500_TLB_SUPER_PERM_MASK) -- 1.7.0.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
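The effect of this one-line change is easiest to see by applying both masks to a guest MAS2 value. The bit values below are stand-ins rather than the kernel headers, but the masking logic mirrors the patch:

```c
#include <stdint.h>

/* Stand-in MAS2 bit values; illustrative only. */
#define MAS2_X0 0x40u
#define MAS2_X1 0x20u
#define MAS2_E  0x01u   /* little-endian attribute */

/* Attribute mask before and after the patch. */
#define MAS2_ATTRIB_MASK_OLD (MAS2_X0 | MAS2_X1)
#define MAS2_ATTRIB_MASK_NEW (MAS2_X0 | MAS2_X1 | MAS2_E)

/* The shadow TLB entry keeps only the masked guest attributes, so
 * any bit missing from the mask is silently dropped. */
static uint32_t shadow_attrib(uint32_t guest_mas2, uint32_t mask)
{
    return guest_mas2 & mask;
}
```

With the old mask, a guest that sets "E" (a little-endian mapping) loses it in the shadow entry; with MAS2_E added, the guest's choice passes through, which is the rationale the reviewer asked to have spelled out.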
Re: [Qemu-devel] [ [PATCH]] nVMX: Initialize IA32_FEATURE_CONTROL MSR in reset and migration
On Tue, Jul 16, 2013 at 03:01:58PM +0300, Gleb Natapov wrote: > On Tue, Jul 16, 2013 at 07:56:25PM +0800, Arthur Chunqi Li wrote: > > On Tue, Jul 16, 2013 at 7:42 PM, Gleb Natapov wrote: > > > On Sun, Jul 07, 2013 at 11:13:37PM +0800, Arthur Chunqi Li wrote: > > >> The recent KVM patch adds IA32_FEATURE_CONTROL support. QEMU needs > > >> to clear this MSR when reset vCPU and keep the value of it when > > >> migration. This patch add this feature. > > >> > > > So what happens if we migrate from qemu that does not have this patch > > > to qemu that does? Since msr_ia32_feature_control will not be migrated > > > it will not be set on the destination so destination will not be able to > > > use nested vmx. Since nested vmx is experimental it may be to early for > > > us to care about it though, and nested vmx does not work with migration > > > anyway. > > In my test, if migration doesn't care about msr_ia32_feature_control, > > the value will be set to 0 in the destination VM and this may cause > > some logical confusions, but the VMX running on it may not aware of > > this (if migration nested vmx is supported in the future) because once > > VMX initialized, it will not check this msr any more in normal cases. > > > With vmm_exclusive=0 kvm does vmxon/vmxoff while running. But lest not > worry about nested kvm migration for now. There are much harder problems > to overcome before it will work. > > > This is also a complex problem since we don't know how many states > > like this msr need to be migrated related to nested virt. If there're > > a lot of states need migrating, it is better to reconstruct the > > relevant codes. But now this patch is enough. > > > > Besides, though migration is not supported in nested vmx, we should > > keep the machine state consistent during migration. So this patch is > > also meaningful. I'm assuming that even "qemu-1.6 -machine pc-1.5" is not expected to allow migration to a qemu-1.5 binary. 
Is that OK for everybody, or should we support backwards migration? Other than that, the patch looks good to me. If migrating from a version that doesn't have the patch, we are just going to get the same behavior we had before. > > > > Arthur > > > > > >> Signed-off-by: Arthur Chunqi Li > > >> --- > > >> target-i386/cpu.h |2 ++ > > >> target-i386/kvm.c |4 > > >> target-i386/machine.c | 22 ++ > > >> 3 files changed, 28 insertions(+) > > >> > > >> diff --git a/target-i386/cpu.h b/target-i386/cpu.h > > >> index 62e3547..a418e17 100644 > > >> --- a/target-i386/cpu.h > > >> +++ b/target-i386/cpu.h > > >> @@ -301,6 +301,7 @@ > > >> #define MSR_IA32_APICBASE_BSP (1<<8) > > >> #define MSR_IA32_APICBASE_ENABLE(1<<11) > > >> #define MSR_IA32_APICBASE_BASE (0xf<<12) > > >> +#define MSR_IA32_FEATURE_CONTROL0x003a > > >> #define MSR_TSC_ADJUST 0x003b > > >> #define MSR_IA32_TSCDEADLINE0x6e0 > > >> > > >> @@ -813,6 +814,7 @@ typedef struct CPUX86State { > > >> > > >> uint64_t mcg_status; > > >> uint64_t msr_ia32_misc_enable; > > >> +uint64_t msr_ia32_feature_control; > > >> > > >> /* exception/interrupt handling */ > > >> int error_code; > > >> diff --git a/target-i386/kvm.c b/target-i386/kvm.c > > >> index 39f4fbb..3cb2161 100644 > > >> --- a/target-i386/kvm.c > > >> +++ b/target-i386/kvm.c > > >> @@ -1122,6 +1122,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level) > > >> if (hyperv_vapic_recommended()) { > > >> kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_APIC_ASSIST_PAGE, > > >> 0); > > >> } > > >> +kvm_msr_entry_set(&msrs[n++], MSR_IA32_FEATURE_CONTROL, > > >> env->msr_ia32_feature_control); > > >> } > > >> if (env->mcg_cap) { > > >> int i; > > >> @@ -1346,6 +1347,7 @@ static int kvm_get_msrs(X86CPU *cpu) > > >> if (has_msr_misc_enable) { > > >> msrs[n++].index = MSR_IA32_MISC_ENABLE; > > >> } > > >> +msrs[n++].index = MSR_IA32_FEATURE_CONTROL; > > >> > > >> if (!env->tsc_valid) { > > >> msrs[n++].index = MSR_IA32_TSC; > > >> @@ -1444,6 +1446,8 @@ static int kvm_get_msrs(X86CPU 
*cpu) > > >> case MSR_IA32_MISC_ENABLE: > > >> env->msr_ia32_misc_enable = msrs[i].data; > > >> break; > > >> +case MSR_IA32_FEATURE_CONTROL: > > >> +env->msr_ia32_feature_control = msrs[i].data; > > >> default: > > >> if (msrs[i].index >= MSR_MC0_CTL && > > >> msrs[i].index < MSR_MC0_CTL + (env->mcg_cap & 0xff) * > > >> 4) { > > >> diff --git a/target-i386/machine.c b/target-i386/machine.c > > >> index 3659db9..94ca914 100644 > > >> --- a/target-i386/machine.c > > >> +++ b/target-i386/machine.c > > >> @@ -399,6 +399,14 @@ static bool misc_enable_needed(void *opaque) > > >> return env->msr_ia32_misc_enable != MSR_IA32_MISC_ENABLE_DEFAULT; > > >> } > > >> > > >> +static bool f
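The quoted patch is cut off above. For context, the usual QEMU pattern for migrating a newly added field without breaking older streams is a vmstate subsection gated by a "needed" predicate: the field is only sent when it differs from its reset default, so a source without the field and a destination with it still interoperate. The toy model below only illustrates that predicate logic; the struct and function names are hypothetical, not QEMU's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for the CPU state that carries the new MSR. */
typedef struct {
    uint64_t msr_ia32_feature_control;
} CPUStateModel;

/* Subsection predicate: skip the subsection while the MSR still holds
 * its reset value (0), send it once something non-default was written. */
static bool feature_control_needed(const CPUStateModel *env)
{
    return env->msr_ia32_feature_control != 0;
}
```

This matches the observation in the thread: migrating from a QEMU without the patch simply leaves the MSR at 0 on the destination, i.e. the pre-patch behavior.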
Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case
Il 18/07/2013 13:06, Gleb Natapov ha scritto: > On Thu, Jul 18, 2013 at 12:47:46PM +0200, Paolo Bonzini wrote: and for a testsuite I'd prefer the latter---which means I'd still favor setjmp/longjmp. Now, here is the long explanation. I must admit that the code looks nice. There are some nits I'd like to see done differently (such as putting vmx_return at the beginning of the while (1), and the vmresume asm at the end), but it is indeed nice. >>> >>> Why do you prefer setjmp/longjmp then? >> >> Because it is still deceiving, and I dislike the deceit more than I like >> the linear code flow. >> > What is deceiving about it? Of course for someone who has no idea how > vmx works the code will not be obvious, but why should we care. For > someone who knows what is deceiving about returning into the same > function guest was launched from by using VMX mechanism The way the code is written is deceiving. If I see asm("vmlaunch; seta %0") while (ret) I expect HOST_RIP to point at the seta or somewhere near, not at a completely different label somewhere else. >> instead of longjmp()? Look again at the setjmp/longjmp version. longjmp is not used to handle vmexit. It is used to jump back from the vmexit handler to main, which is exactly what setjmp/longjmp is meant for. >> It is still somewhat magic: the "while (ret)" is only there to please >> the compiler > > No, it it there to catch vmlaunch/vmresume errors. You could change it to "while (0)" and it would still work. That's what I mean by "only there to please the compiler". >> the compiler, and you rely on the compiler not changing %rsp between the >> vmlaunch and the vmx_return label. Minor nit, you cannot anymore print > HOST_RSP should be loaded on each guest entry. Right, but my point is: without a big asm blob, you don't know the right value to load. It depends on the generated code. And the big asm blob limits a lot the "code looks nice" value of this approach. >> different error messages for vmlaunch vs. 
vmresume failure. > Just because the same variable is used to store error values :) > Make vmlaunch use err1 and vmresume err2 and do "while (!err1 & !err2)" Yeah. :) >> In the end the choice is between "all in asm" and "small asm trampoline" >> (which also happens to use setjmp/longjmp, but perhaps Arthur can >> propagate return codes without using setjmp/longjmp, too). >> >>> Rewriting the whole guest entry exit path in asm like you suggest bellow >>> will eliminate the magic too. >> >>> I much prefer one big asm statement than many small asm statement >>> scattered among two or three C lines. >> >> It's not many asm statements, it's just a dozen lines: >> > I am not talking about overall file, but the how vmx_run() is written: > asm() > C code > asm() > C code > ... > > I much prefer: > C code > > big asm() blob > > C code. Me too. But the trampoline version is neither, it is static inline void vmlaunch() { asm("vmlaunch") } static inline void vmresume() { asm("vmresume") } small asm() blob C code Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case
On Thu, Jul 18, 2013 at 12:47:46PM +0200, Paolo Bonzini wrote: > > > and for a testsuite I'd prefer the latter---which means I'd still favor > > > setjmp/longjmp. > > > > > > Now, here is the long explanation. > > > > > > I must admit that the code looks nice. There are some nits I'd like to > > > see done differently (such as putting vmx_return at the beginning of the > > > while (1), and the vmresume asm at the end), but it is indeed nice. > > > > Why do you prefer setjmp/longjmp then? > > Because it is still deceiving, and I dislike the deceit more than I like > the linear code flow. > What is deceiving about it? Of course for someone who has no idea how vmx works the code will not be obvious, but why should we care. For someone who knows what is deceiving about returning into the same function guest was launched from by using VMX mechanism instead of longjmp()? > > Agree, I dislike this magic too, but this is fixed by you suggestion > > above about putting vmx_return at the beginning of while(). So the logic > > will looks like that: > > > > asm volatile("vmlaunch;setbe %0\n\t" : "=m"(ret)); > > while(ret) { > > while(!ret) if you use setbe, of course. > Yeah, this not meant to be compilable code :) > >vmx_return: > >SAVE_GPR_C > >eax = exit_handler(); > >switch(eax) { > >} > >LOAD_GPR_C > >asm volatile("vmresume;seta %0\n\t" : "=m"(ret)); > > } > > It is still somewhat magic: the "while (ret)" is only there to please No, it it there to catch vmlaunch/vmresume errors. > the compiler, and you rely on the compiler not changing %rsp between the > vmlaunch and the vmx_return label. Minor nit, you cannot anymore print HOST_RSP should be loaded on each guest entry. > different error messages for vmlaunch vs. vmresume failure. 
Just because the same variable is used to store error values :) Make vmlaunch use err1 and vmresume err2 and do "while (!err1 & !err2)" > > In the end the choice is between "all in asm" and "small asm trampoline" > (which also happens to use setjmp/longjmp, but perhaps Arthur can > propagate return codes without using setjmp/longjmp, too). > > > Rewriting the whole guest entry exit path in asm like you suggest bellow > > will eliminate the magic too. > > > I much prefer one big asm statement than many small asm statement > > scattered among two or three C lines. > > It's not many asm statements, it's just a dozen lines: > I am not talking about overall file, but the how vmx_run() is written: asm() C code asm() C code ... I much prefer: C code big asm() blob C code. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case
Il 18/07/2013 09:26, Gleb Natapov ha scritto: > > I had written a long explanation here about why I don't trust the > > compiler to do the right thing, and ideas about how to fix that. But in > > the end the only workable solution is a single assembly blob like vmx.c > > in KVM to do vmlaunch/vmresume, so I'll get right to the point: > > > >* the "similarity with KVM code" and "as little assembly as * > >* possible" goals are mutually exclusive * > > I never said that code should be similar to KVM code or have as little > assembly as possible :) Reread the thread. The only thing I asked for > is to make code flow linear, if it makes code looks more like KVM or > reduce amount of assembly this is just a bonus. Point taken. > > and for a testsuite I'd prefer the latter---which means I'd still favor > > setjmp/longjmp. > > > > Now, here is the long explanation. > > > > I must admit that the code looks nice. There are some nits I'd like to > > see done differently (such as putting vmx_return at the beginning of the > > while (1), and the vmresume asm at the end), but it is indeed nice. > > Why do you prefer setjmp/longjmp then? Because it is still deceiving, and I dislike the deceit more than I like the linear code flow. > Agree, I dislike this magic too, but this is fixed by you suggestion > above about putting vmx_return at the beginning of while(). So the logic > will looks like that: > > asm volatile("vmlaunch;setbe %0\n\t" : "=m"(ret)); > while(ret) { while(!ret) if you use setbe, of course. >vmx_return: >SAVE_GPR_C >eax = exit_handler(); >switch(eax) { >} >LOAD_GPR_C >asm volatile("vmresume;seta %0\n\t" : "=m"(ret)); > } It is still somewhat magic: the "while (ret)" is only there to please the compiler, and you rely on the compiler not changing %rsp between the vmlaunch and the vmx_return label. Minor nit, you cannot anymore print different error messages for vmlaunch vs. vmresume failure. 
In the end the choice is between "all in asm" and "small asm trampoline" (which also happens to use setjmp/longjmp, but perhaps Arthur can propagate return codes without using setjmp/longjmp, too). > Rewriting the whole guest entry exit path in asm like you suggest bellow > will eliminate the magic too. > I much prefer one big asm statement than many small asm statement > scattered among two or three C lines. It's not many asm statements, it's just a dozen lines: align 4, 0x90 entry_vmx: SAVE_GPR call default_exit_handler /* Should not reach here*/ mov $1, %eax call exit .align 4, 0x90 entry_sysenter: SAVE_GPR and $0xf, %eax mov %eax, %edi calldefault_syscall_handler /* Arthur, is something missing here? :)) */ Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 18.07.2013, at 12:19, "tiejun.chen" wrote: > On 07/18/2013 06:12 PM, Alexander Graf wrote: >> >> On 18.07.2013, at 12:08, "tiejun.chen" wrote: >> >>> On 07/18/2013 05:48 PM, Alexander Graf wrote: On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote: > > >> -Original Message- >> From: Bhushan Bharat-R65777 >> Sent: Thursday, July 18, 2013 1:53 PM >> To: '"tiejun.chen"' >> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood >> Scott- >> B07421 >> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for >> kernel >> managed pages >> >> >> >>> -Original Message- >>> From: "tiejun.chen" [mailto:tiejun.c...@windriver.com] >>> Sent: Thursday, July 18, 2013 1:52 PM >>> To: Bhushan Bharat-R65777 >>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood >>> Scott- >>> B07421 >>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for >>> kernel managed pages >>> >>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: > -Original Message- > From: kvm-ppc-ow...@vger.kernel.org > [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "tiejun.chen" > Sent: Thursday, July 18, 2013 1:01 PM > To: Bhushan Bharat-R65777 > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; > Wood > Scott- > B07421 > Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for > kernel managed pages > > On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: >> >> >>> -Original Message- >>> From: "tiejun.chen" [mailto:tiejun.c...@windriver.com] >>> Sent: Thursday, July 18, 2013 11:56 AM >>> To: Bhushan Bharat-R65777 >>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; >>> Wood >>> Scott- B07421; Bhushan Bharat-R65777 >>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only >>> for kernel managed pages >>> >>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote: If there is a struct page for the requested mapping then it's normal DDR and the mapping sets "M" bit (coherent, cacheable) else this is
treated as I/O and we set "I + G" (cache inhibited, guarded) This helps setting proper TLB mapping for direct assigned device Signed-off-by: Bharat Bhushan --- arch/powerpc/kvm/e500_mmu_host.c | 17 - 1 files changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index 1c6a9d7..089c227 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int >>> usermode) return mas3; } -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) { + u32 mas2_attr; + + mas2_attr = mas2 & MAS2_ATTRIB_MASK; + + if (!pfn_valid(pfn)) { >>> >>> Why not directly use kvm_is_mmio_pfn()? >> >> What I understand from this function (someone can correct me) is >> that it > returns "false" when the page is managed by kernel and is not > marked as RESERVED (for some reason). For us it does not matter > whether the page is reserved or not, if it is kernel visible page > then it >> is DDR. >> > > I think you are setting I|G by addressing all mmio pages, right? If > so, > > KVM: direct mmio pfn check > > Userspace may specify memory slots that are backed by mmio > pages rather than > normal RAM. In some cases it is not enough to identify these > mmio >>> pages > by pfn_valid(). This patch adds checking the PageReserved as > well. Do you know what are those "some cases" and how checking PageReserved helps in >>> those cases? >>> >>> No, myself didn't see these actual cases in qemu,too. But this should >>> be chronically persistent as I understand ;-) >> >> Then I will wait till someone educate me :) > > The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do > not want to call this for all tlbwe operat
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 06:12 PM, Alexander Graf wrote:
[...]
It certainly does more than we need and potentially slows down the fast path (RAM mapping). The only thing it does on top of "if (pfn_valid())" is to check for pages that are declared reserved on the host. This happens in 2 cases:

1) Non cache coherent DMA
2) Memory hot remove

The non coherent DMA case would be interesting, as with the mechanism as it is in place in Linux today, we could potentially break normal guest operation if we don't take it into account. However, it's Kconfig guarded by:

    depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
    default n if PPC_47x
    default y

so we never hit it with any core we care about ;).

Memory hot remove does not exist on e500 FWIW, so we don't have to worry about that one either.

Thanks for this good information :)

So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside kvm_is_mmio_pfn() to make sure that check is only valid when that is really needed? This can decrease those unnecessary performance loss. If I'm wrong please correct me :)

You're perfectly right, but this is generic KVM code. So it gets run across all architectures. What if someone has the great idea to add a new case here for x86, but doesn't tell us? In that case
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 06:00 PM, Alexander Graf wrote:
[...]
Furthermore, how to distinguish we're creating TLB entry for the device assigned directly to the GS?

Because other devices wouldn't be available to the guest through memory slots.

Yes. I think its unnecessary to always check if that is mmio's pfn since we have more non direct assigned devices.

I'm not sure I understand. The shadow TLB code only knows "here is a host virtual address". It needs to figure out whether the host physical address behind that is RAM (can access with cache enabled) or not (has to disable cache).

Sorry, looks I'm misleading you :-P So maybe we can introduce another helper to fixup that TLB entry in instead of this path.

This path does fix up the shadow (host) TLB entry :).

I just mean whether we can have a separate path dedicated to those direct assigned devices, not go this common path :)

I don't think it's possible to have a separate path without a certain level of trust. In the current flow we don't trust anyone. We just check every translated page whether we should enable caching or not. We could take that information from 2 other sides though:

1) Memory Slot
2) Guest TLB Flags

If we take it from the memory slot we would have
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 05:48 PM, Alexander Graf wrote:
[...]
Memory hot remove does not exist on e500 FWIW, so we don't have to worry about that one either.

Thanks for this good information :)

So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside kvm_is_mmio_pfn() to make sure that check is only valid when that is really needed? This can decrease those unnecessary performance loss. If I'm wrong please correct me :)

Tiejun
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 05:44 PM, Alexander Graf wrote:
[...]
This path does fix up the shadow (host) TLB entry :).

I just mean whether we can have a separate path dedicated to those direct assigned devices, not go this common path :)

Tiejun
RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
> -----Original Message-----
> From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Alexander Graf
> Sent: Thursday, July 18, 2013 3:19 PM
> To: Bhushan Bharat-R65777
> Cc: "tiejun.chen"; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
>
> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
[...]
> > The reason is, kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary.
>
> It certainly does more than we need and potentially slows down the fast path (RAM mapping).
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 18.07.2013, at 10:55, “tiejun.chen” wrote: > On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote: >> >> >>> -Original Message- >>> From: Bhushan Bharat-R65777 >>> Sent: Thursday, July 18, 2013 1:53 PM >>> To: '"“tiejun.chen”"' >>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott- >>> B07421 >>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel >>> managed pages >>> >>> >>> -Original Message- From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com] Sent: Thursday, July 18, 2013 1:52 PM To: Bhushan Bharat-R65777 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: > > >> -Original Message- >> From: kvm-ppc-ow...@vger.kernel.org >> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "“tiejun.chen”" >> Sent: Thursday, July 18, 2013 1:01 PM >> To: Bhushan Bharat-R65777 >> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; >> Wood >> Scott- >> B07421 >> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for >> kernel managed pages >> >> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: >>> >>> -Original Message- From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com] Sent: Thursday, July 18, 2013 11:56 AM To: Bhushan Bharat-R65777 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott- B07421; Bhushan Bharat-R65777 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 02:04 PM, Bharat Bhushan wrote: > If there is a struct page for the requested mapping then it's > normal DDR and the mapping sets "M" bit (coherent, cacheable) > else this is treated as I/O and we set "I + G" (cache > inhibited, > guarded) > > This helps setting proper TLB mapping for direct assigned device > > Signed-off-by: Bharat Bhushan > --- > arch/powerpc/kvm/e500_mmu_host.c | 
17 - > 1 files changed, 12 insertions(+), 5 deletions(-) > > diff --git a/arch/powerpc/kvm/e500_mmu_host.c > b/arch/powerpc/kvm/e500_mmu_host.c > index 1c6a9d7..089c227 100644 > --- a/arch/powerpc/kvm/e500_mmu_host.c > +++ b/arch/powerpc/kvm/e500_mmu_host.c > @@ -64,13 +64,20 @@ static inline u32 > e500_shadow_mas3_attrib(u32 mas3, int usermode) > return mas3; > } > > -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int > usermode) > +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) > { > + u32 mas2_attr; > + > + mas2_attr = mas2 & MAS2_ATTRIB_MASK; > + > + if (!pfn_valid(pfn)) { Why not directly use kvm_is_mmio_pfn()? >>> >>> What I understand from this function (someone can correct me) is >>> that it >> returns "false" when the page is managed by kernel and is not >> marked as RESERVED (for some reason). For us it does not matter >> whether the page is reserved or not, if it is kernel visible page then it >>> is DDR. >>> >> >> I think you are setting I|G by addressing all mmio pages, right? If >> so, >> >> KVM: direct mmio pfn check >> >> Userspace may specify memory slots that are backed by mmio >> pages rather than >> normal RAM. In some cases it is not enough to identify these >> mmio pages >> by pfn_valid(). This patch adds checking the PageReserved as well. > > Do you know what are those "some cases" and how checking > PageReserved helps in those cases? No, myself didn't see these actual cases in qemu,too. But this should be chronically persistent as I understand ;-) >>> >>> Then I will wait till someone educate me :) >> >> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not >> want to call this for all tlbwe operation unless it is necessary. > > Furthermore, how to distinguish we're creating TLB entry for the device > assigned directly to the GS? Because other devices wouldn't be available to the guest through memory slots. 
> I think its unnecessary to always check if that is mmio's pfn since we have > more non direct assigned devices. I'm not sure I understand. The shadow TLB code only knows "here is a host virtual address". It needs to figure out whether the host physical address behind that is RAM (can access with cache enabled) or not (has to dis
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Bhushan Bharat-R65777 Sent: Thursday, July 18, 2013 1:53 PM To: '"“tiejun.chen”"' Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages -Original Message- From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com] Sent: Thursday, July 18, 2013 1:52 PM To: Bhushan Bharat-R65777 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "“tiejun.chen”" Sent: Thursday, July 18, 2013 1:01 PM To: Bhushan Bharat-R65777 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com] Sent: Thursday, July 18, 2013 11:56 AM To: Bhushan Bharat-R65777 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott- B07421; Bhushan Bharat-R65777 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 02:04 PM, Bharat Bhushan wrote: If there is a struct page for the requested mapping then it's normal DDR and the mapping sets "M" bit (coherent, cacheable) else this is treated as I/O and we set "I + G" (cache inhibited, guarded) This helps setting proper TLB mapping for direct assigned device Signed-off-by: Bharat Bhushan --- arch/powerpc/kvm/e500_mmu_host.c | 17 - 1 files changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index 1c6a9d7..089c227 100644 --- 
a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode) return mas3; } -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) { + u32 mas2_attr; + + mas2_attr = mas2 & MAS2_ATTRIB_MASK; + + if (!pfn_valid(pfn)) { Why not directly use kvm_is_mmio_pfn()? What I understand from this function (someone can correct me) is that it returns "false" when the page is managed by kernel and is not marked as RESERVED (for some reason). For us it does not matter whether the page is reserved or not, if it is kernel visible page then it is DDR. I think you are setting I|G by addressing all mmio pages, right? If so, KVM: direct mmio pfn check Userspace may specify memory slots that are backed by mmio pages rather than normal RAM. In some cases it is not enough to identify these mmio pages by pfn_valid(). This patch adds checking the PageReserved as well. Do you know what are those "some cases" and how checking PageReserved helps in those cases? No, myself didn't see these actual cases in qemu,too. But this should be chronically persistent as I understand ;-) Then I will wait till someone educate me :) The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary. Furthermore, how to distinguish we're creating TLB entry for the device assigned directly to the GS? I think its unnecessary to always check if that is mmio's pfn since we have more non direct assigned devices. So maybe we can introduce another helper to fixup that TLB entry in instead of this path. 
Tiejun -Bharat + mas2_attr |= MAS2_I | MAS2_G; + } else { #ifdef CONFIG_SMP - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M; -#else - return mas2 & MAS2_ATTRIB_MASK; + mas2_attr |= MAS2_M; #endif + } + return mas2_attr; } /* @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe( /* Force IPROT=0 for all guest mappings. */ stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID; stlbe->mas2 = (gvaddr & MAS2_EPN) | - e500_shadow_mas2_attrib(gtlbe->mas2, pr); + e500_shadow_mas2_attrib(gtlbe->mas2, pfn); stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) | e500_shadow_mas3_attrib(gtlbe->mas7_3, pr); -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
> -Original Message- > From: Bhushan Bharat-R65777 > Sent: Thursday, July 18, 2013 1:53 PM > To: '"“tiejun.chen”"' > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott- > B07421 > Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel > managed pages > > > > > -Original Message- > > From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com] > > Sent: Thursday, July 18, 2013 1:52 PM > > To: Bhushan Bharat-R65777 > > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood > > Scott- > > B07421 > > Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for > > kernel managed pages > > > > On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: > > > > > > > > >> -Original Message- > > >> From: kvm-ppc-ow...@vger.kernel.org > > >> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "“tiejun.chen”" > > >> Sent: Thursday, July 18, 2013 1:01 PM > > >> To: Bhushan Bharat-R65777 > > >> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; > > >> Wood > > >> Scott- > > >> B07421 > > >> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for > > >> kernel managed pages > > >> > > >> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: > > >>> > > >>> > > -Original Message- > > From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com] > > Sent: Thursday, July 18, 2013 11:56 AM > > To: Bhushan Bharat-R65777 > > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; > > Wood > > Scott- B07421; Bhushan Bharat-R65777 > > Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only > > for kernel managed pages > > > > On 07/18/2013 02:04 PM, Bharat Bhushan wrote: > > > If there is a struct page for the requested mapping then it's > > > normal DDR and the mapping sets "M" bit (coherent, cacheable) > > > else this is treated as I/O and we set "I + G" (cache > > > inhibited, > > > guarded) > > > > > > This helps setting proper TLB mapping for direct assigned device > > > > > > 
Signed-off-by: Bharat Bhushan > > > --- > > > arch/powerpc/kvm/e500_mmu_host.c | 17 - > > > 1 files changed, 12 insertions(+), 5 deletions(-) > > > > > > diff --git a/arch/powerpc/kvm/e500_mmu_host.c > > > b/arch/powerpc/kvm/e500_mmu_host.c > > > index 1c6a9d7..089c227 100644 > > > --- a/arch/powerpc/kvm/e500_mmu_host.c > > > +++ b/arch/powerpc/kvm/e500_mmu_host.c > > > @@ -64,13 +64,20 @@ static inline u32 > > > e500_shadow_mas3_attrib(u32 mas3, int > > usermode) > > > return mas3; > > > } > > > > > > -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int > > > usermode) > > > +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) > > > { > > > + u32 mas2_attr; > > > + > > > + mas2_attr = mas2 & MAS2_ATTRIB_MASK; > > > + > > > + if (!pfn_valid(pfn)) { > > > > Why not directly use kvm_is_mmio_pfn()? > > >>> > > >>> What I understand from this function (someone can correct me) is > > >>> that it > > >> returns "false" when the page is managed by kernel and is not > > >> marked as RESERVED (for some reason). For us it does not matter > > >> whether the page is reserved or not, if it is kernel visible page then it > is DDR. > > >>> > > >> > > >> I think you are setting I|G by addressing all mmio pages, right? If > > >> so, > > >> > > >> KVM: direct mmio pfn check > > >> > > >> Userspace may specify memory slots that are backed by mmio > > >> pages rather than > > >> normal RAM. In some cases it is not enough to identify these > > >> mmio > > pages > > >> by pfn_valid(). This patch adds checking the PageReserved as well. > > > > > > Do you know what are those "some cases" and how checking > > > PageReserved helps in > > those cases? > > > > No, myself didn't see these actual cases in qemu,too. But this should > > be chronically persistent as I understand ;-) > > Then I will wait till someone educate me :) The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary. 
-Bharat > > > + mas2_attr |= MAS2_I | MAS2_G; > > > + } else { > > > #ifdef CONFIG_SMP > > > - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M; > > > -#else > > > - return mas2 & MAS2_ATTRIB_MASK; > > > + mas2_attr |= MAS2_M; > > > #endif > > > + } > > > + return mas2_attr; > > > } > > > > > > /* > > > @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe( > > > /* Force IPROT=0 for all guest mappings. */ > > > stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | > > > MAS1_VALID; > > > stlbe->mas2 = (gvaddr & MAS2_EPN) | > > > -
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 02:04 PM, Bharat Bhushan wrote: If there is a struct page for the requested mapping then it's normal DDR and the mapping sets "M" bit (coherent, cacheable) else this is treated as I/O and we set "I + G" (cache inhibited, guarded) This helps setting proper TLB mapping for direct assigned device Signed-off-by: Bharat Bhushan --- arch/powerpc/kvm/e500_mmu_host.c | 17 - 1 files changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index 1c6a9d7..089c227 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode) return mas3; } -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) { + u32 mas2_attr; + + mas2_attr = mas2 & MAS2_ATTRIB_MASK; + + if (!pfn_valid(pfn)) { + mas2_attr |= MAS2_I | MAS2_G; + } else { #ifdef CONFIG_SMP - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M; -#else - return mas2 & MAS2_ATTRIB_MASK; + mas2_attr |= MAS2_M; #endif + } Additionally, in the UP case this little chunk of code is equivalent to if (1) { mas2_attr |= MAS2_I | MAS2_G; } else { } So you'd better wrap MAS2_M in advance like, #ifdef CONFIG_SMP #define M_IF_SMP MAS2_M #else #define M_IF_SMP 0 #endif Then if (1) mas2_attr |= MAS2_I | MAS2_G; else mas2_attr |= M_IF_SMP; Tiejun + return mas2_attr; } /* @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe( /* Force IPROT=0 for all guest mappings. 
*/ stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID; stlbe->mas2 = (gvaddr & MAS2_EPN) | - e500_shadow_mas2_attrib(gtlbe->mas2, pr); + e500_shadow_mas2_attrib(gtlbe->mas2, pfn); stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) | e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
> -Original Message- > From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com] > Sent: Thursday, July 18, 2013 1:52 PM > To: Bhushan Bharat-R65777 > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott- > B07421 > Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel > managed pages > > On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: > > > > > >> -Original Message- > >> From: kvm-ppc-ow...@vger.kernel.org > >> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "“tiejun.chen”" > >> Sent: Thursday, July 18, 2013 1:01 PM > >> To: Bhushan Bharat-R65777 > >> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood > >> Scott- > >> B07421 > >> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for > >> kernel managed pages > >> > >> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: > >>> > >>> > -Original Message- > From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com] > Sent: Thursday, July 18, 2013 11:56 AM > To: Bhushan Bharat-R65777 > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; > Wood > Scott- B07421; Bhushan Bharat-R65777 > Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for > kernel managed pages > > On 07/18/2013 02:04 PM, Bharat Bhushan wrote: > > If there is a struct page for the requested mapping then it's > > normal DDR and the mapping sets "M" bit (coherent, cacheable) else > > this is treated as I/O and we set "I + G" (cache inhibited, > > guarded) > > > > This helps setting proper TLB mapping for direct assigned device > > > > Signed-off-by: Bharat Bhushan > > --- > > arch/powerpc/kvm/e500_mmu_host.c | 17 - > > 1 files changed, 12 insertions(+), 5 deletions(-) > > > > diff --git a/arch/powerpc/kvm/e500_mmu_host.c > > b/arch/powerpc/kvm/e500_mmu_host.c > > index 1c6a9d7..089c227 100644 > > --- a/arch/powerpc/kvm/e500_mmu_host.c > > +++ b/arch/powerpc/kvm/e500_mmu_host.c > > @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 > 
> mas3, int > usermode) > > return mas3; > > } > > > > -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) > > +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) > > { > > + u32 mas2_attr; > > + > > + mas2_attr = mas2 & MAS2_ATTRIB_MASK; > > + > > + if (!pfn_valid(pfn)) { > > Why not directly use kvm_is_mmio_pfn()? > >>> > >>> What I understand from this function (someone can correct me) is > >>> that it > >> returns "false" when the page is managed by kernel and is not marked > >> as RESERVED (for some reason). For us it does not matter whether the > >> page is reserved or not, if it is kernel visible page then it is DDR. > >>> > >> > >> I think you are setting I|G by addressing all mmio pages, right? If > >> so, > >> > >> KVM: direct mmio pfn check > >> > >> Userspace may specify memory slots that are backed by mmio > >> pages rather than > >> normal RAM. In some cases it is not enough to identify these mmio > pages > >> by pfn_valid(). This patch adds checking the PageReserved as well. > > > > Do you know what are those "some cases" and how checking PageReserved helps > > in > those cases? > > No, myself didn't see these actual cases in qemu,too. But this should be > chronically persistent as I understand ;-) Then I will wait till someone educate me :) -Bharat > > Tiejun > > > > > -Bharat > > > >> > >> Tiejun > >> > >>> -Bharat > >>> > > Tiejun > > > + mas2_attr |= MAS2_I | MAS2_G; > > + } else { > > #ifdef CONFIG_SMP > > - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M; > > -#else > > - return mas2 & MAS2_ATTRIB_MASK; > > + mas2_attr |= MAS2_M; > > #endif > > + } > > + return mas2_attr; > > } > > > > /* > > @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe( > > /* Force IPROT=0 for all guest mappings. 
*/ > > stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | > > MAS1_VALID; > > stlbe->mas2 = (gvaddr & MAS2_EPN) | > > - e500_shadow_mas2_attrib(gtlbe->mas2, pr); > > + e500_shadow_mas2_attrib(gtlbe->mas2, pfn); > > stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) | > > e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "“tiejun.chen”" Sent: Thursday, July 18, 2013 1:01 PM To: Bhushan Bharat-R65777 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com] Sent: Thursday, July 18, 2013 11:56 AM To: Bhushan Bharat-R65777 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott- B07421; Bhushan Bharat-R65777 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 02:04 PM, Bharat Bhushan wrote: If there is a struct page for the requested mapping then it's normal DDR and the mapping sets "M" bit (coherent, cacheable) else this is treated as I/O and we set "I + G" (cache inhibited, guarded) This helps setting proper TLB mapping for direct assigned device Signed-off-by: Bharat Bhushan --- arch/powerpc/kvm/e500_mmu_host.c | 17 - 1 files changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index 1c6a9d7..089c227 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode) return mas3; } -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) { + u32 mas2_attr; + + mas2_attr = mas2 & MAS2_ATTRIB_MASK; + + if (!pfn_valid(pfn)) { Why not directly use kvm_is_mmio_pfn()? What I understand from this function (someone can correct me) is that it returns "false" when the page is managed by kernel and is not marked as RESERVED (for some reason). 
For us it does not matter whether the page is reserved or not, if it is kernel visible page then it is DDR. I think you are setting I|G by addressing all mmio pages, right? If so, KVM: direct mmio pfn check Userspace may specify memory slots that are backed by mmio pages rather than normal RAM. In some cases it is not enough to identify these mmio pages by pfn_valid(). This patch adds checking the PageReserved as well. Do you know what are those "some cases" and how checking PageReserved helps in those cases? No, myself didn't see these actual cases in qemu,too. But this should be chronically persistent as I understand ;-) Tiejun -Bharat Tiejun -Bharat Tiejun + mas2_attr |= MAS2_I | MAS2_G; + } else { #ifdef CONFIG_SMP - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M; -#else - return mas2 & MAS2_ATTRIB_MASK; + mas2_attr |= MAS2_M; #endif + } + return mas2_attr; } /* @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe( /* Force IPROT=0 for all guest mappings. */ stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID; stlbe->mas2 = (gvaddr & MAS2_EPN) | - e500_shadow_mas2_attrib(gtlbe->mas2, pr); + e500_shadow_mas2_attrib(gtlbe->mas2, pfn); stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) | e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
> -Original Message- > From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On > Behalf Of "“tiejun.chen”" > Sent: Thursday, July 18, 2013 1:01 PM > To: Bhushan Bharat-R65777 > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott- > B07421 > Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel > managed pages > > On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: > > > > > >> -Original Message- > >> From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com] > >> Sent: Thursday, July 18, 2013 11:56 AM > >> To: Bhushan Bharat-R65777 > >> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood > >> Scott- B07421; Bhushan Bharat-R65777 > >> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for > >> kernel managed pages > >> > >> On 07/18/2013 02:04 PM, Bharat Bhushan wrote: > >>> If there is a struct page for the requested mapping then it's normal > >>> DDR and the mapping sets "M" bit (coherent, cacheable) else this is > >>> treated as I/O and we set "I + G" (cache inhibited, guarded) > >>> > >>> This helps setting proper TLB mapping for direct assigned device > >>> > >>> Signed-off-by: Bharat Bhushan > >>> --- > >>>arch/powerpc/kvm/e500_mmu_host.c | 17 - > >>>1 files changed, 12 insertions(+), 5 deletions(-) > >>> > >>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c > >>> b/arch/powerpc/kvm/e500_mmu_host.c > >>> index 1c6a9d7..089c227 100644 > >>> --- a/arch/powerpc/kvm/e500_mmu_host.c > >>> +++ b/arch/powerpc/kvm/e500_mmu_host.c > >>> @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 > >>> mas3, int > >> usermode) > >>> return mas3; > >>>} > >>> > >>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) > >>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) > >>>{ > >>> + u32 mas2_attr; > >>> + > >>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK; > >>> + > >>> + if (!pfn_valid(pfn)) { > >> > >> Why not directly use kvm_is_mmio_pfn()? 
> > > What I understand from this function (someone can correct me) is that it > returns "false" when the page is managed by kernel and is not marked as > RESERVED > (for some reason). For us it does not matter whether the page is reserved or > not, if it is kernel visible page then it is DDR. > > > > I think you are setting I|G by addressing all mmio pages, right? If so, > > KVM: direct mmio pfn check > > Userspace may specify memory slots that are backed by mmio pages rather > than > normal RAM. In some cases it is not enough to identify these mmio pages > by pfn_valid(). This patch adds checking the PageReserved as well. Do you know what are those "some cases" and how checking PageReserved helps in those cases? -Bharat > > Tiejun > > > -Bharat > > > >> > >> Tiejun > >> > >>> + mas2_attr |= MAS2_I | MAS2_G; > >>> + } else { > >>>#ifdef CONFIG_SMP > >>> - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M; > >>> -#else > >>> - return mas2 & MAS2_ATTRIB_MASK; > >>> + mas2_attr |= MAS2_M; > >>>#endif > >>> + } > >>> + return mas2_attr; > >>>} > >>> > >>>/* > >>> @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe( > >>> /* Force IPROT=0 for all guest mappings. */ > >>> stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | > >>> MAS1_VALID; > >>> stlbe->mas2 = (gvaddr & MAS2_EPN) | > >>> - e500_shadow_mas2_attrib(gtlbe->mas2, pr); > >>> + e500_shadow_mas2_attrib(gtlbe->mas2, pfn); > >>> stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) | > >>> e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>> -----Original Message-----
>> From: "tiejun.chen" [mailto:tiejun.c...@windriver.com]
>> Sent: Thursday, July 18, 2013 11:56 AM
>> To: Bhushan Bharat-R65777
>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott-
>> B07421; Bhushan Bharat-R65777
>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>> managed pages
>>
>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>> If there is a struct page for the requested mapping then it's normal
>>> DDR and the mapping sets "M" bit (coherent, cacheable) else this is
>>> treated as I/O and we set "I + G" (cache inhibited, guarded)
>>>
>>> This helps setting proper TLB mapping for direct assigned device
>>>
>>> Signed-off-by: Bharat Bhushan
>>> ---
>>>  arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
>>>  1 files changed, 12 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
>>> index 1c6a9d7..089c227 100644
>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>> @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
>>>  	return mas3;
>>>  }
>>>
>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>>  {
>>> +	u32 mas2_attr;
>>> +
>>> +	mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>> +
>>> +	if (!pfn_valid(pfn)) {
>>
>> Why not directly use kvm_is_mmio_pfn()?
>
> What I understand from this function (someone can correct me) is that it
> returns "false" when the page is managed by kernel and is not marked as
> RESERVED (for some reason). For us it does not matter whether the page is
> reserved or not, if it is kernel visible page then it is DDR.

I think you are setting I|G by addressing all mmio pages, right? If so,

    KVM: direct mmio pfn check

    Userspace may specify memory slots that are backed by mmio pages rather than
    normal RAM.  In some cases it is not enough to identify these mmio pages
    by pfn_valid().  This patch adds checking the PageReserved as well.

Tiejun

> -Bharat
>
>>> +		mas2_attr |= MAS2_I | MAS2_G;
>>> +	} else {
>>>  #ifdef CONFIG_SMP
>>> -	return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
>>> -#else
>>> -	return mas2 & MAS2_ATTRIB_MASK;
>>> +	mas2_attr |= MAS2_M;
>>>  #endif
>>> +	}
>>> +	return mas2_attr;
>>>  }
>>>
>>>  /*
>>> @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
>>>  	/* Force IPROT=0 for all guest mappings. */
>>>  	stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
>>>  	stlbe->mas2 = (gvaddr & MAS2_EPN) |
>>> -			e500_shadow_mas2_attrib(gtlbe->mas2, pr);
>>> +			e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
>>>  	stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
>>>  			e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
>>
>> Tiejun

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
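The flow of the patched helper can be sketched in freestanding C. Note that the MAS2 bit values and the pfn_valid() stub below are illustrative stand-ins for this sketch, not the kernel's definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the e500 MAS2 attribute bits (not the
 * real asm/mmu-book3e.h values). */
#define MAS2_G           0x2u   /* guarded */
#define MAS2_M           0x4u   /* memory coherence */
#define MAS2_I           0x8u   /* cache inhibited */
#define MAS2_ATTRIB_MASK 0x7fu

typedef uint64_t pfn_t;

/* Stub: pretend pfns below 0x1000 are backed by struct pages (DDR). */
static bool pfn_valid(pfn_t pfn)
{
    return pfn < 0x1000;
}

/* Mirrors the patched e500_shadow_mas2_attrib(): I+G for I/O pfns,
 * M (under CONFIG_SMP, assumed here) for kernel-managed DDR. */
static uint32_t shadow_mas2_attrib(uint32_t mas2, pfn_t pfn)
{
    uint32_t mas2_attr = mas2 & MAS2_ATTRIB_MASK;

    if (!pfn_valid(pfn))
        mas2_attr |= MAS2_I | MAS2_G;   /* treat as I/O */
    else
        mas2_attr |= MAS2_M;            /* coherent, cacheable DDR */
    return mas2_attr;
}
```

A DDR pfn thus picks up M while a pfn with no struct page picks up I+G, which is exactly the behavioral change the diff makes.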
Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case
On Thu, Jul 18, 2013 at 07:52:21AM +0200, Paolo Bonzini wrote:
> Il 17/07/2013 20:54, Arthur Chunqi Li ha scritto:
> > +	".globl entry_sysenter\n\t"
> > +	"entry_sysenter:\n\t"
> > +	SAVE_GPR
> > +	"	and	$0xf, %rax\n\t"
> > +	"	push	%rax\n\t"
>
> The push looks wrong here, the first argument is in %rdi.
>
> > +	"	call	syscall_handler\n\t"
> > +	LOAD_GPR
> > +	"	vmresume\n\t"
> > +);
> > +
> > +int exit_handler()
>
> This (and syscall_handler too) needs __attribute__((__used__)) because
> it's only used from assembly.
>
That was my question actually, why is it used from assembly? Calling it
from C should not be a problem.

> Please add "static" to all functions except main, it triggers better
> compiler optimization and warnings and it will avoid nasty surprises in
> the future if the compiler becomes smarter.
>
> > +{
> > +	int ret;
> > +
> > +	current->exits++;
> > +	current->guest_regs = regs;
> > +	if (is_hypercall())
> > +		ret = handle_hypercall();
> > +	else
> > +		ret = current->exit_handler();
> > +	regs = current->guest_regs;
> > +	switch (ret) {
> > +	case VMX_TEST_VMEXIT:
> > +	case VMX_TEST_RESUME:
> > +		return ret;
> > +	case VMX_TEST_EXIT:
> > +		break;
> > +	default:
> > +		printf("ERROR : Invalid exit_handler return val %d.\n", ret);
> > +	}
>
> Here we have to propagate the exit codes through multiple levels, because
> we're not using setjmp/longjmp.  Not a big deal.  My worries with this
> version are below.
>
> > +	print_vmexit_info();
> > +	exit(-1);
> > +	return 0;
> > +}
> > +
> > +int vmx_run()
> > +{
> > +	bool ret;
> > +	u32 eax;
> > +	u64 rsp;
> > +
> > +	asm volatile("mov %%rsp, %0\n\t" : "=r"(rsp));
> > +	vmcs_write(HOST_RSP, rsp);
> > +	asm volatile("vmlaunch;seta %0\n\t" : "=m"(ret));
>
> setbe (this path is probably untested because it never happens in practice).
>
At least one of the tests should set up wrong vmcs state to verify that
nested vmx handles it correctly.
In fact one of the patches that Arthur has sent to nested vmx fixes exactly
that code path.

> Also, missing memory clobber.
>
> > +	if (ret) {
> > +		printf("%s : vmlaunch failed.\n", __func__);
> > +		return 1;
> > +	}
> > +	while (1) {
> > +		asm volatile(
> > +			LOAD_GPR_C
> > +			"vmresume;seta %0\n\t"
> > +			: "=m"(ret));
>
> setbe and missing memory clobber.
>
> > +		if (ret) {
> > +			printf("%s : vmresume failed.\n", __func__);
> > +			return 1;
> > +		}
> > +		asm volatile(
> > +			".global vmx_return\n\t"
>
> .global should not be needed.
>
> > +			"vmx_return:\n\t"
> > +			SAVE_GPR_C
> > +			"call exit_handler\n\t"
> > +			"mov %%eax, %0\n\t"
> > +			: "=r"(eax)
> > +		);
>
> I had written a long explanation here about why I don't trust the
> compiler to do the right thing, and ideas about how to fix that.  But in
> the end the only workable solution is a single assembly blob like vmx.c
> in KVM to do vmlaunch/vmresume, so I'll get right to the point:
>
>    * the "similarity with KVM code" and "as little assembly as *
>    * possible" goals are mutually exclusive                    *
>
I never said that code should be similar to KVM code or have as little
assembly as possible :) Reread the thread. The only thing I asked for is
to make the code flow linear; if that makes the code look more like KVM
or reduces the amount of assembly, it is just a bonus.

> and for a testsuite I'd prefer the latter---which means I'd still favor
> setjmp/longjmp.
>
> Now, here is the long explanation.
>
> I must admit that the code looks nice.  There are some nits I'd like to
> see done differently (such as putting vmx_return at the beginning of the
> while (1), and the vmresume asm at the end), but it is indeed nice.
>
Why do you prefer setjmp/longjmp then?

> However, it is also very deceiving, because the processor is not doing
> what the code says.  If I see:
>
>    vmlaunch();
>    exit(-1);
>
> it is clear that magic happens in vmlaunch().  If I see
>
>    asm volatile("vmlaunch;setbe %0\n\t" : "=m"(ret));
>    if (ret) {
>    ...
>    }
>    asm("vmx_return:")
>
> it is absolutely not clear that the setbe and "if" are skipped on a
> successful vmlaunch.  So, at the very least you need a comment before
> those "if (ret)" to document the control flow, or you can use a jmp like
> this:
>
>    asm volatile goto ("vmlaunch;jmp %l0\n\t"
>                       : : : "memory" : vmlaunch_error);
>    if (0) {
>    vmlaunch_error: ...
>    }
>
> The unconditional jump and "asm goto" make it clear that magic happens.

Agree, I dislike this magic too,
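The setjmp/longjmp structure Paolo favors can be sketched in portable C. In this sketch, fake_vmlaunch() and fake_vmexit() are hypothetical stand-ins for the real VMX entry and exit paths; only the control-flow shape is the point:

```c
#include <setjmp.h>

static jmp_buf vmexit_env;  /* saved host context, set up in vmx_run() */
static int exits;

/* Stand-in for the vmexit path: the real code would SAVE_GPR_C, run the
 * test's exit handler, then longjmp back to the driver in vmx_run(). */
static void fake_vmexit(int reason)
{
    exits++;
    longjmp(vmexit_env, reason);  /* "return" into the setjmp below */
}

/* Stand-in for vmlaunch: enter the guest, which eventually exits. */
static void fake_vmlaunch(void)
{
    fake_vmexit(42 /* hypothetical exit reason */);
}

static int vmx_run(void)
{
    int reason = setjmp(vmexit_env);

    if (reason == 0) {
        fake_vmlaunch();  /* never returns on a successful launch */
        return -1;        /* only reached if the launch itself fails */
    }
    /* We arrive here via longjmp from the vmexit handler. */
    return reason;
}
```

Here longjmp is not hiding the vmexit itself; it only carries the handler's verdict back to the driver, which matches Paolo's point that the guest and host behave like two coroutines with no shared linear flow.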
RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
> -----Original Message-----
> From: "tiejun.chen" [mailto:tiejun.c...@windriver.com]
> Sent: Thursday, July 18, 2013 11:56 AM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott-
> B07421; Bhushan Bharat-R65777
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
> managed pages
>
> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
> > If there is a struct page for the requested mapping then it's normal
> > DDR and the mapping sets "M" bit (coherent, cacheable) else this is
> > treated as I/O and we set "I + G" (cache inhibited, guarded)
> >
> > This helps setting proper TLB mapping for direct assigned device
> >
> > Signed-off-by: Bharat Bhushan
> > ---
> >  arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
> >  1 files changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/e500_mmu_host.c
> > b/arch/powerpc/kvm/e500_mmu_host.c
> > index 1c6a9d7..089c227 100644
> > --- a/arch/powerpc/kvm/e500_mmu_host.c
> > +++ b/arch/powerpc/kvm/e500_mmu_host.c
> > @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int
> usermode)
> >  	return mas3;
> >  }
> >
> > -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
> > +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
> >  {
> > +	u32 mas2_attr;
> > +
> > +	mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> > +
> > +	if (!pfn_valid(pfn)) {
>
> Why not directly use kvm_is_mmio_pfn()?

What I understand from this function (someone can correct me) is that it
returns "false" when the page is managed by kernel and is not marked as
RESERVED (for some reason). For us it does not matter whether the page is
reserved or not, if it is kernel visible page then it is DDR.
-Bharat

>
> Tiejun
>
> > +		mas2_attr |= MAS2_I | MAS2_G;
> > +	} else {
> >  #ifdef CONFIG_SMP
> > -	return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
> > -#else
> > -	return mas2 & MAS2_ATTRIB_MASK;
> > +	mas2_attr |= MAS2_M;
> >  #endif
> > +	}
> > +	return mas2_attr;
> >  }
> >
> >  /*
> > @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
> >  	/* Force IPROT=0 for all guest mappings. */
> >  	stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
> >  	stlbe->mas2 = (gvaddr & MAS2_EPN) |
> > -			e500_shadow_mas2_attrib(gtlbe->mas2, pr);
> > +			e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
> >  	stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
> >  			e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
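The distinction under debate, a bare pfn_valid() test versus the pfn_valid()-plus-PageReserved test that kvm_is_mmio_pfn() performs, can be sketched with stubbed page state; the backing arrays below are hypothetical, not the kernel's implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pfn_t;

/* Hypothetical backing store: which pfns have a struct page, and which
 * of those pages are marked reserved (e.g. mmio mapped into a memslot). */
static bool page_exists[16];
static bool page_reserved[16];

static bool pfn_valid(pfn_t pfn)
{
    return pfn < 16 && page_exists[pfn];
}

/* Sketch of the kvm_is_mmio_pfn() idea: a pfn is mmio if it has no
 * struct page at all, or if its page exists but is reserved. */
static bool is_mmio_pfn(pfn_t pfn)
{
    if (pfn_valid(pfn))
        return page_reserved[pfn];
    return true;
}
```

Bharat's argument is precisely that the two tests diverge on reserved-but-kernel-visible pages: for the e500 case such a page is still DDR, so the patch keeps the plain !pfn_valid(pfn) check rather than the stricter mmio test.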