Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-18 Thread Paolo Bonzini
Il 18/07/2013 21:57, Gleb Natapov ha scritto:
> On Thu, Jul 18, 2013 at 02:08:51PM +0200, Paolo Bonzini wrote:
>> Il 18/07/2013 13:06, Gleb Natapov ha scritto:
>>> On Thu, Jul 18, 2013 at 12:47:46PM +0200, Paolo Bonzini wrote:
>>>>>> and for a testsuite I'd prefer the latter---which means I'd still favor
>>>>>> setjmp/longjmp.
>>>>>>
>>>>>> Now, here is the long explanation.
>>>>>>
>>>>>> I must admit that the code looks nice.  There are some nits I'd like to
>>>>>> see done differently (such as putting vmx_return at the beginning of the
>>>>>> while (1), and the vmresume asm at the end), but it is indeed nice.
>>>>>
>>>>> Why do you prefer setjmp/longjmp then?

>>>> Because it is still deceiving, and I dislike the deceit more than I like
>>>> the linear code flow.

>>> What is deceiving about it? Of course for someone who has no idea how
>>> vmx works the code will not be obvious, but why should we care. For
>>> someone who knows what is deceiving about returning into the same
>>> function guest was launched from by using VMX mechanism
>>
>> The way the code is written is deceiving.  If I see
>>
>>   asm("vmlaunch; seta %0")
>>   while (ret)
>>
>> I expect HOST_RIP to point at the seta or somewhere near, not at a
>> completely different label somewhere else.
>>
> Why would you expect that, assuming you know what vmlaunch is?

Because this is written in C, and I know trying to fool the compiler is
a losing game.  So my reaction is "okay, HOST_RIP must be set so that
code will not jump around".  If I see

   asm("vmlaunch")
   exit(-1)

the reaction is the opposite: "hmm, anything that jumps around would
have a hard time with the compiler, there must be some assembly
trampoline somewhere; let's check what HOST_RIP is".

>>>> instead of longjmp()?
>>
>> Look again at the setjmp/longjmp version.  longjmp is not used to handle
>> vmexit.  It is used to jump back from the vmexit handler to main, which
>> is exactly what setjmp/longjmp is meant for.
>>
> That's because a simple return will not work in that version; this is an
> artifact of how the vmexit was done.

I think it can be made to work without setjmp/longjmp, but the code
would be ugly.

>>>> the compiler, and you rely on the compiler not changing %rsp between the
>>>> vmlaunch and the vmx_return label.  Minor nit, you cannot anymore print
>>> HOST_RSP should be loaded on each guest entry.
>>
>> Right, but my point is: without a big asm blob, you don't know the right
>> value to load.  It depends on the generated code.  And the big asm blob
>> limits a lot the "code looks nice" value of this approach.
>>
> I have said it a number of times already: this is not about "code looking nice",
> "code looks like KVM" or "use as little assembler as possible"; this is about
> linear code flow. It is no fun chasing the code execution path. Yes, you
> can argue that vmlaunch/vmresume are inherently non-linear, but there is a
> difference between skipping one instruction while remaining in the same
> function and on the same stack, and jumping completely to a different
> context.

I don't see anything bad in jumping completely to a different context.
The guest and host are sort of like two coroutines; they hardly have any
connection with the code that called vmlaunch.

> The actual difference in asm instructions between the two
> versions will not be bigger than a couple of lines (if we do not take the
> setjmp/longjmp implementation into account :)),

I was waiting for this parenthetical remark to appear. ;)

> but I do not even see
> why we discuss this argument, since minimizing asm instructions here is
> not the point. We should not use more than needed to achieve the goal, of
> course, but the design should not be built around the number of assembly lines.

I agree, I only mentioned it because you talked about

   asm
   C
   asm
   C

and this is not what the setjmp/longjmp code looks like---using inlines
for asm as if they were builtin functions is good, interspersing asm and
C in the same function is bad.
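
Concretely, the shape I have in mind is something like this (a rough
sketch only; vmx_run, vmexit_handler and jmpbuf are illustrative names,
not the actual test code):

#include <setjmp.h>

static jmp_buf jmpbuf;

/* Called from the small asm trampoline that HOST_RIP points at, after
 * the trampoline has saved the guest registers. */
void vmexit_handler(void)
{
	/* ... decode the exit reason, run the test's handler ... */
	longjmp(jmpbuf, 1);	/* back to vmx_run(): the one real use of
				   setjmp/longjmp */
}

int vmx_run(void)
{
	if (setjmp(jmpbuf) == 0) {
		asm volatile("vmlaunch");
		return -1;	/* reached only if vmlaunch itself fails */
	}
	return 0;		/* longjmp from vmexit_handler lands here */
}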

Paolo


Re: [PATCH 00/10 v6] KVM: PPC: IOMMU in-kernel handling

2013-07-18 Thread Alexey Kardashevskiy
On 07/16/2013 10:53 AM, Alexey Kardashevskiy wrote:
> The changes are:
> 1. rebased on v3.11-rc1 so the capability numbers changed again
> 2. fixed multiple comments from maintainers
> 3. "KVM: PPC: Add support for IOMMU in-kernel handling" is split into
> 2 patches, the new one is "powerpc/iommu: rework to support realmode".
> 4. IOMMU_API is now always enabled for KVM_BOOK3S_64.
> 
> More details in the individual patch comments.
> 
> Depends on "hashtable: add hash_for_each_possible_rcu_notrace()",
> posted a while ago.
> 
> 
> Alexey Kardashevskiy (10):
>   KVM: PPC: reserve a capability number for multitce support
>   KVM: PPC: reserve a capability and ioctl numbers for realmode VFIO



Alex, could you please pull these 2 patches or tell me what is wrong with
them? Having them sooner in the kernel would let me ask for a headers
update for QEMU, and then I would try pushing multiple TCE and VFIO support
in QEMU. Thanks.




-- 
Alexey


[PATCH] perf, kvm: Support the in_tx/in_tx_cp modifiers in KVM arch perfmon emulation v5

2013-07-18 Thread Andi Kleen
From: Andi Kleen 

[KVM maintainers:
The underlying support for this is in perf/core now. So please merge
this patch into the KVM tree.]

This is not arch perfmon, but older CPUs will just ignore it. This makes
it possible to do at least some TSX measurements from a KVM guest.

v2: Various fixes to address review feedback
v3: Ignore the bits when no CPUID. No #GP. Force raw events with TSX bits.
v4: Use reserved bits for #GP
v5: Remove obsolete argument
Acked-by: Gleb Natapov 
Signed-off-by: Andi Kleen 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/pmu.c              | 25 ++++++++++++++++++++-----
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index af9c552..01493a1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -315,6 +315,7 @@ struct kvm_pmu {
u64 global_ovf_ctrl;
u64 counter_bitmask[2];
u64 global_ctrl_mask;
+   u64 reserved_bits;
u8 version;
struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index c53e797..5c4f631 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -160,7 +160,7 @@ static void stop_counter(struct kvm_pmc *pmc)
 
 static void reprogram_counter(struct kvm_pmc *pmc, u32 type,
unsigned config, bool exclude_user, bool exclude_kernel,
-   bool intr)
+   bool intr, bool in_tx, bool in_tx_cp)
 {
struct perf_event *event;
struct perf_event_attr attr = {
@@ -173,6 +173,10 @@ static void reprogram_counter(struct kvm_pmc *pmc, u32 type,
.exclude_kernel = exclude_kernel,
.config = config,
};
+   if (in_tx)
+   attr.config |= HSW_IN_TX;
+   if (in_tx_cp)
+   attr.config |= HSW_IN_TX_CHECKPOINTED;
 
attr.sample_period = (-pmc->counter) & pmc_bitmask(pmc);
 
@@ -226,7 +230,9 @@ static void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 
if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
ARCH_PERFMON_EVENTSEL_INV |
-   ARCH_PERFMON_EVENTSEL_CMASK))) {
+   ARCH_PERFMON_EVENTSEL_CMASK |
+   HSW_IN_TX |
+   HSW_IN_TX_CHECKPOINTED))) {
config = find_arch_event(&pmc->vcpu->arch.pmu, event_select,
unit_mask);
if (config != PERF_COUNT_HW_MAX)
@@ -239,7 +245,9 @@ static void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
reprogram_counter(pmc, type, config,
!(eventsel & ARCH_PERFMON_EVENTSEL_USR),
!(eventsel & ARCH_PERFMON_EVENTSEL_OS),
-   eventsel & ARCH_PERFMON_EVENTSEL_INT);
+   eventsel & ARCH_PERFMON_EVENTSEL_INT,
+   (eventsel & HSW_IN_TX),
+   (eventsel & HSW_IN_TX_CHECKPOINTED));
 }
 
 static void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 en_pmi, int idx)
@@ -256,7 +264,7 @@ static void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 en_pmi, int idx)
arch_events[fixed_pmc_events[idx]].event_type,
!(en & 0x2), /* exclude user */
!(en & 0x1), /* exclude kernel */
-   pmi);
+   pmi, false, false);
 }
 
 static inline u8 fixed_en_pmi(u64 ctrl, int idx)
@@ -408,7 +416,7 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
} else if ((pmc = get_gp_pmc(pmu, index, MSR_P6_EVNTSEL0))) {
if (data == pmc->eventsel)
return 0;
-   if (!(data & 0xffffffff00200000ull)) {
+   if (!(data & pmu->reserved_bits)) {
reprogram_gp_counter(pmc, data);
return 0;
}
@@ -450,6 +458,7 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu)
pmu->counter_bitmask[KVM_PMC_GP] = 0;
pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
pmu->version = 0;
+   pmu->reserved_bits = 0xffffffff00200000ull;
 
entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
if (!entry)
@@ -478,6 +487,12 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu)
pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
(((1ull << pmu->nr_arch_fixed_counters) - 1) << INTEL_PMC_IDX_FIXED);
pmu->global_ctrl_mask = ~pmu->global_ctrl;
+
+   entry = kvm_find_cpuid_entry(vcpu, 7, 0);
+   if (entry &&
+   (boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM)) &&
+   (entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM)))
+   pmu->reserved_bits ^= HSW_IN_TX|HSW_IN_TX_CHECKPOINTED;
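
To make the effect of the final XOR concrete, here is a small
illustration (a sketch only, using the HSW_IN_TX definitions from the
perf headers, bits 32 and 33 of the event select; the raw event number
is made up):

	u64 reserved_bits = 0xffffffff00200000ull;	/* TSX bits reserved by default */

	/* Guest CPUID advertises HLE/RTM: clear exactly the two TSX bits. */
	reserved_bits ^= HSW_IN_TX | HSW_IN_TX_CHECKPOINTED;

	/* An event select qualified with IN_TX now passes the reserved-bits
	   check in kvm_pmu_set_msr() instead of being rejected with #GP: */
	u64 eventsel = 0x1b8 | HSW_IN_TX;		/* made-up raw event */
	assert(!(eventsel & reserved_bits));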

Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-18 Thread Gleb Natapov
On Thu, Jul 18, 2013 at 02:08:51PM +0200, Paolo Bonzini wrote:
> Il 18/07/2013 13:06, Gleb Natapov ha scritto:
> > On Thu, Jul 18, 2013 at 12:47:46PM +0200, Paolo Bonzini wrote:
> >>>> and for a testsuite I'd prefer the latter---which means I'd still favor
> >>>> setjmp/longjmp.
> >>>>
> >>>> Now, here is the long explanation.
> >>>>
> >>>> I must admit that the code looks nice.  There are some nits I'd like to
> >>>> see done differently (such as putting vmx_return at the beginning of the
> >>>> while (1), and the vmresume asm at the end), but it is indeed nice.
> >>>
> >>> Why do you prefer setjmp/longjmp then?
> >>
> >> Because it is still deceiving, and I dislike the deceit more than I like
> >> the linear code flow.
> >>
> > What is deceiving about it? Of course for someone who has no idea how
> > vmx works the code will not be obvious, but why should we care. For
> > someone who knows what is deceiving about returning into the same
> > function guest was launched from by using VMX mechanism
> 
> The way the code is written is deceiving.  If I see
> 
>   asm("vmlaunch; seta %0")
>   while (ret)
> 
> I expect HOST_RIP to point at the seta or somewhere near, not at a
> completely different label somewhere else.
> 
Why would you expect that, assuming you know what vmlaunch is? So it is OK
when HOST_RIP points to somewhere outside a function, but "deceiving" if
it points 3 lines down in the same function? I have a hard time following
this logic.

> >> instead of longjmp()?
> 
> Look again at the setjmp/longjmp version.  longjmp is not used to handle
> vmexit.  It is used to jump back from the vmexit handler to main, which
> is exactly what setjmp/longjmp is meant for.
> 
That's because a simple return will not work in that version; this is an
artifact of how the vmexit was done.

> >> It is still somewhat magic: the "while (ret)" is only there to please
> >> the compiler
> >
> > No, it is there to catch vmlaunch/vmresume errors.
> 
> You could change it to "while (0)" and it would still work.  That's what
> I mean by "only there to please the compiler".
But while (1) will not, so the code is executed; it is not there just for
the compiler, but it is executed only if vmlaunch/vmresume fails.
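
In other words, roughly (illustrative, not the actual test code):

	u8 err;

	asm volatile("vmlaunch; setbe %0" : "=m"(err) : : "cc", "memory");
	/* If vmlaunch enters the guest we never fall through to here: the
	   next vmexit lands at HOST_RIP (vmx_return). Falling through means
	   vmlaunch itself failed, so the loop body is the error path, not
	   dead code added to please the compiler. */
	while (err) {
		printf("vmlaunch failed\n");
		break;
	}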

> 
> >> the compiler, and you rely on the compiler not changing %rsp between the
> >> vmlaunch and the vmx_return label.  Minor nit, you cannot anymore print
> > HOST_RSP should be loaded on each guest entry.
> 
> Right, but my point is: without a big asm blob, you don't know the right
> value to load.  It depends on the generated code.  And the big asm blob
> limits a lot the "code looks nice" value of this approach.
> 
I have said it a number of times already: this is not about "code looking nice",
"code looks like KVM" or "use as little assembler as possible"; this is about
linear code flow. It is no fun chasing the code execution path. Yes, you
can argue that vmlaunch/vmresume are inherently non-linear, but there is a
difference between skipping one instruction while remaining in the same
function and on the same stack, and jumping completely to a different
context.

Speaking about KVM: the guest enter/exit assembly blob is around ~50 lines
of assembly code, and more than half of that is saving and restoring
context. The code also plays some tricks there for optimization that we do
not need to do here, so I expect the code to be even smaller, not much more
than 10 lines of assembly excluding state save/restore.

> >> different error messages for vmlaunch vs. vmresume failure.
> > Just because the same variable is used to store error values :)
> > Make vmlaunch use err1 and vmresume err2 and do "while (!err1 & !err2)"
> 
> Yeah. :)
> 
> >> In the end the choice is between "all in asm" and "small asm trampoline"
> >> (which also happens to use setjmp/longjmp, but perhaps Arthur can
> >> propagate return codes without using setjmp/longjmp, too).
> >>
> >>> Rewriting the whole guest entry exit path in asm like you suggest below
> >>> will eliminate the magic too.
> >>
> >>> I much prefer one big asm statement to many small asm statements
> >>> scattered among two or three C lines.
> >>
> >> It's not many asm statements, it's just a dozen lines:
> >>
> I am not talking about the overall file, but about how vmx_run() is written:
> > asm()
> > C code
> > asm()
> > C code
> > ...
> > 
> > I much prefer:
> > C code
> > 
> > big asm() blob
> > 
> > C code.
> 
> Me too.  But the trampoline version is neither, it is
> 
> static inline void vmlaunch() { asm("vmlaunch") }
> static inline void vmresume() { asm("vmresume") }
> small asm() blob
It is small only because the context save/restore is hidden behind a
macro and 4 asm instructions (vmlaunch/seta/vmresume/seta) are hidden
somewhere else. The actual difference in asm instructions between the two
versions will not be bigger than a couple of lines (if we do not take the
setjmp/longjmp implementation into account :)), but I do not even see
why we discuss this argument, since minimizing asm instructions here is
not the point. We should not use more than needed to achieve the goal, of
course, but the design should not be built around the number of assembly lines.

Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread Scott Wood

On 07/18/2013 12:32:18 PM, Alexander Graf wrote:


On 18.07.2013, at 19:17, Scott Wood wrote:

> On 07/18/2013 08:19:03 AM, Bharat Bhushan wrote:
> Likewise, we want to make sure this matches the host entry.   
Unfortunately, this is a bit of a mess already.  64-bit booke appears  
to always set MAS2_M for TLB0 mappings.  The initial KERNELBASE  
mapping on boot uses M_IF_SMP, and the settlbcam() that (IIRC)  
replaces it uses _PAGE_COHERENT.  32-bit always uses _PAGE_COHERENT,  
except that initial KERNELBASE mapping.  _PAGE_COHERENT appears to be  
set based on CONFIG_SMP || CONFIG_PPC_STD_MMU (the latter config  
clears _PAGE_COHERENT in the non-CPU_FTR_NEED_COHERENT case).

>
> As for what we actually want to happen, there are cases when we  
want M to be set for non-SMP.  One such case is AMP, where CPUs may  
be sharing memory even if the Linux instance only runs on one CPU  
(this is not hypothetical, BTW).  It's also possible that we  
encounter a hardware bug that requires MAS2_M, similar to what some  
of our non-booke chips require.


> How about we always set M then for RAM?


M is like I in that bad things happen if you mix them.  So we really  
want to match exactly what the rest of the kernel is doing.


Plus, the performance penalty on some single-core chips can be pretty  
bad.


-Scott


Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread Alexander Graf

On 18.07.2013, at 19:17, Scott Wood wrote:

> On 07/18/2013 08:19:03 AM, Bharat Bhushan wrote:
>> If there is a struct page for the requested mapping then it's
>> normal RAM and the mapping is set to "M" bit (coherent, cacheable)
>> otherwise this is treated as I/O and we set  "I + G"  (cache inhibited, 
>> guarded)
>> This helps setting proper TLB mapping for direct assigned device
>> Signed-off-by: Bharat Bhushan 
>> ---
>> v2: some cleanup and added comment
>> -
>> arch/powerpc/kvm/e500_mmu_host.c |   23 ++-
>> 1 files changed, 18 insertions(+), 5 deletions(-)
>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
>> index 1c6a9d7..02eb973 100644
>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>> @@ -64,13 +64,26 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
>>  return mas3;
>> }
>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>> {
>> +u32 mas2_attr;
>> +
>> +mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>> +
>> +/*
>> + * RAM is always mappable on e500 systems, so this is identical
>> + * to kvm_is_mmio_pfn(), just without its overhead.
>> + */
>> +if (!pfn_valid(pfn)) {
> 
> Please use page_is_ram(), which is what gets used when setting the WIMG for 
> the host userspace mapping.  We want to make sure the two are consistent.
> 
>> +/* Pages not managed by Kernel are treated as I/O, set I + G */
>> +mas2_attr |= MAS2_I | MAS2_G;
>> #ifdef CONFIG_SMP
>> -return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
>> -#else
>> -return mas2 & MAS2_ATTRIB_MASK;
>> +} else {
>> +/* Kernel managed pages are actually RAM so set  "M" */
>> +mas2_attr |= MAS2_M;
>> #endif
> 
> Likewise, we want to make sure this matches the host entry.  Unfortunately, 
> this is a bit of a mess already.  64-bit booke appears to always set MAS2_M 
> for TLB0 mappings.  The initial KERNELBASE mapping on boot uses M_IF_SMP, and 
> the settlbcam() that (IIRC) replaces it uses _PAGE_COHERENT.  32-bit always 
> uses _PAGE_COHERENT, except that initial KERNELBASE mapping.  _PAGE_COHERENT 
> appears to be set based on CONFIG_SMP || CONFIG_PPC_STD_MMU (the latter 
> config clears _PAGE_COHERENT in the non-CPU_FTR_NEED_COHERENT case).
> 
> As for what we actually want to happen, there are cases when we want M to be 
> set for non-SMP.  One such case is AMP, where CPUs may be sharing memory even 
> if the Linux instance only runs on one CPU (this is not hypothetical, BTW).  
> It's also possible that we encounter a hardware bug that requires MAS2_M, 
> similar to what some of our non-booke chips require.

How about we always set M then for RAM?


Alex



Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread Scott Wood

On 07/18/2013 08:19:03 AM, Bharat Bhushan wrote:

If there is a struct page for the requested mapping then it's
normal RAM and the mapping is set to "M" bit (coherent, cacheable)
otherwise this is treated as I/O and we set  "I + G"  (cache  
inhibited, guarded)


This helps setting proper TLB mapping for direct assigned device

Signed-off-by: Bharat Bhushan 
---
v2: some cleanup and added comment
 -
 arch/powerpc/kvm/e500_mmu_host.c |   23 ++-
 1 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c

index 1c6a9d7..02eb973 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,26 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)

return mas3;
 }

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
 {
+   u32 mas2_attr;
+
+   mas2_attr = mas2 & MAS2_ATTRIB_MASK;
+
+   /*
+* RAM is always mappable on e500 systems, so this is identical
+* to kvm_is_mmio_pfn(), just without its overhead.
+*/
+   if (!pfn_valid(pfn)) {


Please use page_is_ram(), which is what gets used when setting the WIMG  
for the host userspace mapping.  We want to make sure the two are  
consistent.


+		/* Pages not managed by Kernel are treated as I/O, set I + G */

+   mas2_attr |= MAS2_I | MAS2_G;
 #ifdef CONFIG_SMP
-   return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
-#else
-   return mas2 & MAS2_ATTRIB_MASK;
+   } else {
+   /* Kernel managed pages are actually RAM so set  "M" */
+   mas2_attr |= MAS2_M;
 #endif


Likewise, we want to make sure this matches the host entry.   
Unfortunately, this is a bit of a mess already.  64-bit booke appears  
to always set MAS2_M for TLB0 mappings.  The initial KERNELBASE mapping  
on boot uses M_IF_SMP, and the settlbcam() that (IIRC) replaces it uses  
_PAGE_COHERENT.  32-bit always uses _PAGE_COHERENT, except that initial  
KERNELBASE mapping.  _PAGE_COHERENT appears to be set based on  
CONFIG_SMP || CONFIG_PPC_STD_MMU (the latter config clears  
_PAGE_COHERENT in the non-CPU_FTR_NEED_COHERENT case).


As for what we actually want to happen, there are cases when we want M  
to be set for non-SMP.  One such case is AMP, where CPUs may be sharing  
memory even if the Linux instance only runs on one CPU (this is not  
hypothetical, BTW).  It's also possible that we encounter a hardware  
bug that requires MAS2_M, similar to what some of our non-booke chips  
require.


-Scott


Re: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2

2013-07-18 Thread Scott Wood

On 07/18/2013 10:12:30 AM, Bhushan Bharat-R65777 wrote:



> -Original Message-
> From: kvm-ppc-ow...@vger.kernel.org  
[mailto:kvm-ppc-ow...@vger.kernel.org] On

> Behalf Of Alexander Graf
> Sent: Thursday, July 18, 2013 8:18 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood  
Scott-B07421; Bhushan

> Bharat-R65777
> Subject: Re: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E"  
attribute in mas2

>
>
> This needs a description. Why shouldn't we ignore E?

What I understood is that "there is no reason to stop the guest from
setting "E", so allow it."


FWIW, it'd probably be safe to let the guest control the G bit as well.

-Scott


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread Scott Wood

On 07/18/2013 05:00:42 AM, Alexander Graf wrote:
> Now why is setting invalid flags a problem? If I understand Scott
> correctly, it can break the host if you access certain host devices
> with caching enabled. But to be sure I'd say we ask him directly :).


The architecture makes it illegal to mix cacheable and cache-inhibited  
mappings to the same physical page.  Mixing W or M bits is generally  
bad as well.  I've seen it cause machine checks, error interrupts, etc.  
-- not just corrupting the page in question.


-Scott


RE: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2

2013-07-18 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, July 18, 2013 8:50 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421
> Subject: Re: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2
> 
> 
> On 18.07.2013, at 17:12, Bhushan Bharat-R65777 wrote:
> 
> >
> >
> >> -Original Message-
> >> From: kvm-ppc-ow...@vger.kernel.org
> >> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Alexander Graf
> >> Sent: Thursday, July 18, 2013 8:18 PM
> >> To: Bhushan Bharat-R65777
> >> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421;
> >> Bhushan
> >> Bharat-R65777
> >> Subject: Re: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute
> >> in mas2
> >>
> >>
> >> This needs a description. Why shouldn't we ignore E?
> >
> > What I understood is that "there is no reason to stop the guest from
> > setting "E", so allow it."
> 
> Please add that to the patch description. Also explain what the bit means.

Ok :)

-Bharat
> 
> 
> Alex
> 




Re: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2

2013-07-18 Thread Alexander Graf

On 18.07.2013, at 17:12, Bhushan Bharat-R65777 wrote:

> 
> 
>> -Original Message-
>> From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On
>> Behalf Of Alexander Graf
>> Sent: Thursday, July 18, 2013 8:18 PM
>> To: Bhushan Bharat-R65777
>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421; Bhushan
>> Bharat-R65777
>> Subject: Re: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2
>> 
>> 
>> This needs a description. Why shouldn't we ignore E?
> 
> What I understood is that "there is no reason to stop the guest from
> setting "E", so allow it."

Please add that to the patch description. Also explain what the bit means.


Alex



Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread Alexander Graf

On 18.07.2013, at 17:15, Bhushan Bharat-R65777 wrote:

> 
> 
>> -Original Message-
>> From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On
>> Behalf Of Alexander Graf
>> Sent: Thursday, July 18, 2013 8:23 PM
>> To: Bhushan Bharat-R65777
>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421; Bhushan
>> Bharat-R65777
>> Subject: Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel
>> managed pages
>> 
>> 
>> On 18.07.2013, at 15:19, Bharat Bhushan wrote:
>> 
>>> If there is a struct page for the requested mapping then it's normal
>>> RAM and the mapping is set to "M" bit (coherent, cacheable) otherwise
>>> this is treated as I/O and we set  "I + G"  (cache inhibited, guarded)
>>> 
>>> This helps setting proper TLB mapping for direct assigned device
>>> 
>>> Signed-off-by: Bharat Bhushan 
>>> ---
>>> v2: some cleanup and added comment
>>> -
>>> arch/powerpc/kvm/e500_mmu_host.c |   23 ++-
>>> 1 files changed, 18 insertions(+), 5 deletions(-)
>>> 
>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
>>> index 1c6a9d7..02eb973 100644
>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>> @@ -64,13 +64,26 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
>>> return mas3;
>>> }
>>> 
>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>> {
>>> +   u32 mas2_attr;
>>> +
>>> +   mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>> +
>>> +   /*
>>> +* RAM is always mappable on e500 systems, so this is identical
>>> +* to kvm_is_mmio_pfn(), just without its overhead.
>>> +*/
>>> +   if (!pfn_valid(pfn)) {
>>> +   /* Pages not managed by Kernel are treated as I/O, set I + G */
>> 
>> Please also document the intermediate thought that I/O should be mapped non-
>> cached.
> 
> I did not get what you mean to document?

Pages not managed by the kernel are treated as I/O. Map them
with the cache disabled.

> 
>> 
>>> +   mas2_attr |= MAS2_I | MAS2_G;
>>> #ifdef CONFIG_SMP
>> 
>> Please separate the SMP case out of the branch.
> 
> Really :) this was looking simple to me.

Two branches intertwined never look simple :).

> 
>> 
>>> -   return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
>>> -#else
>>> -   return mas2 & MAS2_ATTRIB_MASK;
>>> +   } else {
>>> +   /* Kernel managed pages are actually RAM so set  "M" */
>> 
>> This comment doesn't tell me why M can be set ;).
> 
> RAM in SMP, so setting coherent, is not that obvious?

No.


Alex



RE: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread Bhushan Bharat-R65777


> -Original Message-
> From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On
> Behalf Of Alexander Graf
> Sent: Thursday, July 18, 2013 8:23 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421; Bhushan
> Bharat-R65777
> Subject: Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel
> managed pages
> 
> 
> On 18.07.2013, at 15:19, Bharat Bhushan wrote:
> 
> > If there is a struct page for the requested mapping then it's normal
> > RAM and the mapping is set to "M" bit (coherent, cacheable) otherwise
> > this is treated as I/O and we set  "I + G"  (cache inhibited, guarded)
> >
> > This helps setting proper TLB mapping for direct assigned device
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> > v2: some cleanup and added comment
> > -
> > arch/powerpc/kvm/e500_mmu_host.c |   23 ++-
> > 1 files changed, 18 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
> > index 1c6a9d7..02eb973 100644
> > --- a/arch/powerpc/kvm/e500_mmu_host.c
> > +++ b/arch/powerpc/kvm/e500_mmu_host.c
> > @@ -64,13 +64,26 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
> > return mas3;
> > }
> >
> > -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
> > +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
> > {
> > +   u32 mas2_attr;
> > +
> > +   mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> > +
> > +   /*
> > +* RAM is always mappable on e500 systems, so this is identical
> > +* to kvm_is_mmio_pfn(), just without its overhead.
> > +*/
> > +   if (!pfn_valid(pfn)) {
> > +   /* Pages not managed by Kernel are treated as I/O, set I + G */
> 
> Please also document the intermediate thought that I/O should be mapped non-
> cached.

I did not get what you mean to document?

> 
> > +   mas2_attr |= MAS2_I | MAS2_G;
> > #ifdef CONFIG_SMP
> 
> Please separate the SMP case out of the branch.

Really :) this was looking simple to me.

> 
> > -   return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
> > -#else
> > -   return mas2 & MAS2_ATTRIB_MASK;
> > +   } else {
> > +   /* Kernel managed pages are actually RAM so set  "M" */
> 
> This comment doesn't tell me why M can be set ;).

RAM in SMP, so setting coherent, is not that obvious?

-Bharat

> 
> 
> Alex
> 
> > +   mas2_attr |= MAS2_M;
> > #endif
> > +   }
> > +   return mas2_attr;
> > }
> >
> > /*
> > @@ -313,7 +326,7 @@ static void kvmppc_e500_setup_stlbe(
> > /* Force IPROT=0 for all guest mappings. */
> > stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
> > stlbe->mas2 = (gvaddr & MAS2_EPN) |
> > - e500_shadow_mas2_attrib(gtlbe->mas2, pr);
> > + e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
> > stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
> > e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
> >
> > --
> > 1.7.0.4
> >
> >




RE: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2

2013-07-18 Thread Bhushan Bharat-R65777


> -Original Message-
> From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On
> Behalf Of Alexander Graf
> Sent: Thursday, July 18, 2013 8:18 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421; Bhushan
> Bharat-R65777
> Subject: Re: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2
> 
> 
> This needs a description. Why shouldn't we ignore E?

What I understood is that "there is no reason to stop the guest from
setting "E", so allow it."

-Bharat

> 
> 
> Alex
> 
> On 18.07.2013, at 15:19, Bharat Bhushan wrote:
> 
> > Signed-off-by: Bharat Bhushan 
> > ---
> > v2:
> > - No change
> > arch/powerpc/kvm/e500.h |2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
> > index c2e5e98..277cb18 100644
> > --- a/arch/powerpc/kvm/e500.h
> > +++ b/arch/powerpc/kvm/e500.h
> > @@ -117,7 +117,7 @@ static inline struct kvmppc_vcpu_e500 *to_e500(struct kvm_vcpu *vcpu)
> > #define E500_TLB_USER_PERM_MASK (MAS3_UX|MAS3_UR|MAS3_UW)
> > #define E500_TLB_SUPER_PERM_MASK (MAS3_SX|MAS3_SR|MAS3_SW)
> > #define MAS2_ATTRIB_MASK \
> > - (MAS2_X0 | MAS2_X1)
> > + (MAS2_X0 | MAS2_X1 | MAS2_E)
> > #define MAS3_ATTRIB_MASK \
> >   (MAS3_U0 | MAS3_U1 | MAS3_U2 | MAS3_U3 \
> >   | E500_TLB_USER_PERM_MASK | E500_TLB_SUPER_PERM_MASK)
> > --
> > 1.7.0.4
> >
> >
> 




Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread Alexander Graf

On 18.07.2013, at 15:19, Bharat Bhushan wrote:

> If there is a struct page for the requested mapping then it's
> normal RAM and the mapping is set to "M" bit (coherent, cacheable)
> otherwise this is treated as I/O and we set  "I + G"  (cache inhibited, 
> guarded)
> 
> This helps setting proper TLB mapping for direct assigned device
> 
> Signed-off-by: Bharat Bhushan 
> ---
> v2: some cleanup and added comment
> - 
> arch/powerpc/kvm/e500_mmu_host.c |   23 ++-
> 1 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
> index 1c6a9d7..02eb973 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -64,13 +64,26 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
>   return mas3;
> }
> 
> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
> {
> + u32 mas2_attr;
> +
> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> +
> + /*
> +  * RAM is always mappable on e500 systems, so this is identical
> +  * to kvm_is_mmio_pfn(), just without its overhead.
> +  */
> + if (!pfn_valid(pfn)) {
> + /* Pages not managed by Kernel are treated as I/O, set I + G */

Please also document the intermediate thought that I/O should be mapped 
non-cached.

> + mas2_attr |= MAS2_I | MAS2_G;
> #ifdef CONFIG_SMP

Please separate the SMP case out of the branch.

> - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
> -#else
> - return mas2 & MAS2_ATTRIB_MASK;
> + } else {
> + /* Kernel managed pages are actually RAM so set  "M" */

This comment doesn't tell me why M can be set ;).


Alex

> + mas2_attr |= MAS2_M;
> #endif
> + }
> + return mas2_attr;
> }
> 
> /*
> @@ -313,7 +326,7 @@ static void kvmppc_e500_setup_stlbe(
>   /* Force IPROT=0 for all guest mappings. */
>   stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
>   stlbe->mas2 = (gvaddr & MAS2_EPN) |
> -   e500_shadow_mas2_attrib(gtlbe->mas2, pr);
> +   e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
>   stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
>   e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
> 
> -- 
> 1.7.0.4
> 
> 



Re: [PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2

2013-07-18 Thread Alexander Graf

This needs a description. Why shouldn't we ignore E?


Alex

On 18.07.2013, at 15:19, Bharat Bhushan wrote:

> Signed-off-by: Bharat Bhushan 
> ---
> v2:
> - No change 
> arch/powerpc/kvm/e500.h |2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
> index c2e5e98..277cb18 100644
> --- a/arch/powerpc/kvm/e500.h
> +++ b/arch/powerpc/kvm/e500.h
> @@ -117,7 +117,7 @@ static inline struct kvmppc_vcpu_e500 *to_e500(struct kvm_vcpu *vcpu)
> #define E500_TLB_USER_PERM_MASK (MAS3_UX|MAS3_UR|MAS3_UW)
> #define E500_TLB_SUPER_PERM_MASK (MAS3_SX|MAS3_SR|MAS3_SW)
> #define MAS2_ATTRIB_MASK \
> -   (MAS2_X0 | MAS2_X1)
> +   (MAS2_X0 | MAS2_X1 | MAS2_E)
> #define MAS3_ATTRIB_MASK \
> (MAS3_U0 | MAS3_U1 | MAS3_U2 | MAS3_U3 \
>  | E500_TLB_USER_PERM_MASK | E500_TLB_SUPER_PERM_MASK)
> -- 
> 1.7.0.4
> 
> 



Re: [PATCH 4/4] PF: Async page fault support on s390

2013-07-18 Thread Christian Borntraeger
On 18/07/13 15:57, Paolo Bonzini wrote:
> Il 11/07/2013 12:41, Christian Borntraeger ha scritto:
>> On 11/07/13 11:04, Gleb Natapov wrote:
>>> On Wed, Jul 10, 2013 at 02:59:55PM +0200, Dominik Dingel wrote:
>>>> This patch enables async page faults for s390 kvm guests.
>>>> It provides the userspace API to enable, disable or get the status of this
>>>> feature. Also it includes the diagnose code, called by the guest to enable
>>>> async page faults.
>>>>
>>>> The async page faults will use an already existing guest interface for this
>>>> purpose, as described in "CP Programming Services (SC24-6084)".
>>>>
>>>> Signed-off-by: Dominik Dingel 
>>> Christian, looks good now?
>>
>> Looks good, but I just had a discussion with Dominik about several other
>> cases (guest driven reboot, qemu driven reboot, live migration). This patch
>> should allow all these cases (independent of this patch we need an ioctl to
>> flush the list of pending interrupts to do so, but reboot is currently
>> broken in that regard anyway - a patch is currently being looked at).
>>
>> We are currently discussing whether we should get rid of the APF_STATUS and
>> let the kernel wait for outstanding page faults before returning from
>> KVM_RUN, or if we go with this patch and let userspace wait for completion.
>>
>> Will discuss this with Dominik, Conny and Alex. So let's defer that till
>> next week, ok?
> 
> Let us know if we should wait for a v5. :)

Yes, there will be a v5




Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-18 Thread Arthur Chunqi Li
On Thu, Jul 18, 2013 at 8:08 PM, Paolo Bonzini  wrote:
> Il 18/07/2013 13:06, Gleb Natapov ha scritto:
>> On Thu, Jul 18, 2013 at 12:47:46PM +0200, Paolo Bonzini wrote:
>>>>> and for a testsuite I'd prefer the latter---which means I'd still favor
>>>>> setjmp/longjmp.
>>>>>
>>>>> Now, here is the long explanation.
>>>>>
>>>>> I must admit that the code looks nice.  There are some nits I'd like to
>>>>> see done differently (such as putting vmx_return at the beginning of the
>>>>> while (1), and the vmresume asm at the end), but it is indeed nice.

>>>> Why do you prefer setjmp/longjmp then?
>>>
>>> Because it is still deceiving, and I dislike the deceit more than I like
>>> the linear code flow.
>>>
>> What is deceiving about it? Of course for someone who has no idea how
>> vmx works the code will not be obvious, but why should we care. For
>> someone who knows what is deceiving about returning into the same
>> function guest was launched from by using VMX mechanism
>
> The way the code is written is deceiving.  If I see
>
>   asm("vmlaunch; seta %0")
>   while (ret)
>
> I expect HOST_RIP to point at the seta or somewhere near, not at a
> completely different label somewhere else.
As for me, I prefer a separate asm blob for entry_vmx instead of one
"hidden" in a C function, which may be much harder to trace for
someone not familiar with vmx in KVM.
>
>>> instead of longjmp()?
>
> Look again at the setjmp/longjmp version.  longjmp is not used to handle
> vmexit.  It is used to jump back from the vmexit handler to main, which
> is exactly what setjmp/longjmp is meant for.
>
>>> It is still somewhat magic: the "while (ret)" is only there to please
>>> the compiler
>>
>> No, it is there to catch vmlaunch/vmresume errors.
>
> You could change it to "while (0)" and it would still work.  That's what
> I mean by "only there to please the compiler".
>
>>> the compiler, and you rely on the compiler not changing %rsp between the
>>> vmlaunch and the vmx_return label.  Minor nit, you cannot anymore print
>> HOST_RSP should be loaded on each guest entry.
>
> Right, but my point is: without a big asm blob, you don't know the right
> value to load.  It depends on the generated code.  And the big asm blob
> limits a lot the "code looks nice" value of this approach.
>
>>> different error messages for vmlaunch vs. vmresume failure.
>> Just because the same variable is used to store error values :)
>> Make vmlaunch use err1 and vmresume err2 and do "while (!err1 & !err2)"
>
> Yeah. :)
>
>>> In the end the choice is between "all in asm" and "small asm trampoline"
>>> (which also happens to use setjmp/longjmp, but perhaps Arthur can
>>> propagate return codes without using setjmp/longjmp, too).
>>>
>>>> Rewriting the whole guest entry exit path in asm like you suggest below
>>>> will eliminate the magic too.
>>>>
>>>> I much prefer one big asm statement to many small asm statements
>>>> scattered among two or three C lines.
>>>
>>> It's not many asm statements, it's just a dozen lines:
>>>
>> I am not talking about the overall file, but about how vmx_run() is written:
>> asm()
>> C code
>> asm()
>> C code
>> ...
>>
>> I much prefer:
>> C code
>>
>> big asm() blob
>>
>> C code.
>
> Me too.  But the trampoline version is neither, it is
>
> static inline void vmlaunch() { asm("vmlaunch") }
> static inline void vmresume() { asm("vmresume") }
> small asm() blob
> C code
So this is a style problem on top of correct code generation. When I
wrote the code this way, it happened that the compiler optimized some of
my code and something went wrong. If we use the C style, we need
setjmp/longjmp; otherwise we need a big asm blob to prevent compiler
optimization.

Having written both versions, I indeed prefer Paolo's. Writing
asm in C is sometimes uncomfortable (e.g. %rax vs. %%rax, and having to
consider the C code before and after the asm blob). Actually, we can
hide the setjmp in vmx_on and the longjmp in the asm code that runs after
exit_handler; then we just need to call vmx_on to enter the VM and return
a designated value (e.g. -1) from exit_handler to exit the VM. Any code
that follows doesn't need to care about setjmp/longjmp, as sketched below.
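
Roughly (a sketch only; the error codes are illustrative):

	static jmp_buf env;

	/* The setjmp is hidden inside vmx_on; callers see a plain call. */
	int vmx_on(void)
	{
		int ret = setjmp(env);
		if (ret == 0) {
			asm volatile("vmlaunch");
			return -2;	/* fell through: vmlaunch failed */
		}
		return ret;	/* value handed to longjmp by the exit path */
	}

	/* The asm stub that runs after exit_handler() conceptually does
	   longjmp(env, <exit_handler's return value>), so returning the
	   designated value (e.g. -1) from exit_handler pops us out of
	   vmx_on() and back to its caller. */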

Arthur
>
> Paolo
>


Re: [PATCH 4/4] PF: Async page fault support on s390

2013-07-18 Thread Paolo Bonzini
Il 11/07/2013 12:41, Christian Borntraeger ha scritto:
> On 11/07/13 11:04, Gleb Natapov wrote:
>> On Wed, Jul 10, 2013 at 02:59:55PM +0200, Dominik Dingel wrote:
>>> This patch enables async page faults for s390 kvm guests.
>>> It provides the userspace API to enable, disable or get the status of this
>>> feature. Also it includes the diagnose code, called by the guest to enable
>>> async page faults.
>>>
>>> The async page faults will use an already existing guest interface for this
>>> purpose, as described in "CP Programming Services (SC24-6084)".
>>>
>>> Signed-off-by: Dominik Dingel 
>> Christian, looks good now?
> 
> Looks good, but I just had a discussion with Dominik about several other
> cases (guest driven reboot, qemu driven reboot, live migration). This patch
> should allow all these cases (independent of this patch we need an ioctl to
> flush the list of pending interrupts to do so, but reboot is currently
> broken in that regard anyway - a patch is currently being looked at).
> 
> We are currently discussing whether we should get rid of the APF_STATUS and
> let the kernel wait for outstanding page faults before returning from
> KVM_RUN, or if we go with this patch and let userspace wait for completion.
> 
> Will discuss this with Dominik, Conny and Alex. So let's defer that till
> next week, ok?

Let us know if we should wait for a v5. :)

Paolo



[PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread Bharat Bhushan
If there is a struct page for the requested mapping then it's
normal RAM and the mapping is set to "M" bit (coherent, cacheable)
otherwise this is treated as I/O and we set  "I + G"  (cache inhibited, guarded)

This helps setting proper TLB mapping for direct assigned device

Signed-off-by: Bharat Bhushan 
---
v2: some cleanup and added comment
 - 
 arch/powerpc/kvm/e500_mmu_host.c |   23 ++-
 1 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..02eb973 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,26 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
return mas3;
 }
 
-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
 {
+   u32 mas2_attr;
+
+   mas2_attr = mas2 & MAS2_ATTRIB_MASK;
+
+   /*
+* RAM is always mappable on e500 systems, so this is identical
+* to kvm_is_mmio_pfn(), just without its overhead.
+*/
+   if (!pfn_valid(pfn)) {
+   /* Pages not managed by Kernel are treated as I/O, set I + G */
+   mas2_attr |= MAS2_I | MAS2_G;
 #ifdef CONFIG_SMP
-   return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
-#else
-   return mas2 & MAS2_ATTRIB_MASK;
+   } else {
+   /* Kernel managed pages are actually RAM so set  "M" */
+   mas2_attr |= MAS2_M;
 #endif
+   }
+   return mas2_attr;
 }
 
 /*
@@ -313,7 +326,7 @@ static void kvmppc_e500_setup_stlbe(
/* Force IPROT=0 for all guest mappings. */
stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
stlbe->mas2 = (gvaddr & MAS2_EPN) |
- e500_shadow_mas2_attrib(gtlbe->mas2, pr);
+ e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
 
-- 
1.7.0.4
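
Taking Scott's and Alexander's comments on v2 together (page_is_ram()
instead of pfn_valid(), and the CONFIG_SMP ifdef kept out of the
if/else), a possible shape for a respin, as an untested sketch rather
than the actual v3:

static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
{
	u32 mas2_attr = mas2 & MAS2_ATTRIB_MASK;

	if (!page_is_ram(pfn)) {
		/* Pages not managed by the kernel are treated as I/O:
		 * map them cache inhibited + guarded. */
		return mas2_attr | MAS2_I | MAS2_G;
	}

#ifdef CONFIG_SMP
	/* RAM: set "M" so the shadow TLB entry is coherent, matching the
	 * host kernel's own cacheable mapping of the page. */
	mas2_attr |= MAS2_M;
#endif
	return mas2_attr;
}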




[PATCH 1/2 v2] kvm: powerpc: Do not ignore "E" attribute in mas2

2013-07-18 Thread Bharat Bhushan
Signed-off-by: Bharat Bhushan 
---
v2:
 - No change 
 arch/powerpc/kvm/e500.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index c2e5e98..277cb18 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -117,7 +117,7 @@ static inline struct kvmppc_vcpu_e500 *to_e500(struct kvm_vcpu *vcpu)
 #define E500_TLB_USER_PERM_MASK (MAS3_UX|MAS3_UR|MAS3_UW)
 #define E500_TLB_SUPER_PERM_MASK (MAS3_SX|MAS3_SR|MAS3_SW)
 #define MAS2_ATTRIB_MASK \
- (MAS2_X0 | MAS2_X1)
+ (MAS2_X0 | MAS2_X1 | MAS2_E)
 #define MAS3_ATTRIB_MASK \
  (MAS3_U0 | MAS3_U1 | MAS3_U2 | MAS3_U3 \
   | E500_TLB_USER_PERM_MASK | E500_TLB_SUPER_PERM_MASK)
-- 
1.7.0.4




Re: [Qemu-devel] [ [PATCH]] nVMX: Initialize IA32_FEATURE_CONTROL MSR in reset and migration

2013-07-18 Thread Eduardo Habkost
On Tue, Jul 16, 2013 at 03:01:58PM +0300, Gleb Natapov wrote:
> On Tue, Jul 16, 2013 at 07:56:25PM +0800, Arthur Chunqi Li wrote:
> > On Tue, Jul 16, 2013 at 7:42 PM, Gleb Natapov  wrote:
> > > On Sun, Jul 07, 2013 at 11:13:37PM +0800, Arthur Chunqi Li wrote:
> > >> The recent KVM patch adds IA32_FEATURE_CONTROL support. QEMU needs
> > >> to clear this MSR when resetting the vCPU and keep its value across
> > >> migration. This patch adds this feature.
> > >>
> > > So what happens if we migrate from qemu that does not have this patch
> > > to qemu that does? Since msr_ia32_feature_control will not be migrated
> > > it will not be set on the destination so destination will not be able to
> > > use nested vmx. Since nested vmx is experimental it may be too early for
> > > us to care about it though, and nested vmx does not work with migration
> > > anyway.
> > In my test, if migration doesn't care about msr_ia32_feature_control,
> > the value will be set to 0 in the destination VM and this may cause
> > some logical confusion, but the VMX running on it may not be aware of
> > this (if migrating nested vmx is supported in the future) because once
> > VMX is initialized, it will not check this MSR any more in normal cases.
> > 
> With vmm_exclusive=0 kvm does vmxon/vmxoff while running. But let's not
> worry about nested kvm migration for now. There are much harder problems
> to overcome before it will work.
> 
> > This is also a complex problem since we don't know how much state
> > like this MSR needs to be migrated for nested virt. If there is
> > a lot of state that needs migrating, it is better to restructure the
> > relevant code. But for now this patch is enough.
> > 
> > Besides, though migration is not supported in nested vmx, we should
> > keep the machine state consistent during migration. So this patch is
> > also meaningful.

I'm assuming that even "qemu-1.6 -machine pc-1.5" is not expected to
allow migration to a qemu-1.5 binary. Is that OK for everybody, or
should we support backwards migration?

Other than that, the patch looks good to me. If migrating from a version
that doesn't have the patch, we are just going to get the same behavior
we had before.

> > 
> > Arthur
> > >
> > >> Signed-off-by: Arthur Chunqi Li 
> > >> ---
> > >>  target-i386/cpu.h |2 ++
> > >>  target-i386/kvm.c |4 
> > >>  target-i386/machine.c |   22 ++
> > >>  3 files changed, 28 insertions(+)
> > >>
> > >> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> > >> index 62e3547..a418e17 100644
> > >> --- a/target-i386/cpu.h
> > >> +++ b/target-i386/cpu.h
> > >> @@ -301,6 +301,7 @@
> > >>  #define MSR_IA32_APICBASE_BSP   (1<<8)
> > >>  #define MSR_IA32_APICBASE_ENABLE (1<<11)
> > >>  #define MSR_IA32_APICBASE_BASE   (0xfffff<<12)
> > >> +#define MSR_IA32_FEATURE_CONTROL 0x0000003a
> > >>  #define MSR_TSC_ADJUST           0x0000003b
> > >>  #define MSR_IA32_TSCDEADLINE     0x6e0
> > >>
> > >> @@ -813,6 +814,7 @@ typedef struct CPUX86State {
> > >>
> > >>  uint64_t mcg_status;
> > >>  uint64_t msr_ia32_misc_enable;
> > >> +uint64_t msr_ia32_feature_control;
> > >>
> > >>  /* exception/interrupt handling */
> > >>  int error_code;
> > >> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> > >> index 39f4fbb..3cb2161 100644
> > >> --- a/target-i386/kvm.c
> > >> +++ b/target-i386/kvm.c
> > >> @@ -1122,6 +1122,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
> > >>  if (hyperv_vapic_recommended()) {
> > >>  kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_APIC_ASSIST_PAGE, 
> > >> 0);
> > >>  }
> > >> +kvm_msr_entry_set(&msrs[n++], MSR_IA32_FEATURE_CONTROL, 
> > >> env->msr_ia32_feature_control);
> > >>  }
> > >>  if (env->mcg_cap) {
> > >>  int i;
> > >> @@ -1346,6 +1347,7 @@ static int kvm_get_msrs(X86CPU *cpu)
> > >>  if (has_msr_misc_enable) {
> > >>  msrs[n++].index = MSR_IA32_MISC_ENABLE;
> > >>  }
> > >> +msrs[n++].index = MSR_IA32_FEATURE_CONTROL;
> > >>
> > >>  if (!env->tsc_valid) {
> > >>  msrs[n++].index = MSR_IA32_TSC;
> > >> @@ -1444,6 +1446,8 @@ static int kvm_get_msrs(X86CPU *cpu)
> > >>  case MSR_IA32_MISC_ENABLE:
> > >>  env->msr_ia32_misc_enable = msrs[i].data;
> > >>  break;
> > >> +case MSR_IA32_FEATURE_CONTROL:
> > >> +env->msr_ia32_feature_control = msrs[i].data;
> > >> +break;
> > >> default:
> > >>  if (msrs[i].index >= MSR_MC0_CTL &&
> > >>  msrs[i].index < MSR_MC0_CTL + (env->mcg_cap & 0xff) * 
> > >> 4) {
> > >> diff --git a/target-i386/machine.c b/target-i386/machine.c
> > >> index 3659db9..94ca914 100644
> > >> --- a/target-i386/machine.c
> > >> +++ b/target-i386/machine.c
> > >> @@ -399,6 +399,14 @@ static bool misc_enable_needed(void *opaque)
> > >>  return env->msr_ia32_misc_enable != MSR_IA32_MISC_ENABLE_DEFAULT;
> > >>  }
> > >>
> > >> +static bool f
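
For orientation, a vmstate subsection guarded by a "needed" function in
QEMU of this vintage typically has roughly this shape (a sketch with
guessed names continuing the truncated hunk, not necessarily the actual
patch):

static bool feature_control_needed(void *opaque)
{
	X86CPU *cpu = opaque;

	return cpu->env.msr_ia32_feature_control != 0;
}

static const VMStateDescription vmstate_msr_ia32_feature_control = {
	.name = "cpu/msr_ia32_feature_control",
	.version_id = 1,
	.minimum_version_id = 1,
	.fields = (VMStateField[]) {
		VMSTATE_UINT64(env.msr_ia32_feature_control, X86CPU),
		VMSTATE_END_OF_LIST()
	}
};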

Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-18 Thread Paolo Bonzini
Il 18/07/2013 13:06, Gleb Natapov ha scritto:
> On Thu, Jul 18, 2013 at 12:47:46PM +0200, Paolo Bonzini wrote:
>>>> and for a testsuite I'd prefer the latter---which means I'd still favor
>>>> setjmp/longjmp.
>>>>
>>>> Now, here is the long explanation.
>>>>
>>>> I must admit that the code looks nice.  There are some nits I'd like to
>>>> see done differently (such as putting vmx_return at the beginning of the
>>>> while (1), and the vmresume asm at the end), but it is indeed nice.
>>>
>>> Why do you prefer setjmp/longjmp then?
>>
>> Because it is still deceiving, and I dislike the deceit more than I like
>> the linear code flow.
>>
> What is deceiving about it? Of course for someone who has no idea how
> vmx works the code will not be obvious, but why should we care. For
> someone who knows what is deceiving about returning into the same
> function guest was launched from by using VMX mechanism

The way the code is written is deceiving.  If I see

  asm("vmlaunch; seta %0")
  while (ret)

I expect HOST_RIP to point at the seta or somewhere near, not at a
completely different label somewhere else.

>> instead of longjmp()?

Look again at the setjmp/longjmp version.  longjmp is not used to handle
vmexit.  It is used to jump back from the vmexit handler to main, which
is exactly what setjmp/longjmp is meant for.

>> It is still somewhat magic: the "while (ret)" is only there to please
>> the compiler
>
> No, it is there to catch vmlaunch/vmresume errors.

You could change it to "while (0)" and it would still work.  That's what
I mean by "only there to please the compiler".

>> the compiler, and you rely on the compiler not changing %rsp between the
>> vmlaunch and the vmx_return label.  Minor nit, you cannot anymore print
> HOST_RSP should be loaded on each guest entry.

Right, but my point is: without a big asm blob, you don't know the right
value to load.  It depends on the generated code.  And the big asm blob
limits a lot the "code looks nice" value of this approach.

>> different error messages for vmlaunch vs. vmresume failure.
> Just because the same variable is used to store error values :)
> Make vmlaunch use err1 and vmresume err2 and do "while (!err1 & !err2)"

Yeah. :)

>> In the end the choice is between "all in asm" and "small asm trampoline"
>> (which also happens to use setjmp/longjmp, but perhaps Arthur can
>> propagate return codes without using setjmp/longjmp, too).
>>
>>> Rewriting the whole guest entry exit path in asm like you suggest below
>>> will eliminate the magic too.
>>
>>> I much prefer one big asm statement to many small asm statements
>>> scattered among two or three C lines.
>>
>> It's not many asm statements, it's just a dozen lines:
>>
> I am not talking about the overall file, but about how vmx_run() is written:
> asm()
> C code
> asm()
> C code
> ...
> 
> I much prefer:
> C code
> 
> big asm() blob
> 
> C code.

Me too.  But the trampoline version is neither, it is

static inline void vmlaunch() { asm("vmlaunch") }
static inline void vmresume() { asm("vmresume") }
small asm() blob
C code

Paolo
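
To make the shape being argued for concrete, here is a minimal sketch of the
"small asm trampoline + setjmp/longjmp" variant (the handler body and return
values are placeholders for illustration, not Arthur's actual patch):

#include <setjmp.h>

static jmp_buf vmexit_env;

/* Reached from the entry_vmx asm trampoline: HOST_RIP points at the
 * trampoline, which does SAVE_GPR and then calls us on the VMCS host
 * stack, so the compiler never sees a jump into the middle of C code. */
void default_exit_handler(void)
{
    /* ... read the exit reason, run the test's exit handler ... */
    longjmp(vmexit_env, 1);        /* unwind back into vmx_run() */
}

static int vmx_run(void)
{
    if (setjmp(vmexit_env) == 0) {
        asm volatile("vmlaunch");
        return -1;                 /* reached only if VM entry failed */
    }
    return 0;                      /* back here via longjmp() */
}

The compiler sees only ordinary calls and returns here, so nothing relies on
it keeping %rsp untouched around the vmlaunch.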



Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-18 Thread Gleb Natapov
On Thu, Jul 18, 2013 at 12:47:46PM +0200, Paolo Bonzini wrote:
> > > and for a testsuite I'd prefer the latter---which means I'd still favor
> > > setjmp/longjmp.
> > > 
> > > Now, here is the long explanation.
> > > 
> > > I must admit that the code looks nice.  There are some nits I'd like to
> > > see done differently (such as putting vmx_return at the beginning of the
> > > while (1), and the vmresume asm at the end), but it is indeed nice.
> >
> > Why do you prefer setjmp/longjmp then?
> 
> Because it is still deceiving, and I dislike the deceit more than I like
> the linear code flow.
> 
What is deceiving about it? Of course, for someone who has no idea how
vmx works the code will not be obvious, but why should we care? For
someone who knows, what is deceiving about returning into the same
function the guest was launched from by using the VMX mechanism instead of
longjmp()?

> > Agreed, I dislike this magic too, but this is fixed by your suggestion
> > above about putting vmx_return at the beginning of the while(). So the logic
> > will look like this:
> > 
> > asm volatile("vmlaunch;setbe %0\n\t" : "=m"(ret));
> > while(ret) {
> 
> while(!ret) if you use setbe, of course.
> 
Yeah, this is not meant to be compilable code :)

> >vmx_return:
> >SAVE_GPR_C
> >eax = exit_handler();
> >switch(eax) {
> >}
> >LOAD_GPR_C
> >asm volatile("vmresume;seta %0\n\t" : "=m"(ret));
> > }
> 
> It is still somewhat magic: the "while (ret)" is only there to please
No, it is there to catch vmlaunch/vmresume errors.

> the compiler, and you rely on the compiler not changing %rsp between the
> vmlaunch and the vmx_return label.  Minor nit, you cannot anymore print
HOST_RSP should be loaded on each guest entry.

> different error messages for vmlaunch vs. vmresume failure.
Just because the same variable is used to store error values :)
Make vmlaunch use err1 and vmresume err2 and do "while (!err1 & !err2)"

> 
> In the end the choice is between "all in asm" and "small asm trampoline"
> (which also happens to use setjmp/longjmp, but perhaps Arthur can
> propagate return codes without using setjmp/longjmp, too).
> 
> > Rewriting the whole guest entry/exit path in asm like you suggest below
> > will eliminate the magic too.
> 
> > I much prefer one big asm statement to many small asm statements
> > scattered among two or three C lines.
> 
> It's not many asm statements, it's just a dozen lines:
> 
I am not talking about the overall file, but about how vmx_run() is written:
asm()
C code
asm()
C code
...

I much prefer:
C code

big asm() blob

C code.


--
Gleb.
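
For comparison, fleshing out the pseudocode quoted above into the linear,
one-function shape argued for here (SAVE_GPR_C/LOAD_GPR_C stand in for the
real register save/restore macros; vmcs_write(), get_rsp(), exit_handler()
and VMX_TEST_EXIT are placeholders, a sketch rather than working code):

#define SAVE_GPR_C "push %%rax\n\t"   /* abbreviated: saves all GPRs */
#define LOAD_GPR_C "pop %%rax\n\t"    /* abbreviated: restores all GPRs */

static int vmx_run(void)
{
    u32 ret = 0;

    vmcs_write(HOST_RSP, get_rsp());       /* reload HOST_RSP on each entry */
    asm volatile("vmlaunch; setbe %0" : "=m"(ret));
    while (!ret) {
        asm volatile("vmx_return:\n\t"     /* HOST_RIP points here */
                     SAVE_GPR_C);
        if (exit_handler() == VMX_TEST_EXIT)
            return 0;
        asm volatile(LOAD_GPR_C
                     "vmresume; setbe %0" : "=m"(ret));
    }
    printf("vmlaunch/vmresume failed\n");  /* entry failure path */
    return 1;
}

Everything stays on one stack and in one function, at the price discussed in
this thread: the code silently relies on the compiler leaving %rsp alone
between vmlaunch and the vmx_return label.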


Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-18 Thread Paolo Bonzini
Il 18/07/2013 09:26, Gleb Natapov ha scritto:
> > I had written a long explanation here about why I don't trust the
> > compiler to do the right thing, and ideas about how to fix that.  But in
> > the end the only workable solution is a single assembly blob like vmx.c
> > in KVM to do vmlaunch/vmresume, so I'll get right to the point:
> > 
> >*  the "similarity with KVM code" and "as little assembly as  *
> >*  possible" goals are mutually exclusive *
> 
> I never said that the code should be similar to KVM code or have as little
> assembly as possible :) Reread the thread. The only thing I asked for
> is to make the code flow linear; if it makes the code look more like KVM or
> reduces the amount of assembly, this is just a bonus.

Point taken.

> > and for a testsuite I'd prefer the latter---which means I'd still favor
> > setjmp/longjmp.
> > 
> > Now, here is the long explanation.
> > 
> > I must admit that the code looks nice.  There are some nits I'd like to
> > see done differently (such as putting vmx_return at the beginning of the
> > while (1), and the vmresume asm at the end), but it is indeed nice.
>
> Why do you prefer setjmp/longjmp then?

Because it is still deceiving, and I dislike the deceit more than I like
the linear code flow.

> Agreed, I dislike this magic too, but this is fixed by your suggestion
> above about putting vmx_return at the beginning of the while(). So the logic
> will look like this:
> 
> asm volatile("vmlaunch;setbe %0\n\t" : "=m"(ret));
> while(ret) {

while(!ret) if you use setbe, of course.

>vmx_return:
>SAVE_GPR_C
>eax = exit_handler();
>switch(eax) {
>}
>LOAD_GPR_C
>asm volatile("vmresume;seta %0\n\t" : "=m"(ret));
> }

It is still somewhat magic: the "while (ret)" is only there to please
the compiler, and you rely on the compiler not changing %rsp between the
vmlaunch and the vmx_return label.  Minor nit, you cannot anymore print
different error messages for vmlaunch vs. vmresume failure.

In the end the choice is between "all in asm" and "small asm trampoline"
(which also happens to use setjmp/longjmp, but perhaps Arthur can
propagate return codes without using setjmp/longjmp, too).

> Rewriting the whole guest entry/exit path in asm like you suggest below
> will eliminate the magic too.

> I much prefer one big asm statement to many small asm statements
> scattered among two or three C lines.

It's not many asm statements, it's just a dozen lines:


.align  4, 0x90
entry_vmx:
SAVE_GPR
call default_exit_handler
/* Should not reach here*/
mov $1, %eax
call exit
.align  4, 0x90
entry_sysenter:
SAVE_GPR
and $0xf, %eax
mov %eax, %edi
call    default_syscall_handler
/* Arthur, is something missing here? :)) */

Paolo


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread Alexander Graf

On 18.07.2013, at 12:19, “tiejun.chen” wrote:

> On 07/18/2013 06:12 PM, Alexander Graf wrote:
>> 
>> On 18.07.2013, at 12:08, “tiejun.chen” wrote:
>> 
>>> On 07/18/2013 05:48 PM, Alexander Graf wrote:
 
 On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
 
> 
> 
>> -Original Message-
>> From: Bhushan Bharat-R65777
>> Sent: Thursday, July 18, 2013 1:53 PM
>> To: '"“tiejun.chen”"'
>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood 
>> Scott-
>> B07421
>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for 
>> kernel
>> managed pages
>> 
>> 
>> 
>>> -Original Message-
>>> From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
>>> Sent: Thursday, July 18, 2013 1:52 PM
>>> To: Bhushan Bharat-R65777
>>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood
>>> Scott-
>>> B07421
>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>> kernel managed pages
>>> 
>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
 
 
> -Original Message-
> From: kvm-ppc-ow...@vger.kernel.org
> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "“tiejun.chen”"
> Sent: Thursday, July 18, 2013 1:01 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
> Wood
> Scott-
> B07421
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
> kernel managed pages
> 
> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>> 
>> 
>>> -Original Message-
>>> From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
>>> Sent: Thursday, July 18, 2013 11:56 AM
>>> To: Bhushan Bharat-R65777
>>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
>>> Wood
>>> Scott- B07421; Bhushan Bharat-R65777
>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>> for kernel managed pages
>>> 
>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
 If there is a struct page for the requested mapping then it's
 normal DDR and the mapping sets "M" bit (coherent, cacheable)
 else this is treated as I/O and we set  "I + G"  (cache
 inhibited,
 guarded)
 
 This helps setting proper TLB mapping for direct assigned device
 
 Signed-off-by: Bharat Bhushan 
 ---
arch/powerpc/kvm/e500_mmu_host.c |   17 -
1 files changed, 12 insertions(+), 5 deletions(-)
 
 diff --git a/arch/powerpc/kvm/e500_mmu_host.c
 b/arch/powerpc/kvm/e500_mmu_host.c
 index 1c6a9d7..089c227 100644
 --- a/arch/powerpc/kvm/e500_mmu_host.c
 +++ b/arch/powerpc/kvm/e500_mmu_host.c
 @@ -64,13 +64,20 @@ static inline u32
 e500_shadow_mas3_attrib(u32 mas3, int
>>> usermode)
return mas3;
}
 
 -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
 usermode)
 +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
{
 +  u32 mas2_attr;
 +
 +  mas2_attr = mas2 & MAS2_ATTRIB_MASK;
 +
 +  if (!pfn_valid(pfn)) {
>>> 
>>> Why not directly use kvm_is_mmio_pfn()?
>> 
>> What I understand from this function (someone can correct me) is
>> that it
> returns "false" when the page is managed by the kernel and is not
> marked as RESERVED (for some reason). For us it does not matter
> whether the page is reserved or not; if it is a kernel-visible page 
> then it
>> is DDR.
>> 
> 
> I think you are setting I|G by addressing all mmio pages, right? If
> so,
> 
>  KVM: direct mmio pfn check
> 
>  Userspace may specify memory slots that are backed by mmio
> pages rather than
>  normal RAM.  In some cases it is not enough to identify these
> mmio
>>> pages
>  by pfn_valid().  This patch adds checking the PageReserved as 
> well.
 
 Do you know what those "some cases" are and how checking
 PageReserved helps in
>>> those cases?
>>> 
>>> No, I didn't see these actual cases in qemu myself either. But this should
>>> persist, as I understand ;-)
>> 
>> Then I will wait till someone educates me :)
> 
> The reason is, the kvm_is_mmio_pfn() function looks pretty heavy and I do 
> not want to call it for all tlbwe operations unless it is necessary.

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread “tiejun.chen”

On 07/18/2013 06:12 PM, Alexander Graf wrote:


On 18.07.2013, at 12:08, “tiejun.chen” wrote:


On 07/18/2013 05:48 PM, Alexander Graf wrote:


On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:





-Original Message-
From: Bhushan Bharat-R65777
Sent: Thursday, July 18, 2013 1:53 PM
To: '"“tiejun.chen”"'
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott-
B07421
Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
managed pages




-Original Message-
From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 1:52 PM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: kvm-ppc-ow...@vger.kernel.org
[mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "“tiejun.chen”"
Sent: Thursday, July 18, 2013 1:01 PM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 11:56 AM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
Wood
Scott- B07421; Bhushan Bharat-R65777
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
for kernel managed pages

On 07/18/2013 02:04 PM, Bharat Bhushan wrote:

If there is a struct page for the requested mapping then it's
normal DDR and the mapping sets "M" bit (coherent, cacheable)
else this is treated as I/O and we set  "I + G"  (cache
inhibited,
guarded)

This helps setting proper TLB mapping for direct assigned device

Signed-off-by: Bharat Bhushan 
---
arch/powerpc/kvm/e500_mmu_host.c |   17 -
1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c
b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..089c227 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,20 @@ static inline u32
e500_shadow_mas3_attrib(u32 mas3, int

usermode)

return mas3;
}

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
{
+   u32 mas2_attr;
+
+   mas2_attr = mas2 & MAS2_ATTRIB_MASK;
+
+   if (!pfn_valid(pfn)) {


Why not directly use kvm_is_mmio_pfn()?


What I understand from this function (someone can correct me) is
that it

returns "false" when the page is managed by the kernel and is not
marked as RESERVED (for some reason). For us it does not matter
whether the page is reserved or not; if it is a kernel-visible page then it

is DDR.




I think you are setting I|G by addressing all mmio pages, right? If
so,

  KVM: direct mmio pfn check

  Userspace may specify memory slots that are backed by mmio
pages rather than
  normal RAM.  In some cases it is not enough to identify these
mmio

pages

  by pfn_valid().  This patch adds checking the PageReserved as well.


Do you know what those "some cases" are and how checking
PageReserved helps in

those cases?

No, I didn't see these actual cases in qemu myself either. But this should
persist, as I understand ;-)


Then I will wait till someone educates me :)


The reason is, the kvm_is_mmio_pfn() function looks pretty heavy and I do not want 
to call it for all tlbwe operations unless it is necessary.


It certainly does more than we need and potentially slows down the fast path (RAM 
mapping). The only thing it does on top of "if (pfn_valid())" is to check for 
pages that are declared reserved on the host. This happens in 2 cases:

   1) Non cache coherent DMA
   2) Memory hot remove

The non coherent DMA case would be interesting, as with the mechanism as it is 
in place in Linux today, we could potentially break normal guest operation if 
we don't take it into account. However, it's Kconfig guarded by:

 depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
 default n if PPC_47x
 default y

so we never hit it with any core we care about ;).

Memory hot remove does not exist on e500 FWIW, so we don't have to worry about 
that one either.
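
A minimal sketch of the difference being weighed here (illustrative, not the
kernel's actual code; treat_as_mmio() is a made-up name):

static bool treat_as_mmio(pfn_t pfn)
{
    /* The cheap test: no struct page at all means I/O, map it I|G. */
    if (!pfn_valid(pfn))
        return true;
    /* The extra work kvm_is_mmio_pfn() does on top: touch the struct
     * page and honour PageReserved, which only matters for the two
     * host cases above (non cache coherent DMA, memory hot remove). */
    return PageReserved(pfn_to_page(pfn));
}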


Thanks for this good information :)

So why not limit that code with CONFIG_MEMORY_HOTPLUG inside 
kvm_is_mmio_pfn() to make sure the check is only done when it is really 
needed? This would avoid the unnecessary performance loss.

If I'm wrong please correct me :)


You're perfectly right, but this is generic KVM code. So it gets run across all 
architectures. What if someone has the great idea to add a new case here for 
x86, but doesn't tell us? In that case

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread “tiejun.chen”

On 07/18/2013 06:00 PM, Alexander Graf wrote:


On 18.07.2013, at 11:56, “tiejun.chen” wrote:


On 07/18/2013 05:44 PM, Alexander Graf wrote:


On 18.07.2013, at 10:55, “tiejun.chen” wrote:


On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: Bhushan Bharat-R65777
Sent: Thursday, July 18, 2013 1:53 PM
To: '"“tiejun.chen”"'
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott-
B07421
Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
managed pages




-Original Message-
From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 1:52 PM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: kvm-ppc-ow...@vger.kernel.org
[mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "“tiejun.chen”"
Sent: Thursday, July 18, 2013 1:01 PM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 11:56 AM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
Wood
Scott- B07421; Bhushan Bharat-R65777
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
for kernel managed pages

On 07/18/2013 02:04 PM, Bharat Bhushan wrote:

If there is a struct page for the requested mapping then it's
normal DDR and the mapping sets "M" bit (coherent, cacheable)
else this is treated as I/O and we set  "I + G"  (cache
inhibited,
guarded)

This helps setting proper TLB mapping for direct assigned device

Signed-off-by: Bharat Bhushan 
---
 arch/powerpc/kvm/e500_mmu_host.c |   17 -
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c
b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..089c227 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,20 @@ static inline u32
e500_shadow_mas3_attrib(u32 mas3, int

usermode)

return mas3;
 }

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
 {
+   u32 mas2_attr;
+
+   mas2_attr = mas2 & MAS2_ATTRIB_MASK;
+
+   if (!pfn_valid(pfn)) {


Why not directly use kvm_is_mmio_pfn()?


What I understand from this function (someone can correct me) is
that it

returns "false" when the page is managed by the kernel and is not
marked as RESERVED (for some reason). For us it does not matter
whether the page is reserved or not; if it is a kernel-visible page then it

is DDR.




I think you are setting I|G by addressing all mmio pages, right? If
so,

   KVM: direct mmio pfn check

   Userspace may specify memory slots that are backed by mmio
pages rather than
   normal RAM.  In some cases it is not enough to identify these
mmio

pages

   by pfn_valid().  This patch adds checking the PageReserved as well.


Do you know what those "some cases" are and how checking
PageReserved helps in

those cases?

No, I didn't see these actual cases in qemu myself either. But this should
persist, as I understand ;-)


Then I will wait till someone educates me :)


The reason is, the kvm_is_mmio_pfn() function looks pretty heavy and I do not want 
to call it for all tlbwe operations unless it is necessary.


Furthermore, how do we distinguish that we're creating a TLB entry for the device 
assigned directly to the GS?


Because other devices wouldn't be available to the guest through memory slots.


Yes.




I think it's unnecessary to always check if that is an mmio pfn since we have 
more non-direct-assigned devices.


I'm not sure I understand. The shadow TLB code only knows "here is a host virtual 
address". It needs to figure out whether the host physical address behind that is 
RAM (can access with cache enabled) or not (has to disable cache)



Sorry, it looks like I'm misleading you :-P


So maybe we can introduce another helper to fix up that TLB entry instead of 
going through this path.


This path does fix up the shadow (host) TLB entry :).



I just mean whether we can have a separate path dedicated to those direct-
assigned devices, not going through this common path :)


I don't think it's possible to have a separate path without a certain level of 
trust. In the current flow we don't trust anyone. We just check, for every 
translated page, whether we should enable caching or not.

We could take that information from 2 other sides though:

   1) Memory Slot
   2) Guest TLB Flags

If we take it from the memory slot we would hav

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread Alexander Graf

On 18.07.2013, at 12:08, “tiejun.chen” wrote:

> On 07/18/2013 05:48 PM, Alexander Graf wrote:
>> 
>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
>> 
>>> 
>>> 
 -Original Message-
 From: Bhushan Bharat-R65777
 Sent: Thursday, July 18, 2013 1:53 PM
 To: '"“tiejun.chen”"'
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood 
 Scott-
 B07421
 Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
 managed pages
 
 
 
> -Original Message-
> From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
> Sent: Thursday, July 18, 2013 1:52 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood
> Scott-
> B07421
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
> kernel managed pages
> 
> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>> 
>> 
>>> -Original Message-
>>> From: kvm-ppc-ow...@vger.kernel.org
>>> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "“tiejun.chen”"
>>> Sent: Thursday, July 18, 2013 1:01 PM
>>> To: Bhushan Bharat-R65777
>>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
>>> Wood
>>> Scott-
>>> B07421
>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>> kernel managed pages
>>> 
>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
 
 
> -Original Message-
> From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
> Sent: Thursday, July 18, 2013 11:56 AM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
> Wood
> Scott- B07421; Bhushan Bharat-R65777
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
> for kernel managed pages
> 
> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>> If there is a struct page for the requested mapping then it's
>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
>> else this is treated as I/O and we set  "I + G"  (cache
>> inhibited,
>> guarded)
>> 
>> This helps setting proper TLB mapping for direct assigned device
>> 
>> Signed-off-by: Bharat Bhushan 
>> ---
>>arch/powerpc/kvm/e500_mmu_host.c |   17 -
>>1 files changed, 12 insertions(+), 5 deletions(-)
>> 
>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>> b/arch/powerpc/kvm/e500_mmu_host.c
>> index 1c6a9d7..089c227 100644
>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>> @@ -64,13 +64,20 @@ static inline u32
>> e500_shadow_mas3_attrib(u32 mas3, int
> usermode)
>>  return mas3;
>>}
>> 
>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>> usermode)
>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>{
>> +u32 mas2_attr;
>> +
>> +mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>> +
>> +if (!pfn_valid(pfn)) {
> 
> Why not directly use kvm_is_mmio_pfn()?
 
 What I understand from this function (someone can correct me) is
 that it
>>> returns "false" when the page is managed by the kernel and is not
>>> marked as RESERVED (for some reason). For us it does not matter
>>> whether the page is reserved or not; if it is a kernel-visible page then 
>>> it
 is DDR.
 
>>> 
>>> I think you are setting I|G by addressing all mmio pages, right? If
>>> so,
>>> 
>>>  KVM: direct mmio pfn check
>>> 
>>>  Userspace may specify memory slots that are backed by mmio
>>> pages rather than
>>>  normal RAM.  In some cases it is not enough to identify these
>>> mmio
> pages
>>>  by pfn_valid().  This patch adds checking the PageReserved as well.
>> 
>> Do you know what those "some cases" are and how checking
>> PageReserved helps in
> those cases?
> 
> No, I didn't see these actual cases in qemu myself either. But this should
> persist, as I understand ;-)
 
 Then I will wait till someone educates me :)
>>> 
>>> The reason is, the kvm_is_mmio_pfn() function looks pretty heavy and I do not 
>>> want to call it for all tlbwe operations unless it is necessary.
>> 
>> It certainly does more than we need and potentially slows down the fast path 
>> (RAM mapping). The only thing it does on top of "if (pfn_valid())" is to 
>> check for pages that are declared reserved on the host. This happens in 2 
>> cases:
>> 
>>   1) Non cache coherent DMA
>>   2) Memory hot remove
>> 
>> The non coherent DMA case would be interesting, as with the mechanism as it 
>> is in place in Linux today, we could potentially break normal guest operation 
>> if we don't take it into account.

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread “tiejun.chen”

On 07/18/2013 05:48 PM, Alexander Graf wrote:


On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:





-Original Message-
From: Bhushan Bharat-R65777
Sent: Thursday, July 18, 2013 1:53 PM
To: '"“tiejun.chen”"'
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott-
B07421
Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
managed pages




-Original Message-
From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 1:52 PM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: kvm-ppc-ow...@vger.kernel.org
[mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "“tiejun.chen”"
Sent: Thursday, July 18, 2013 1:01 PM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 11:56 AM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
Wood
Scott- B07421; Bhushan Bharat-R65777
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
for kernel managed pages

On 07/18/2013 02:04 PM, Bharat Bhushan wrote:

If there is a struct page for the requested mapping then it's
normal DDR and the mapping sets "M" bit (coherent, cacheable)
else this is treated as I/O and we set  "I + G"  (cache
inhibited,
guarded)

This helps setting proper TLB mapping for direct assigned device

Signed-off-by: Bharat Bhushan 
---
arch/powerpc/kvm/e500_mmu_host.c |   17 -
1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c
b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..089c227 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,20 @@ static inline u32
e500_shadow_mas3_attrib(u32 mas3, int

usermode)

return mas3;
}

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
{
+   u32 mas2_attr;
+
+   mas2_attr = mas2 & MAS2_ATTRIB_MASK;
+
+   if (!pfn_valid(pfn)) {


Why not directly use kvm_is_mmio_pfn()?


What I understand from this function (someone can correct me) is
that it

returns "false" when the page is managed by the kernel and is not
marked as RESERVED (for some reason). For us it does not matter
whether the page is reserved or not; if it is a kernel-visible page then it

is DDR.




I think you are setting I|G by addressing all mmio pages, right? If
so,

  KVM: direct mmio pfn check

  Userspace may specify memory slots that are backed by mmio
pages rather than
  normal RAM.  In some cases it is not enough to identify these
mmio

pages

  by pfn_valid().  This patch adds checking the PageReserved as well.


Do you know what those "some cases" are and how checking
PageReserved helps in

those cases?

No, I didn't see these actual cases in qemu myself either. But this should
persist, as I understand ;-)


Then I will wait till someone educates me :)


The reason is, the kvm_is_mmio_pfn() function looks pretty heavy and I do not want 
to call it for all tlbwe operations unless it is necessary.


It certainly does more than we need and potentially slows down the fast path (RAM 
mapping). The only thing it does on top of "if (pfn_valid())" is to check for 
pages that are declared reserved on the host. This happens in 2 cases:

   1) Non cache coherent DMA
   2) Memory hot remove

The non coherent DMA case would be interesting, as with the mechanism as it is 
in place in Linux today, we could potentially break normal guest operation if 
we don't take it into account. However, it's Kconfig guarded by:

 depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
 default n if PPC_47x
 default y

so we never hit it with any core we care about ;).

Memory hot remove does not exist on e500 FWIW, so we don't have to worry about 
that one either.


Thanks for this good information :)

So why not limit that code with CONFIG_MEMORY_HOTPLUG inside kvm_is_mmio_pfn() 
to make sure the check is only done when it is really needed? This would 
avoid the unnecessary performance loss.


If I'm wrong please correct me :)

Tiejun


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread Alexander Graf

On 18.07.2013, at 11:56, “tiejun.chen” wrote:

> On 07/18/2013 05:44 PM, Alexander Graf wrote:
>> 
>> On 18.07.2013, at 10:55, “tiejun.chen” wrote:
>> 
>>> On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote:
 
 
> -Original Message-
> From: Bhushan Bharat-R65777
> Sent: Thursday, July 18, 2013 1:53 PM
> To: '"“tiejun.chen”"'
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood 
> Scott-
> B07421
> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
> managed pages
> 
> 
> 
>> -Original Message-
>> From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
>> Sent: Thursday, July 18, 2013 1:52 PM
>> To: Bhushan Bharat-R65777
>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood
>> Scott-
>> B07421
>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>> kernel managed pages
>> 
>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>> 
>>> 
 -Original Message-
 From: kvm-ppc-ow...@vger.kernel.org
 [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "“tiejun.chen”"
 Sent: Thursday, July 18, 2013 1:01 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
 Wood
 Scott-
 B07421
 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
 kernel managed pages
 
 On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
> 
> 
>> -Original Message-
>> From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
>> Sent: Thursday, July 18, 2013 11:56 AM
>> To: Bhushan Bharat-R65777
>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
>> Wood
>> Scott- B07421; Bhushan Bharat-R65777
>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
>> for kernel managed pages
>> 
>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>> If there is a struct page for the requested mapping then it's
>>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
>>> else this is treated as I/O and we set  "I + G"  (cache
>>> inhibited,
>>> guarded)
>>> 
>>> This helps setting proper TLB mapping for direct assigned device
>>> 
>>> Signed-off-by: Bharat Bhushan 
>>> ---
>>> arch/powerpc/kvm/e500_mmu_host.c |   17 -
>>> 1 files changed, 12 insertions(+), 5 deletions(-)
>>> 
>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>> index 1c6a9d7..089c227 100644
>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>> @@ -64,13 +64,20 @@ static inline u32
>>> e500_shadow_mas3_attrib(u32 mas3, int
>> usermode)
>>> return mas3;
>>> }
>>> 
>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>>> usermode)
>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>> {
>>> +   u32 mas2_attr;
>>> +
>>> +   mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>> +
>>> +   if (!pfn_valid(pfn)) {
>> 
>> Why not directly use kvm_is_mmio_pfn()?
> 
> What I understand from this function (someone can correct me) is
> that it
 returns "false" when the page is managed by the kernel and is not
 marked as RESERVED (for some reason). For us it does not matter
 whether the page is reserved or not; if it is a kernel-visible page then 
 it
> is DDR.
> 
 
 I think you are setting I|G by addressing all mmio pages, right? If
 so,
 
   KVM: direct mmio pfn check
 
   Userspace may specify memory slots that are backed by mmio
 pages rather than
   normal RAM.  In some cases it is not enough to identify these
 mmio
>> pages
   by pfn_valid().  This patch adds checking the PageReserved as 
 well.
>>> 
>>> Do you know what those "some cases" are and how checking
>>> PageReserved helps in
>> those cases?
>> 
>> No, I didn't see these actual cases in qemu myself either. But this should
>> persist, as I understand ;-)
> 
> Then I will wait till someone educates me :)
 
 The reason is, the kvm_is_mmio_pfn() function looks pretty heavy and I do not 
 want to call it for all tlbwe operations unless it is necessary.
>>> 
>>> Furthermore, how do we distinguish that we're creating a TLB entry for the device 
>>> assigned directly to the GS?
>> 
>> Because other devices wouldn't be available to the guest through memory slots.

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread “tiejun.chen”

On 07/18/2013 05:44 PM, Alexander Graf wrote:


On 18.07.2013, at 10:55, “tiejun.chen” wrote:


On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: Bhushan Bharat-R65777
Sent: Thursday, July 18, 2013 1:53 PM
To: '"“tiejun.chen”"'
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott-
B07421
Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
managed pages




-Original Message-
From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 1:52 PM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: kvm-ppc-ow...@vger.kernel.org
[mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "“tiejun.chen”"
Sent: Thursday, July 18, 2013 1:01 PM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 11:56 AM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
Wood
Scott- B07421; Bhushan Bharat-R65777
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
for kernel managed pages

On 07/18/2013 02:04 PM, Bharat Bhushan wrote:

If there is a struct page for the requested mapping then it's
normal DDR and the mapping sets "M" bit (coherent, cacheable)
else this is treated as I/O and we set  "I + G"  (cache
inhibited,
guarded)

This helps setting proper TLB mapping for direct assigned device

Signed-off-by: Bharat Bhushan 
---
 arch/powerpc/kvm/e500_mmu_host.c |   17 -
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c
b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..089c227 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,20 @@ static inline u32
e500_shadow_mas3_attrib(u32 mas3, int

usermode)

return mas3;
 }

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
 {
+   u32 mas2_attr;
+
+   mas2_attr = mas2 & MAS2_ATTRIB_MASK;
+
+   if (!pfn_valid(pfn)) {


Why not directly use kvm_is_mmio_pfn()?


What I understand from this function (someone can correct me) is
that it

returns "false" when the page is managed by the kernel and is not
marked as RESERVED (for some reason). For us it does not matter
whether the page is reserved or not; if it is a kernel-visible page then it

is DDR.




I think you are setting I|G by addressing all mmio pages, right? If
so,

   KVM: direct mmio pfn check

   Userspace may specify memory slots that are backed by mmio
pages rather than
   normal RAM.  In some cases it is not enough to identify these
mmio

pages

   by pfn_valid().  This patch adds checking the PageReserved as well.


Do you know what those "some cases" are and how checking
PageReserved helps in

those cases?

No, I didn't see these actual cases in qemu myself either. But this should
persist, as I understand ;-)


Then I will wait till someone educates me :)


The reason is, the kvm_is_mmio_pfn() function looks pretty heavy and I do not want 
to call it for all tlbwe operations unless it is necessary.


Furthermore, how do we distinguish that we're creating a TLB entry for the device 
assigned directly to the GS?


Because other devices wouldn't be available to the guest through memory slots.


Yes.




I think it's unnecessary to always check if that is an mmio pfn since we have 
more non-direct-assigned devices.


I'm not sure I understand. The shadow TLB code only knows "here is a host virtual 
address". It needs to figure out whether the host physical address behind that is 
RAM (can access with cache enabled) or not (has to disable cache)



Sorry, it looks like I'm misleading you :-P


So maybe we can introduce another helper to fix up that TLB entry instead of 
going through this path.


This path does fix up the shadow (host) TLB entry :).



I just mean whether we can have a separate path dedicated to those direct-
assigned devices, not going through this common path :)


Tiejun


RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread Bhushan Bharat-R65777


> -Original Message-
> From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On
> Behalf Of Alexander Graf
> Sent: Thursday, July 18, 2013 3:19 PM
> To: Bhushan Bharat-R65777
> Cc: "“tiejun.chen”"; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-
> B07421
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
> managed pages
> 
> 
> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
> 
> >
> >
> >> -Original Message-
> >> From: Bhushan Bharat-R65777
> >> Sent: Thursday, July 18, 2013 1:53 PM
> >> To: '"“tiejun.chen”"'
> >> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood
> >> Scott-
> >> B07421
> >> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for
> >> kernel managed pages
> >>
> >>
> >>
> >>> -Original Message-
> >>> From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
> >>> Sent: Thursday, July 18, 2013 1:52 PM
> >>> To: Bhushan Bharat-R65777
> >>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
> >>> Wood
> >>> Scott-
> >>> B07421
> >>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
> >>> kernel managed pages
> >>>
> >>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
> 
> 
> > -Original Message-
> > From: kvm-ppc-ow...@vger.kernel.org
> > [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "“tiejun.chen”"
> > Sent: Thursday, July 18, 2013 1:01 PM
> > To: Bhushan Bharat-R65777
> > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
> > Wood
> > Scott-
> > B07421
> > Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
> > for kernel managed pages
> >
> > On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
> >>
> >>
> >>> -Original Message-
> >>> From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
> >>> Sent: Thursday, July 18, 2013 11:56 AM
> >>> To: Bhushan Bharat-R65777
> >>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
> >>> Wood
> >>> Scott- B07421; Bhushan Bharat-R65777
> >>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
> >>> for kernel managed pages
> >>>
> >>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>  If there is a struct page for the requested mapping then it's
>  normal DDR and the mapping sets "M" bit (coherent, cacheable)
>  else this is treated as I/O and we set  "I + G"  (cache
>  inhibited,
>  guarded)
> 
>  This helps setting proper TLB mapping for direct assigned
>  device
> 
>  Signed-off-by: Bharat Bhushan 
>  ---
> arch/powerpc/kvm/e500_mmu_host.c |   17 -
> 1 files changed, 12 insertions(+), 5 deletions(-)
> 
>  diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>  b/arch/powerpc/kvm/e500_mmu_host.c
>  index 1c6a9d7..089c227 100644
>  --- a/arch/powerpc/kvm/e500_mmu_host.c
>  +++ b/arch/powerpc/kvm/e500_mmu_host.c
>  @@ -64,13 +64,20 @@ static inline u32
>  e500_shadow_mas3_attrib(u32 mas3, int
> >>> usermode)
>   return mas3;
> }
> 
>  -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>  usermode)
>  +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
> {
>  +u32 mas2_attr;
>  +
>  +mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>  +
>  +if (!pfn_valid(pfn)) {
> >>>
> >>> Why not directly use kvm_is_mmio_pfn()?
> >>
> >> What I understand from this function (someone can correct me) is
> >> that it
> > returns "false" when the page is managed by the kernel and is not
> > marked as RESERVED (for some reason). For us it does not matter
> > whether the page is reserved or not; if it is a kernel-visible page
> > then it
> >> is DDR.
> >>
> >
> > I think you are setting I|G by addressing all mmio pages, right?
> > If so,
> >
> >  KVM: direct mmio pfn check
> >
> >  Userspace may specify memory slots that are backed by mmio
> > pages rather than
> >  normal RAM.  In some cases it is not enough to identify these
> > mmio
> >>> pages
> >  by pfn_valid().  This patch adds checking the PageReserved as well.
> 
>  Do you know what those "some cases" are and how checking
>  PageReserved helps in
> >>> those cases?
> >>>
> >>> No, I didn't see these actual cases in qemu myself either. But this
> >>> should persist, as I understand ;-)
> >>
> >> Then I will wait till someone educates me :)
> >
> > The reason is, the kvm_is_mmio_pfn() function looks pretty heavy and I do not
> want to call it for all tlbwe operations unless it is necessary.
> 
> It certainly does more than we need and potentially slows down the fast path
> (RAM mapping).

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread Alexander Graf

On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:

> 
> 
>> -Original Message-
>> From: Bhushan Bharat-R65777
>> Sent: Thursday, July 18, 2013 1:53 PM
>> To: '"“tiejun.chen”"'
>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott-
>> B07421
>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>> managed pages
>> 
>> 
>> 
>>> -Original Message-
>>> From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
>>> Sent: Thursday, July 18, 2013 1:52 PM
>>> To: Bhushan Bharat-R65777
>>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood
>>> Scott-
>>> B07421
>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>> kernel managed pages
>>> 
>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
 
 
> -Original Message-
> From: kvm-ppc-ow...@vger.kernel.org
> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "“tiejun.chen”"
> Sent: Thursday, July 18, 2013 1:01 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
> Wood
> Scott-
> B07421
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
> kernel managed pages
> 
> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>> 
>> 
>>> -Original Message-
>>> From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
>>> Sent: Thursday, July 18, 2013 11:56 AM
>>> To: Bhushan Bharat-R65777
>>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
>>> Wood
>>> Scott- B07421; Bhushan Bharat-R65777
>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>> for kernel managed pages
>>> 
>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
 If there is a struct page for the requested mapping then it's
 normal DDR and the mapping sets "M" bit (coherent, cacheable)
 else this is treated as I/O and we set  "I + G"  (cache
 inhibited,
 guarded)
 
 This helps setting proper TLB mapping for direct assigned device
 
 Signed-off-by: Bharat Bhushan 
 ---
arch/powerpc/kvm/e500_mmu_host.c |   17 -
1 files changed, 12 insertions(+), 5 deletions(-)
 
 diff --git a/arch/powerpc/kvm/e500_mmu_host.c
 b/arch/powerpc/kvm/e500_mmu_host.c
 index 1c6a9d7..089c227 100644
 --- a/arch/powerpc/kvm/e500_mmu_host.c
 +++ b/arch/powerpc/kvm/e500_mmu_host.c
 @@ -64,13 +64,20 @@ static inline u32
 e500_shadow_mas3_attrib(u32 mas3, int
>>> usermode)
return mas3;
}
 
 -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
 usermode)
 +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
{
 +  u32 mas2_attr;
 +
 +  mas2_attr = mas2 & MAS2_ATTRIB_MASK;
 +
 +  if (!pfn_valid(pfn)) {
>>> 
>>> Why not directly use kvm_is_mmio_pfn()?
>> 
>> What I understand from this function (someone can correct me) is
>> that it
> returns "false" when the page is managed by the kernel and is not
> marked as RESERVED (for some reason). For us it does not matter
> whether the page is reserved or not; if it is a kernel-visible page then it
>> is DDR.
>> 
> 
> I think you are setting I|G by addressing all mmio pages, right? If
> so,
> 
>  KVM: direct mmio pfn check
> 
>  Userspace may specify memory slots that are backed by mmio
> pages rather than
>  normal RAM.  In some cases it is not enough to identify these
> mmio
>>> pages
>  by pfn_valid().  This patch adds checking the PageReserved as well.
 
 Do you know what those "some cases" are and how checking
 PageReserved helps in
>>> those cases?
>>> 
>>> No, I didn't see these actual cases in qemu myself either. But this should
>>> persist, as I understand ;-)
>> 
>> Then I will wait till someone educates me :)
> 
> The reason is, the kvm_is_mmio_pfn() function looks pretty heavy and I do not 
> want to call it for all tlbwe operations unless it is necessary.

It certainly does more than we need and potentially slows down the fast path 
(RAM mapping). The only thing it does on top of "if (pfn_valid())" is to check 
for pages that are declared reserved on the host. This happens in 2 cases:

  1) Non cache coherent DMA
  2) Memory hot remove

The non coherent DMA case would be interesting, as with the mechanism as it is 
in place in Linux today, we could potentially break normal guest operation if 
we don't take it into account. However, it's Kconfig guarded by:

depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
default n if PPC_47x
default y

so we never hit it with any core we care about ;).

Memory hot remove does not exist on e500 FWIW, so we don't have to worry about 
that one either.

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread Alexander Graf

On 18.07.2013, at 10:55, “tiejun.chen” wrote:

> On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote:
>> 
>> 
>>> -Original Message-
>>> From: Bhushan Bharat-R65777
>>> Sent: Thursday, July 18, 2013 1:53 PM
>>> To: '"“tiejun.chen”"'
>>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott-
>>> B07421
>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>>> managed pages
>>> 
>>> 
>>> 
 -Original Message-
 From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
 Sent: Thursday, July 18, 2013 1:52 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood
 Scott-
 B07421
 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
 kernel managed pages
 
 On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
> 
> 
>> -Original Message-
>> From: kvm-ppc-ow...@vger.kernel.org
>> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "“tiejun.chen”"
>> Sent: Thursday, July 18, 2013 1:01 PM
>> To: Bhushan Bharat-R65777
>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
>> Wood
>> Scott-
>> B07421
>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>> kernel managed pages
>> 
>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>> 
>>> 
 -Original Message-
 From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
 Sent: Thursday, July 18, 2013 11:56 AM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
 Wood
 Scott- B07421; Bhushan Bharat-R65777
 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
 for kernel managed pages
 
 On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
> If there is a struct page for the requested mapping then it's
> normal DDR and the mapping sets "M" bit (coherent, cacheable)
> else this is treated as I/O and we set  "I + G"  (cache
> inhibited,
> guarded)
> 
> This helps setting proper TLB mapping for direct assigned device
> 
> Signed-off-by: Bharat Bhushan 
> ---
> arch/powerpc/kvm/e500_mmu_host.c |   17 -
> 1 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
> b/arch/powerpc/kvm/e500_mmu_host.c
> index 1c6a9d7..089c227 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -64,13 +64,20 @@ static inline u32
> e500_shadow_mas3_attrib(u32 mas3, int
 usermode)
>   return mas3;
> }
> 
> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
> usermode)
> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
> {
> + u32 mas2_attr;
> +
> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> +
> + if (!pfn_valid(pfn)) {
 
 Why not directly use kvm_is_mmio_pfn()?
>>> 
>>> What I understand from this function (someone can correct me) is
>>> that it
>> returns "false" when the page is managed by the kernel and is not
>> marked as RESERVED (for some reason). For us it does not matter
>> whether the page is reserved or not; if it is a kernel-visible page then it
>>> is DDR.
>>> 
>> 
>> I think you are setting I|G by addressing all mmio pages, right? If
>> so,
>> 
>>   KVM: direct mmio pfn check
>> 
>>   Userspace may specify memory slots that are backed by mmio
>> pages rather than
>>   normal RAM.  In some cases it is not enough to identify these
>> mmio
 pages
>>   by pfn_valid().  This patch adds checking the PageReserved as well.
> 
> Do you know what those "some cases" are and how checking
> PageReserved helps in
 those cases?
 
 No, I didn't see these actual cases in qemu myself either. But this should
 persist, as I understand ;-)
>>> 
>>> Then I will wait till someone educates me :)
>> 
>> The reason is, the kvm_is_mmio_pfn() function looks pretty heavy and I do not 
>> want to call it for all tlbwe operations unless it is necessary.
> 
> Furthermore, how do we distinguish that we're creating a TLB entry for the device 
> assigned directly to the GS?

Because other devices wouldn't be available to the guest through memory slots.

> I think it's unnecessary to always check if that is an mmio pfn since we have 
> more non-direct-assigned devices.

I'm not sure I understand. The shadow TLB code only knows "here is a host 
virtual address". It needs to figure out whether the host physical address 
behind that is RAM (can access with cache enabled) or not (has to disable cache).

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread “tiejun.chen”

On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: Bhushan Bharat-R65777
Sent: Thursday, July 18, 2013 1:53 PM
To: '"“tiejun.chen”"'
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott-
B07421
Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
managed pages




-Original Message-
From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 1:52 PM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: kvm-ppc-ow...@vger.kernel.org
[mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "“tiejun.chen”"
Sent: Thursday, July 18, 2013 1:01 PM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
Wood
Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 11:56 AM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
Wood
Scott- B07421; Bhushan Bharat-R65777
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
for kernel managed pages

On 07/18/2013 02:04 PM, Bharat Bhushan wrote:

If there is a struct page for the requested mapping then it's
normal DDR and the mapping sets "M" bit (coherent, cacheable)
else this is treated as I/O and we set  "I + G"  (cache
inhibited,
guarded)

This helps setting proper TLB mapping for direct assigned device

Signed-off-by: Bharat Bhushan 
---
 arch/powerpc/kvm/e500_mmu_host.c |   17 -
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c
b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..089c227 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,20 @@ static inline u32
e500_shadow_mas3_attrib(u32 mas3, int

usermode)

return mas3;
 }

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
 {
+   u32 mas2_attr;
+
+   mas2_attr = mas2 & MAS2_ATTRIB_MASK;
+
+   if (!pfn_valid(pfn)) {


Why not directly use kvm_is_mmio_pfn()?


What I understand from this function (someone can correct me) is
that it

returns "false" when the page is managed by the kernel and is not
marked as RESERVED (for some reason). For us it does not matter
whether the page is reserved or not; if it is a kernel-visible page then it

is DDR.




I think you are setting I|G by addressing all mmio pages, right? If
so,

   KVM: direct mmio pfn check

   Userspace may specify memory slots that are backed by mmio
pages rather than
   normal RAM.  In some cases it is not enough to identify these
mmio

pages

   by pfn_valid().  This patch adds checking the PageReserved as well.


Do you know what those "some cases" are and how checking
PageReserved helps in

those cases?

No, I didn't see these actual cases in qemu myself either. But this should
persist, as I understand ;-)


Then I will wait till someone educates me :)


The reason is, the kvm_is_mmio_pfn() function looks pretty heavy and I do not want 
to call it for all tlbwe operations unless it is necessary.


Furthermore, how do we distinguish that we're creating a TLB entry for the device assigned 
directly to the GS?


I think it's unnecessary to always check if that is an mmio pfn since we have more 
non-direct-assigned devices.


So maybe we can introduce another helper to fix up that TLB entry instead of 
going through this path.


Tiejun



-Bharat


+		mas2_attr |= MAS2_I | MAS2_G;
+	} else {
 #ifdef CONFIG_SMP
-	return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
-#else
-	return mas2 & MAS2_ATTRIB_MASK;
+	mas2_attr |= MAS2_M;
 #endif
+	}
+	return mas2_attr;
 }
 
 /*
@@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
 	/* Force IPROT=0 for all guest mappings. */
 	stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
 	stlbe->mas2 = (gvaddr & MAS2_EPN) |
-		e500_shadow_mas2_attrib(gtlbe->mas2, pr);
+		e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
 	stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
 		e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);








RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Bhushan Bharat-R65777
> Sent: Thursday, July 18, 2013 1:53 PM
> To: '"“tiejun.chen”"'
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott-
> B07421
> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
> managed pages
> 
> 
> 
> > -Original Message-
> > From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
> > Sent: Thursday, July 18, 2013 1:52 PM
> > To: Bhushan Bharat-R65777
> > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood
> > Scott-
> > B07421
> > Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
> > kernel managed pages
> >
> > On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
> > >
> > >
> > >> -Original Message-
> > >> From: kvm-ppc-ow...@vger.kernel.org
> > >> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "“tiejun.chen”"
> > >> Sent: Thursday, July 18, 2013 1:01 PM
> > >> To: Bhushan Bharat-R65777
> > >> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
> > >> Wood
> > >> Scott-
> > >> B07421
> > >> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
> > >> kernel managed pages
> > >>
> > >> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
> > >>>
> > >>>
> >  -Original Message-
> >  From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
> >  Sent: Thursday, July 18, 2013 11:56 AM
> >  To: Bhushan Bharat-R65777
> >  Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
> >  Wood
> >  Scott- B07421; Bhushan Bharat-R65777
> >  Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
> >  for kernel managed pages
> > 
> >  On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
> > > If there is a struct page for the requested mapping then it's
> > > normal DDR and the mapping sets the "M" bit (coherent, cacheable);
> > > else this is treated as I/O and we set "I + G" (cache inhibited,
> > > guarded).
> > >
> > > This helps set up a proper TLB mapping for a directly assigned device.
> > >
> > > Signed-off-by: Bharat Bhushan 
> > > ---
> > > arch/powerpc/kvm/e500_mmu_host.c |   17 -
> > > 1 files changed, 12 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/arch/powerpc/kvm/e500_mmu_host.c
> > > b/arch/powerpc/kvm/e500_mmu_host.c
> > > index 1c6a9d7..089c227 100644
> > > --- a/arch/powerpc/kvm/e500_mmu_host.c
> > > +++ b/arch/powerpc/kvm/e500_mmu_host.c
> > > @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
> > >   return mas3;
> > > }
> > >
> > > -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
> > > usermode)
> > > +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
> > > {
> > > + u32 mas2_attr;
> > > +
> > > + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> > > +
> > > + if (!pfn_valid(pfn)) {
> > 
> >  Why not directly use kvm_is_mmio_pfn()?
> > >>>
> > >>> What I understand from this function (someone can correct me) is
> > >>> that it returns "false" when the page is managed by the kernel and
> > >>> is not marked RESERVED (for some reason). For us it does not matter
> > >>> whether the page is reserved or not; if it is a kernel-visible page
> > >>> then it is DDR.
> > >>>
> > >>
> > >> I think you are setting I|G by addressing all mmio pages, right? If so,
> > >>
> > >>   KVM: direct mmio pfn check
> > >>
> > >>   Userspace may specify memory slots that are backed by mmio pages
> > >>   rather than normal RAM.  In some cases it is not enough to identify
> > >>   these mmio pages by pfn_valid().  This patch adds checking the
> > >>   PageReserved as well.
> > >
> > > Do you know what those "some cases" are, and how checking
> > > PageReserved helps in those cases?
> >
> > No, I didn't see those actual cases in QEMU myself either. But as I
> > understand it, this check should stay all the same ;-)
> 
> Then I will wait till someone educates me :)

The reason is, the kvm_is_mmio_pfn() function looks pretty heavy and I do not
want to call it for every tlbwe operation unless it is necessary.

-Bharat

> > > + mas2_attr |= MAS2_I | MAS2_G;
> > > + } else {
> > > #ifdef CONFIG_SMP
> > > - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
> > > -#else
> > > - return mas2 & MAS2_ATTRIB_MASK;
> > > + mas2_attr |= MAS2_M;
> > > #endif
> > > + }
> > > + return mas2_attr;
> > > }
> > >
> > > /*
> > > @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
> > >   /* Force IPROT=0 for all guest mappings. */
> > >   stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | 
> > > MAS1_VALID;
> > >   stlbe->mas2 = (gvaddr & MAS2_EPN) |
> > > - 

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread “tiejun.chen”

On 07/18/2013 02:04 PM, Bharat Bhushan wrote:

If there is a struct page for the requested mapping then it's normal
DDR and the mapping sets the "M" bit (coherent, cacheable); else this
is treated as I/O and we set "I + G" (cache inhibited, guarded).

This helps set up a proper TLB mapping for a directly assigned device.

Signed-off-by: Bharat Bhushan 
---
  arch/powerpc/kvm/e500_mmu_host.c |   17 ++++++++++++-----
  1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..089c227 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
return mas3;
  }

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
  {
+   u32 mas2_attr;
+
+   mas2_attr = mas2 & MAS2_ATTRIB_MASK;
+
+   if (!pfn_valid(pfn)) {
+   mas2_attr |= MAS2_I | MAS2_G;
+   } else {
  #ifdef CONFIG_SMP
-   return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
-#else
-   return mas2 & MAS2_ATTRIB_MASK;
+   mas2_attr |= MAS2_M;
  #endif
+   }


Additionally, in the UP case this little chunk of code is equivalent to

if (!pfn_valid(pfn)) {
	mas2_attr |= MAS2_I | MAS2_G;
} else {
}

So you'd better wrap MAS2_M in advance, like:

#ifdef CONFIG_SMP
#define M_IF_SMP	MAS2_M
#else
#define M_IF_SMP	0
#endif

Then

if (!pfn_valid(pfn))
	mas2_attr |= MAS2_I | MAS2_G;
else
	mas2_attr |= M_IF_SMP;
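
Putting the suggestion together, the helper would then read roughly like
this (a sketch assuming the M_IF_SMP define above):

	static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
	{
		u32 mas2_attr = mas2 & MAS2_ATTRIB_MASK;

		if (!pfn_valid(pfn))
			mas2_attr |= MAS2_I | MAS2_G;	/* I/O: cache-inhibited, guarded */
		else
			mas2_attr |= M_IF_SMP;		/* RAM: coherent ("M") on SMP only */
		return mas2_attr;
	}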

Tiejun


+   return mas2_attr;
  }

  /*
@@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
/* Force IPROT=0 for all guest mappings. */
stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
stlbe->mas2 = (gvaddr & MAS2_EPN) |
- e500_shadow_mas2_attrib(gtlbe->mas2, pr);
+ e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);






RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread Bhushan Bharat-R65777


> -Original Message-
> From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
> Sent: Thursday, July 18, 2013 1:52 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott-
> B07421
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
> managed pages
> 
> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
> >
> >
> >> -Original Message-
> >> From: kvm-ppc-ow...@vger.kernel.org
> >> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "“tiejun.chen”"
> >> Sent: Thursday, July 18, 2013 1:01 PM
> >> To: Bhushan Bharat-R65777
> >> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood
> >> Scott-
> >> B07421
> >> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
> >> kernel managed pages
> >>
> >> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
> >>>
> >>>
>  -Original Message-
>  From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
>  Sent: Thursday, July 18, 2013 11:56 AM
>  To: Bhushan Bharat-R65777
>  Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
>  Wood
>  Scott- B07421; Bhushan Bharat-R65777
>  Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>  kernel managed pages
> 
>  On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
> > > If there is a struct page for the requested mapping then it's
> > > normal DDR and the mapping sets the "M" bit (coherent, cacheable);
> > > else this is treated as I/O and we set "I + G" (cache inhibited,
> > > guarded).
> > >
> > > This helps set up a proper TLB mapping for a directly assigned device.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> > arch/powerpc/kvm/e500_mmu_host.c |   17 -
> > 1 files changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/e500_mmu_host.c
> > b/arch/powerpc/kvm/e500_mmu_host.c
> > index 1c6a9d7..089c227 100644
> > --- a/arch/powerpc/kvm/e500_mmu_host.c
> > +++ b/arch/powerpc/kvm/e500_mmu_host.c
> > @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
> > return mas3;
> > }
> >
> > -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
> > +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
> > {
> > +   u32 mas2_attr;
> > +
> > +   mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> > +
> > +   if (!pfn_valid(pfn)) {
> 
>  Why not directly use kvm_is_mmio_pfn()?
> >>>
> >>> What I understand from this function (someone can correct me) is
> >>> that it returns "false" when the page is managed by the kernel and
> >>> is not marked RESERVED (for some reason). For us it does not matter
> >>> whether the page is reserved or not; if it is a kernel-visible page
> >>> then it is DDR.
> >>>
> >>
> >> I think you are setting I|G by addressing all mmio pages, right? If so,
> >>
> >>   KVM: direct mmio pfn check
> >>
> >>   Userspace may specify memory slots that are backed by mmio
> >> pages rather than
> >>   normal RAM.  In some cases it is not enough to identify these mmio
> >>   pages by pfn_valid().  This patch adds checking the PageReserved as well.
> >
> > Do you know what those "some cases" are, and how checking PageReserved
> > helps in those cases?
> 
> No, I didn't see those actual cases in QEMU myself either. But as I
> understand it, this check should stay all the same ;-)

Then I will wait till someone educates me :)

-Bharat

> 
> Tiejun
> 
> >
> > -Bharat
> >
> >>
> >> Tiejun
> >>
> >>> -Bharat
> >>>
> 
>  Tiejun
> 
> > +   mas2_attr |= MAS2_I | MAS2_G;
> > +   } else {
> > #ifdef CONFIG_SMP
> > -   return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
> > -#else
> > -   return mas2 & MAS2_ATTRIB_MASK;
> > +   mas2_attr |= MAS2_M;
> > #endif
> > +   }
> > +   return mas2_attr;
> > }
> >
> > /*
> > @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
> > /* Force IPROT=0 for all guest mappings. */
> > stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | 
> > MAS1_VALID;
> > stlbe->mas2 = (gvaddr & MAS2_EPN) |
> > - e500_shadow_mas2_attrib(gtlbe->mas2, pr);
> > + e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
> > stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
> > e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
> >
> >
> 



Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread “tiejun.chen”

On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On
Behalf Of "“tiejun.chen”"
Sent: Thursday, July 18, 2013 1:01 PM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott-
B07421
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
managed pages

On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 11:56 AM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood
Scott- B07421; Bhushan Bharat-R65777
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
kernel managed pages

On 07/18/2013 02:04 PM, Bharat Bhushan wrote:

If there is a struct page for the requested mapping then it's normal
DDR and the mapping sets the "M" bit (coherent, cacheable); else this
is treated as I/O and we set "I + G" (cache inhibited, guarded).

This helps set up a proper TLB mapping for a directly assigned device.

Signed-off-by: Bharat Bhushan 
---
arch/powerpc/kvm/e500_mmu_host.c |   17 ++++++++++++-----
1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c
b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..089c227 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
	return mas3;
}

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
{
+   u32 mas2_attr;
+
+   mas2_attr = mas2 & MAS2_ATTRIB_MASK;
+
+   if (!pfn_valid(pfn)) {


Why not directly use kvm_is_mmio_pfn()?


What I understand from this function (someone can correct me) is that it
returns "false" when the page is managed by the kernel and is not marked
RESERVED (for some reason). For us it does not matter whether the page is
reserved or not; if it is a kernel-visible page then it is DDR.




I think you are setting I|G by addressing all mmio pages, right? If so,

  KVM: direct mmio pfn check

  Userspace may specify memory slots that are backed by mmio pages rather than
  normal RAM.  In some cases it is not enough to identify these mmio pages
  by pfn_valid().  This patch adds checking the PageReserved as well.


Do you know what those "some cases" are, and how checking PageReserved helps
in those cases?


No, I didn't see those actual cases in QEMU myself either. But as I
understand it, this check should stay all the same ;-)


Tiejun



-Bharat



Tiejun


-Bharat



Tiejun


+   mas2_attr |= MAS2_I | MAS2_G;
+   } else {
#ifdef CONFIG_SMP
-   return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
-#else
-   return mas2 & MAS2_ATTRIB_MASK;
+   mas2_attr |= MAS2_M;
#endif
+   }
+   return mas2_attr;
}

/*
@@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
/* Force IPROT=0 for all guest mappings. */
stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
stlbe->mas2 = (gvaddr & MAS2_EPN) |
- e500_shadow_mas2_attrib(gtlbe->mas2, pr);
+ e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);














RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread Bhushan Bharat-R65777


> -Original Message-
> From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On
> Behalf Of "“tiejun.chen”"
> Sent: Thursday, July 18, 2013 1:01 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott-
> B07421
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
> managed pages
> 
> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
> >
> >
> >> -Original Message-
> >> From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
> >> Sent: Thursday, July 18, 2013 11:56 AM
> >> To: Bhushan Bharat-R65777
> >> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood
> >> Scott- B07421; Bhushan Bharat-R65777
> >> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
> >> kernel managed pages
> >>
> >> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
> >>> If there is a struct page for the requested mapping then it's normal
> >>> DDR and the mapping sets the "M" bit (coherent, cacheable); else this
> >>> is treated as I/O and we set "I + G" (cache inhibited, guarded).
> >>>
> >>> This helps set up a proper TLB mapping for a directly assigned device.
> >>>
> >>> Signed-off-by: Bharat Bhushan 
> >>> ---
> >>>arch/powerpc/kvm/e500_mmu_host.c |   17 -
> >>>1 files changed, 12 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
> >>> b/arch/powerpc/kvm/e500_mmu_host.c
> >>> index 1c6a9d7..089c227 100644
> >>> --- a/arch/powerpc/kvm/e500_mmu_host.c
> >>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> >>> @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
> >>>   return mas3;
> >>>}
> >>>
> >>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
> >>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
> >>>{
> >>> + u32 mas2_attr;
> >>> +
> >>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> >>> +
> >>> + if (!pfn_valid(pfn)) {
> >>
> >> Why not directly use kvm_is_mmio_pfn()?
> >
> > What I understand from this function (someone can correct me) is that it
> > returns "false" when the page is managed by the kernel and is not marked
> > RESERVED (for some reason). For us it does not matter whether the page is
> > reserved or not; if it is a kernel-visible page then it is DDR.
> >
> 
> I think you are setting I|G by addressing all mmio pages, right? If so,
> 
>  KVM: direct mmio pfn check
> 
>  Userspace may specify memory slots that are backed by mmio pages rather than
>  normal RAM.  In some cases it is not enough to identify these mmio pages
>  by pfn_valid().  This patch adds checking the PageReserved as well.

Do you know what those "some cases" are, and how checking PageReserved helps
in those cases?

-Bharat

> 
> Tiejun
> 
> > -Bharat
> >
> >>
> >> Tiejun
> >>
> >>> + mas2_attr |= MAS2_I | MAS2_G;
> >>> + } else {
> >>>#ifdef CONFIG_SMP
> >>> - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
> >>> -#else
> >>> - return mas2 & MAS2_ATTRIB_MASK;
> >>> + mas2_attr |= MAS2_M;
> >>>#endif
> >>> + }
> >>> + return mas2_attr;
> >>>}
> >>>
> >>>/*
> >>> @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
> >>>   /* Force IPROT=0 for all guest mappings. */
> >>>   stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | 
> >>> MAS1_VALID;
> >>>   stlbe->mas2 = (gvaddr & MAS2_EPN) |
> >>> -   e500_shadow_mas2_attrib(gtlbe->mas2, pr);
> >>> +   e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
> >>>   stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
> >>>   e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
> >>>
> >>>
> >>
> >
> 

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread “tiejun.chen”

On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
Sent: Thursday, July 18, 2013 11:56 AM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott-
B07421; Bhushan Bharat-R65777
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
managed pages

On 07/18/2013 02:04 PM, Bharat Bhushan wrote:

If there is a struct page for the requested mapping then it's normal
DDR and the mapping sets the "M" bit (coherent, cacheable); else this
is treated as I/O and we set "I + G" (cache inhibited, guarded).

This helps set up a proper TLB mapping for a directly assigned device.

Signed-off-by: Bharat Bhushan 
---
   arch/powerpc/kvm/e500_mmu_host.c |   17 ++++++++++++-----
   1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c
b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..089c227 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)

return mas3;
   }

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
   {
+   u32 mas2_attr;
+
+   mas2_attr = mas2 & MAS2_ATTRIB_MASK;
+
+   if (!pfn_valid(pfn)) {


Why not directly use kvm_is_mmio_pfn()?


What I understand from this function (someone can correct me) is that it
returns "false" when the page is managed by the kernel and is not marked
RESERVED (for some reason). For us it does not matter whether the page is
reserved or not; if it is a kernel-visible page then it is DDR.



I think you are setting I|G by addressing all mmio pages, right? If so,

KVM: direct mmio pfn check

Userspace may specify memory slots that are backed by mmio pages rather than
normal RAM.  In some cases it is not enough to identify these mmio pages
by pfn_valid().  This patch adds checking the PageReserved as well.

Tiejun


-Bharat



Tiejun


+   mas2_attr |= MAS2_I | MAS2_G;
+   } else {
   #ifdef CONFIG_SMP
-   return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
-#else
-   return mas2 & MAS2_ATTRIB_MASK;
+   mas2_attr |= MAS2_M;
   #endif
+   }
+   return mas2_attr;
   }

   /*
@@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
/* Force IPROT=0 for all guest mappings. */
stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
stlbe->mas2 = (gvaddr & MAS2_EPN) |
- e500_shadow_mas2_attrib(gtlbe->mas2, pr);
+ e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);










Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-18 Thread Gleb Natapov
On Thu, Jul 18, 2013 at 07:52:21AM +0200, Paolo Bonzini wrote:
> Il 17/07/2013 20:54, Arthur Chunqi Li ha scritto:
> > +   ".globl entry_sysenter\n\t"
> > +   "entry_sysenter:\n\t"
> > +   SAVE_GPR
> > +   "   and $0xf, %rax\n\t"
> > +   "   push%rax\n\t"
> 
> push is wrong here, I think: the first argument is passed in %rdi.
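
A sketch of the fix (following the SysV x86-64 convention, where the first
integer argument is passed in %rdi; this is a fragment meant to slot into
the quoted asm blob, not standalone code):

	"	and	$0xf, %rax\n\t"
	"	mov	%rax, %rdi\n\t"	/* first argument goes in %rdi */
	"	call	syscall_handler\n\t"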
> 
> > +   "   callsyscall_handler\n\t"
> > +   LOAD_GPR
> > +   "   vmresume\n\t"
> > +);
> > +
> > +int exit_handler()
> 
> This (and syscall_handler too) needs __attribute__((__used__)) because
> it's only used from assembly.
> 
That was my question actually: why is it called from assembly? Calling it
from C should not be a problem.

> Please add "static" to all functions except main, it triggers better
> compiler optimization and warnings and it will avoid nasty surprises in
> the future if the compiler becomes smarter.
> 
> > +{
> > +   int ret;
> > +
> > +   current->exits++;
> > +   current->guest_regs = regs;
> > +   if (is_hypercall())
> > +   ret = handle_hypercall();
> > +   else
> > +   ret = current->exit_handler();
> > +   regs = current->guest_regs;
> > +   switch (ret) {
> > +   case VMX_TEST_VMEXIT:
> > +   case VMX_TEST_RESUME:
> > +   return ret;
> > +   case VMX_TEST_EXIT:
> > +   break;
> > +   default:
> > +   printf("ERROR : Invalid exit_handler return val %d.\n"
> > +   , ret);
> > +   }
> 
> Here we have to propagate the exit codes through multiple levels, because
> we're not using setjmp/longjmp.  Not a big deal.  My worries with this
> version are below.
> 
> > +   print_vmexit_info();
> > +   exit(-1);
> > +   return 0;
> > +}
> > +
> > +int vmx_run()
> > +{
> > +   bool ret;
> > +   u32 eax;
> > +   u64 rsp;
> > +
> > +   asm volatile("mov %%rsp, %0\n\t" : "=r"(rsp));
> > +   vmcs_write(HOST_RSP, rsp);
> > +   asm volatile("vmlaunch;seta %0\n\t" : "=m"(ret));
> 
> setbe (this path is probably untested because it never happens in practice).
> 
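For background on the setbe suggestion: vmlaunch reports VMfailInvalid by
setting CF and VMfailValid by setting ZF, so "below or equal" (CF=1 or
ZF=1) is exactly the failure condition, while seta tests for success. A
minimal sketch, assuming a current VMCS is already loaded (on success,
control moves to the guest and this function never returns):

	static inline bool vmlaunch_failed(void)
	{
		bool fail;

		/* setbe captures CF||ZF, i.e. VMfailInvalid or VMfailValid */
		asm volatile("vmlaunch; setbe %0" : "=q"(fail) : : "cc", "memory");
		return fail;
	}
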
At least one of the tests should set up a wrong VMCS state to verify that
nested VMX handles it correctly. In fact, one of the patches that Arthur
has sent for nested VMX fixes exactly that code path.

> Also, missing memory clobber.
> 
> > +   if (ret) {
> > +   printf("%s : vmlaunch failed.\n", __func__);
> > +   return 1;
> > +   }
> > +   while (1) {
> > +   asm volatile(
> > +   LOAD_GPR_C
> > +   "vmresume;seta %0\n\t"
> > +   : "=m"(ret));
> 
> setbe and missing memory clobber.
> 
> > +   if (ret) {
> > +   printf("%s : vmresume failed.\n", __func__);
> > +   return 1;
> > +   }
> > +   asm volatile(
> > +   ".global vmx_return\n\t"
> 
> .global should not be needed.
> 
> > +   "vmx_return:\n\t"
> > +   SAVE_GPR_C
> > +   "call exit_handler\n\t"
> > +   "mov %%eax, %0\n\t"
> > +   : "=r"(eax)
> > +   );
> 
> I had written a long explanation here about why I don't trust the
> compiler to do the right thing, and ideas about how to fix that.  But in
> the end the only workable solution is a single assembly blob like vmx.c
> in KVM to do vmlaunch/vmresume, so I'll get right to the point:
> 
>*  the "similarity with KVM code" and "as little assembly as  *
>*  possible" goals are mutually exclusive *
> 
I never said that the code should be similar to KVM code or have as little
assembly as possible :) Reread the thread. The only thing I asked for
is to make the code flow linear; if that makes the code look more like KVM
or reduces the amount of assembly, that is just a bonus.

> and for a testsuite I'd prefer the latter---which means I'd still favor
> setjmp/longjmp.
> 
> Now, here is the long explanation.
> 
> I must admit that the code looks nice.  There are some nits I'd like to
> see done differently (such as putting vmx_return at the beginning of the
> while (1), and the vmresume asm at the end), but it is indeed nice.
> 
Why do you prefer setjmp/longjmp then?

> However, it is also very deceiving, because the processor is not doing
> what the code says.  If I see:
> 
>vmlaunch();
>exit(-1);
> 
> it is clear that magic happens in vmlaunch().  If I see
> 
>asm volatile("vmlaunch;setbe %0\n\t" : "=m"(ret));
>if (ret) {
>...
>}
>asm("vmx_return:")
> 
> it is absolutely not clear that the setbe and "if" are skipped on a
> successful vmlaunch.  So, at the very least you need a comment before
> those "if (ret)" to document the control flow, or you can use a jmp like
> this:
> 
>asm volatile goto ("vmlaunch;jmp %0\n\t"
>   : : : "memory" : vmlaunch_error);
>if (0) {
>vmlaunch_error: ...
>}
> 
> The unconditional jump and "asm goto" make it clear that magic happens.
Agreed, I dislike this magic too,
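
For reference, a minimal sketch of the setjmp/longjmp flow under
discussion (structure and names are illustrative, not the actual test
code): the exit handler longjmps back into the entry function, so the C
code after vmlaunch is visibly unreachable on success.

	#include <setjmp.h>

	static jmp_buf launch_env;

	/* Sketch: on vmexit the CPU jumps to an asm trampoline at HOST_RIP,
	 * which calls exit_handler_sketch(); that longjmps back here rather
	 * than returning through the vmlaunch call site. */
	static int vmx_run_sketch(void)
	{
		int val = setjmp(launch_env);
		if (val != 0)
			return val - 1;		/* unpack the vmexit result */

		asm volatile("vmlaunch" ::: "memory");
		return -1;			/* reached only if vmlaunch failed */
	}

	static void exit_handler_sketch(int result)
	{
		longjmp(launch_env, result + 1);	/* setjmp must see nonzero */
	}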

RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-18 Thread Bhushan Bharat-R65777


> -Original Message-
> From: "“tiejun.chen”" [mailto:tiejun.c...@windriver.com]
> Sent: Thursday, July 18, 2013 11:56 AM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood Scott-
> B07421; Bhushan Bharat-R65777
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
> managed pages
> 
> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
> > If there is a struct page for the requested mapping then it's normal
> > DDR and the mapping sets the "M" bit (coherent, cacheable); else this
> > is treated as I/O and we set "I + G" (cache inhibited, guarded).
> >
> > This helps set up a proper TLB mapping for a directly assigned device.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> >   arch/powerpc/kvm/e500_mmu_host.c |   17 -
> >   1 files changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/e500_mmu_host.c
> > b/arch/powerpc/kvm/e500_mmu_host.c
> > index 1c6a9d7..089c227 100644
> > --- a/arch/powerpc/kvm/e500_mmu_host.c
> > +++ b/arch/powerpc/kvm/e500_mmu_host.c
> > @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
> > return mas3;
> >   }
> >
> > -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
> > +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
> >   {
> > +   u32 mas2_attr;
> > +
> > +   mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> > +
> > +   if (!pfn_valid(pfn)) {
> 
> Why not directly use kvm_is_mmio_pfn()?

What I understand from this function (someone can correct me) is that it
returns "false" when the page is managed by the kernel and is not marked
RESERVED (for some reason). For us it does not matter whether the page is
reserved or not; if it is a kernel-visible page then it is DDR.

-Bharat

> 
> Tiejun
> 
> > +   mas2_attr |= MAS2_I | MAS2_G;
> > +   } else {
> >   #ifdef CONFIG_SMP
> > -   return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
> > -#else
> > -   return mas2 & MAS2_ATTRIB_MASK;
> > +   mas2_attr |= MAS2_M;
> >   #endif
> > +   }
> > +   return mas2_attr;
> >   }
> >
> >   /*
> > @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
> > /* Force IPROT=0 for all guest mappings. */
> > stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
> > stlbe->mas2 = (gvaddr & MAS2_EPN) |
> > - e500_shadow_mas2_attrib(gtlbe->mas2, pr);
> > + e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
> > stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
> > e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
> >
> >
> 
