Fwd: 100% Frequency CPU - Windows2k8R2 Guest

2015-11-20 Thread Thiago Oliveira
Hi There!

After installing Windows2k8R2 as a guest, along with all the drivers,
the CPU frequency stays at 100%.
I'm using QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.19)
on Ubuntu 14.04.

See the attachment for details.

Can anybody help me?

Thanks.

Thiago Oliveira


Re: [PATCH] target-i386: Do not set MCG_SER_P by default

2015-11-20 Thread Borislav Petkov
On Sat, Nov 21, 2015 at 12:11:35AM +0100, Andreas Färber wrote:
> Hi,
> 
> CC'ing qemu-devel.

Ah, thanks.

> Am 21.11.2015 um 00:01 schrieb Borislav Petkov:
> > From: Borislav Petkov 
> > 
> > Software Error Recovery, i.e. SER, is purely an Intel feature and it
> > shouldn't be set by default. Enable it only on Intel.
> 
> Is this new in 2.5? Otherwise we would probably need compatibility code
> in pc*.[ch] for incoming live migration from older versions.

It looks like it is really old, AFAIK from 2010:

c0532a76b407 ("MCE: Relay UCR MCE to guest")

You'd need to be more verbose about pc*.[ch]. An example perhaps...?

> >  mcg_cap &= MCE_CAP_DEF;
> >  mcg_cap |= banks;
> > +
> > +   if (IS_INTEL_CPU(env))
> > +   mcg_cap |= MCG_SER_P;
> 
> Tabs and missing braces.

Ok.

Thanks.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


Re: [PATCH] target-i386: Do not set MCG_SER_P by default

2015-11-20 Thread Andreas Färber
Hi,

CC'ing qemu-devel.

Am 21.11.2015 um 00:01 schrieb Borislav Petkov:
> From: Borislav Petkov 
> 
> Software Error Recovery, i.e. SER, is purely an Intel feature and it
> shouldn't be set by default. Enable it only on Intel.

Is this new in 2.5? Otherwise we would probably need compatibility code
in pc*.[ch] for incoming live migration from older versions.

> 
> Signed-off-by: Borislav Petkov 
> ---
>  target-i386/cpu.c | 7 ---
>  target-i386/cpu.h | 9 -
>  target-i386/kvm.c | 5 +
>  3 files changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 11e5e39a756a..8155ee94fbe1 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -2803,13 +2803,6 @@ static void x86_cpu_apic_realize(X86CPU *cpu, Error **errp)
>  }
>  #endif
>  
> -
> -#define IS_INTEL_CPU(env) ((env)->cpuid_vendor1 == CPUID_VENDOR_INTEL_1 && \
> -   (env)->cpuid_vendor2 == CPUID_VENDOR_INTEL_2 && \
> -   (env)->cpuid_vendor3 == CPUID_VENDOR_INTEL_3)
> -#define IS_AMD_CPU(env) ((env)->cpuid_vendor1 == CPUID_VENDOR_AMD_1 && \
> - (env)->cpuid_vendor2 == CPUID_VENDOR_AMD_2 && \
> - (env)->cpuid_vendor3 == CPUID_VENDOR_AMD_3)
>  static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>  {
>  CPUState *cs = CPU(dev);
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index fc4a605d6a29..2605c564239a 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -283,7 +283,7 @@
>  #define MCG_CTL_P   (1ULL<<8)   /* MCG_CAP register available */
>  #define MCG_SER_P   (1ULL<<24) /* MCA recovery/new status bits */
>  
> -#define MCE_CAP_DEF (MCG_CTL_P|MCG_SER_P)
> +#define MCE_CAP_DEF MCG_CTL_P
>  #define MCE_BANKS_DEF   10
>  
>  #define MCG_STATUS_RIPV (1ULL<<0)   /* restart ip valid */
> @@ -610,6 +610,13 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
>  #define CPUID_MWAIT_IBE (1U << 1) /* Interrupts can exit capability */
>  #define CPUID_MWAIT_EMX (1U << 0) /* enumeration supported */
>  
> +#define IS_INTEL_CPU(env) ((env)->cpuid_vendor1 == CPUID_VENDOR_INTEL_1 && \
> +   (env)->cpuid_vendor2 == CPUID_VENDOR_INTEL_2 && \
> +   (env)->cpuid_vendor3 == CPUID_VENDOR_INTEL_3)
> +#define IS_AMD_CPU(env) ((env)->cpuid_vendor1 == CPUID_VENDOR_AMD_1 && \
> + (env)->cpuid_vendor2 == CPUID_VENDOR_AMD_2 && \
> + (env)->cpuid_vendor3 == CPUID_VENDOR_AMD_3)
> +
>  #ifndef HYPERV_SPINLOCK_NEVER_RETRY
>  #define HYPERV_SPINLOCK_NEVER_RETRY 0x
>  #endif
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 2a9953b2d4b5..082d38d4838d 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -787,8 +787,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  if (banks > MCE_BANKS_DEF) {
>  banks = MCE_BANKS_DEF;
>  }
> +
>  mcg_cap &= MCE_CAP_DEF;
>  mcg_cap |= banks;
> +
> + if (IS_INTEL_CPU(env))
> + mcg_cap |= MCG_SER_P;

Tabs and missing braces.
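
In QEMU's coding style (four-space indents, and braces even around a
single-statement body) the hunk would presumably end up as something like:

    if (IS_INTEL_CPU(env)) {
        mcg_cap |= MCG_SER_P;
    }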

> +
>  ret = kvm_vcpu_ioctl(cs, KVM_X86_SETUP_MCE, &mcg_cap);
>  if (ret < 0) {
>  fprintf(stderr, "KVM_X86_SETUP_MCE: %s", strerror(-ret));

Regards,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton; HRB 21284 (AG Nürnberg)


[PATCH] target-i386: Do not set MCG_SER_P by default

2015-11-20 Thread Borislav Petkov
From: Borislav Petkov 

Software Error Recovery, i.e. SER, is purely an Intel feature and it
shouldn't be set by default. Enable it only on Intel.

Signed-off-by: Borislav Petkov 
---
 target-i386/cpu.c | 7 ---
 target-i386/cpu.h | 9 -
 target-i386/kvm.c | 5 +
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 11e5e39a756a..8155ee94fbe1 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2803,13 +2803,6 @@ static void x86_cpu_apic_realize(X86CPU *cpu, Error **errp)
 }
 #endif
 
-
-#define IS_INTEL_CPU(env) ((env)->cpuid_vendor1 == CPUID_VENDOR_INTEL_1 && \
-   (env)->cpuid_vendor2 == CPUID_VENDOR_INTEL_2 && \
-   (env)->cpuid_vendor3 == CPUID_VENDOR_INTEL_3)
-#define IS_AMD_CPU(env) ((env)->cpuid_vendor1 == CPUID_VENDOR_AMD_1 && \
- (env)->cpuid_vendor2 == CPUID_VENDOR_AMD_2 && \
- (env)->cpuid_vendor3 == CPUID_VENDOR_AMD_3)
 static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 {
 CPUState *cs = CPU(dev);
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index fc4a605d6a29..2605c564239a 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -283,7 +283,7 @@
 #define MCG_CTL_P   (1ULL<<8)   /* MCG_CAP register available */
 #define MCG_SER_P   (1ULL<<24) /* MCA recovery/new status bits */
 
-#define MCE_CAP_DEF (MCG_CTL_P|MCG_SER_P)
+#define MCE_CAP_DEF MCG_CTL_P
 #define MCE_BANKS_DEF   10
 
 #define MCG_STATUS_RIPV (1ULL<<0)   /* restart ip valid */
@@ -610,6 +610,13 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_MWAIT_IBE (1U << 1) /* Interrupts can exit capability */
 #define CPUID_MWAIT_EMX (1U << 0) /* enumeration supported */
 
+#define IS_INTEL_CPU(env) ((env)->cpuid_vendor1 == CPUID_VENDOR_INTEL_1 && \
+   (env)->cpuid_vendor2 == CPUID_VENDOR_INTEL_2 && \
+   (env)->cpuid_vendor3 == CPUID_VENDOR_INTEL_3)
+#define IS_AMD_CPU(env) ((env)->cpuid_vendor1 == CPUID_VENDOR_AMD_1 && \
+ (env)->cpuid_vendor2 == CPUID_VENDOR_AMD_2 && \
+ (env)->cpuid_vendor3 == CPUID_VENDOR_AMD_3)
+
 #ifndef HYPERV_SPINLOCK_NEVER_RETRY
 #define HYPERV_SPINLOCK_NEVER_RETRY 0x
 #endif
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 2a9953b2d4b5..082d38d4838d 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -787,8 +787,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
 if (banks > MCE_BANKS_DEF) {
 banks = MCE_BANKS_DEF;
 }
+
 mcg_cap &= MCE_CAP_DEF;
 mcg_cap |= banks;
+
+   if (IS_INTEL_CPU(env))
+   mcg_cap |= MCG_SER_P;
+
 ret = kvm_vcpu_ioctl(cs, KVM_X86_SETUP_MCE, &mcg_cap);
 if (ret < 0) {
 fprintf(stderr, "KVM_X86_SETUP_MCE: %s", strerror(-ret));
-- 
2.3.5



[PATCH] kvm: Dump guest rIP when the guest tried something unsupported

2015-11-20 Thread Borislav Petkov
From: Borislav Petkov 

It looks like this in action:

  kvm [5197]: vcpu0, guest rIP: 0x810187ba unhandled rdmsr: 0xc001102

and helps to pinpoint quickly where in the guest we did the unsupported
thing.
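
For example, with the guest's vmlinux (built with debug info) at hand, the
printed rIP can be resolved to a source line with a standard invocation along
the lines of:

  addr2line -e vmlinux <guest rIP from the log line>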

Signed-off-by: Borislav Petkov 
---
 include/linux/kvm_host.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5706a2108f0a..597f6607c440 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -439,7 +439,8 @@ struct kvm {
 
 /* The guest did something we don't support. */
 #define vcpu_unimpl(vcpu, fmt, ...)\
-   kvm_pr_unimpl("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
+   kvm_pr_unimpl("vcpu%i, guest rIP: 0x%lx " fmt,  \
+   (vcpu)->vcpu_id, kvm_rip_read(vcpu), ## __VA_ARGS__)
 
 #define vcpu_debug(vcpu, fmt, ...) \
kvm_debug("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
-- 
2.3.5



Re: [PATCH] virtio_ring: Shadow available ring flags & index

2015-11-20 Thread Venkatesh Srinivas
On Thu, Nov 19, 2015 at 04:15:48PM +, Xie, Huawei wrote:
> On 11/18/2015 12:28 PM, Venkatesh Srinivas wrote:
> > On Tue, Nov 17, 2015 at 08:08:18PM -0800, Venkatesh Srinivas wrote:
> >> On Mon, Nov 16, 2015 at 7:46 PM, Xie, Huawei  wrote:
> >>
> >>> On 11/14/2015 7:41 AM, Venkatesh Srinivas wrote:
>  On Wed, Nov 11, 2015 at 02:34:33PM +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 10, 2015 at 04:21:07PM -0800, Venkatesh Srinivas wrote:
> >> Improves cacheline transfer flow of available ring header.
> >>
> >> Virtqueues are implemented as a pair of rings, one producer->consumer
> >> avail ring and one consumer->producer used ring; preceding the
> >> avail ring in memory are two contiguous u16 fields -- avail->flags
> >> and avail->idx. A producer posts work by writing to avail->idx and
> >> a consumer reads avail->idx.
> >>
> >> The flags and idx fields only need to be written by a producer CPU
> >> and only read by a consumer CPU; when the producer and consumer are
> >> running on different CPUs and the virtio_ring code is structured to
> >> only have source writes/sink reads, we can continuously transfer the
> >> avail header cacheline between 'M' states between cores. This flow
> >> optimizes core -> core bandwidth on certain CPUs.
> >>
> >> (see: "Software Optimization Guide for AMD Family 15h Processors",
> >> Section 11.6; similar language appears in the 10h guide and should
> >> apply to CPUs w/ exclusive caches, using LLC as a transfer cache)
> >>
> >> Unfortunately the existing virtio_ring code issued reads to the
> >> avail->idx and read-modify-writes to avail->flags on the producer.
> >>
> >> This change shadows the flags and index fields in producer memory;
> >> the vring code now reads from the shadows and only ever writes to
> >> avail->flags and avail->idx, allowing the cacheline to transfer
> >> core -> core optimally.
> > Sounds logical, I'll apply this after a  bit of testing
> > of my own, thanks!
>  Thanks!
> >>> Venkatesh:
> >>> Is it that your patch only applies to CPUs w/ exclusive caches?
> >> No --- it applies when the inter-cache coherence flow is optimized by
> >> 'M' -> 'M' transfers and when producer reads might interfere w/
> >> consumer prefetchw/reads. The AMD Optimization guides have specific
> >> language on this subject, but other platforms may benefit.
> >> (see Intel #'s below)
> For the core-to-core case (not an HT pair), after the consumer reads that M
> cache line for avail_idx, is that line still in the producer core's L1 data
> cache, with its state changing from M to O?

Textbook MOESI would not allow that state combination -- when the consumer
gets the line in 'M' state, the producer cannot hold it in 'O' state.

On the AMD Piledriver, per the Optimization guide, I use PREFETCHW/Load to
get the line in 'M' state on the consumer (invalidating it in the Producer's
cache):

"* Use PREFETCHW on the consumer side, even if the consumer will not modify
   the data"

That, plus the "Optimizing Inter-Core Data Transfer" section imply that
PREFETCHW + MOV will cause the consumer to load the line into 'M' state.

PREFETCHW was not available on Intel CPUs pre-Broadwell; from the public
documentation alone, I don't think we can tell what transition the producer's
cacheline undergoes on these cores. For that matter, the latest documentation
I can find (for Nehalem), indicated there was no 'O' state -- Nehalem
implemented MESIF, not MOESI.
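
In case it helps, here is a minimal sketch of the shadowing idea itself
(simplified names, a made-up RING_SIZE, kernel-style types; an illustration,
not the actual virtio_ring code):

	#define RING_SIZE 256			/* power-of-2 queue size */

	/* Producer-private state; only the producer CPU touches this. */
	struct producer {
		struct vring_avail *avail;	/* header shared with consumer */
		u16 shadow_idx;			/* producer's copy of avail->idx */
	};

	static void producer_post(struct producer *p, u16 head)
	{
		p->avail->ring[p->shadow_idx & (RING_SIZE - 1)] = head;
		p->shadow_idx++;
		smp_wmb();	/* publish the ring[] entry before the index */
		/*
		 * Write-only access to the shared header: avail->idx is
		 * never read back, so the cacheline can move from the
		 * consumer back to the producer directly in 'M' state.
		 */
		p->avail->idx = p->shadow_idx;
	}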

HTH,
-- vs;


Re: [kvm-unit-tests PATCH 00/18] bunch of mostly trivial patches

2015-11-20 Thread Andrew Jones
On Tue, Nov 10, 2015 at 11:54:22AM -0500, Andrew Jones wrote:
> On Tue, Nov 10, 2015 at 05:38:38PM +0100, Paolo Bonzini wrote:
> > 
> > 
> > On 06/11/2015 01:24, Andrew Jones wrote:
> > > Many of these patches were posted once. Some weren't, but anyway
> > > almost everything is pretty trivial. I'd like to get these in, or
> > > at least get definitive nacks on them (and then drop them) in order
> > > to clean my queue before more patches (coming from Alex Bennée and
> > > Christopher) are reposted.
> > > 
> > > All patches also available here
> > > https://github.com/rhdrjones/kvm-unit-tests/commits/queue
> > 
> > I applied all of these 
> 
> Thanks!
> 
> > except 1 (question asked) and 14/15/16/17 (not sure I like the idea).

Hi Paolo,

Any more thoughts on these? I parsed "not sure I like" as "still
thinking". Or should I parse it as a "no" and drop them from my
queue?

> 
> At one point I recall that you liked the uapi patches, although I'm
> not 100% married to it myself, as it does add a new dependency. I'm
> open to suggestions.

Another argument for the uapi patches is that we're working on adding
support for the mach-virt pcie host bridge in order to use pci-testdev
in arm unit tests. We'll need to either use this series or import 
pci[_regs].h for that.

> 
> I'm not sure what you're opposed to wrt the map files (patch 15). They
> aren't 100% necessary, but they don't really hurt to generate either.
> I won't fight for them though.

I'm OK with dropping this one. The map files were useful to me once,
but as rare as they would be, I agree cluttering things with them
isn't a great idea.

> 
> The TEST= patch is quite useful. I find it annoying to always have
> to modify a makefile whenever I throw together a few line test. It
> may not be for everyone, but then it doesn't do anything when it's
> not used, so it shouldn't hurt that it exists. I would agree that
> maybe the patch should also document it though, if you argued that.
> Or, the fact that it's undocumented, and does nothing when not used,
> could be an argument to just commit it :-)

I still like this one. I'll buy you a beer for it :-)
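
(For anyone skimming the archive: TEST= is a make variable that pulls an
ad-hoc test into the build without editing the makefile. A rough sketch of
the idea, assuming kvm-unit-tests' usual $(TEST_DIR)/*.flat targets, and not
necessarily how the patch does it:

    make TEST=scratch    # builds $(TEST_DIR)/scratch.flat from scratch.c

with the makefile side being roughly
"tests += $(if $(TEST),$(TEST_DIR)/$(TEST).flat)", which expands to nothing
when TEST is unset.)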

Thanks,
drew


Re: [PATCH 04/21] arm64: KVM: Implement vgic-v3 save/restore

2015-11-20 Thread Marc Zyngier
On 20/11/15 16:48, Steve Capper wrote:
> On Mon, Nov 16, 2015 at 01:11:42PM +, Marc Zyngier wrote:
>> Implement the vgic-v3 save restore as a direct translation of
>> the assembly code version.
> 
> I think there are a couple of typos below, Marc.

[...]

>> +case 10:
>> +cpu_if->vgic_lr[LR_OFFSET(19)] = read_gicreg(ICH_LR10_EL2);
> 
> LR_OFFSET(10) ?

Memory corruption, here we come ;-)

Thanks, I'll fix those!

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH 04/21] arm64: KVM: Implement vgic-v3 save/restore

2015-11-20 Thread Steve Capper
On Mon, Nov 16, 2015 at 01:11:42PM +, Marc Zyngier wrote:
> Implement the vgic-v3 save restore as a direct translation of
> the assembly code version.

I think there are a couple of typos below, Marc.

> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/Makefile |   1 +
>  arch/arm64/kvm/hyp/hyp.h|   3 +
>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 222 
> 
>  3 files changed, 226 insertions(+)
>  create mode 100644 arch/arm64/kvm/hyp/vgic-v3-sr.c
> 
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index d8d5968..d1e38ce 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -3,3 +3,4 @@
>  #
>  
>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
> +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index 78f25c4..a31cb6e 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -30,5 +30,8 @@
>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>  
> +void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
> +void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
> +
>  #endif /* __ARM64_KVM_HYP_H__ */
>  
> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> new file mode 100644
> index 000..f2289ab
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> @@ -0,0 +1,222 @@
> +/*
> + * Copyright (C) 2012-2015 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#include "hyp.h"
> +
> +/*
> + * We store LRs in reverse order to let the CPU deal with streaming
> + * access. Use this macro to make it look saner...
> + */
> +#define LR_OFFSET(n) (15 - n)
> +
> +#define read_gicreg(r)   \
> + ({  \
> + u64 reg;\
> + asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \
> + reg;\
> + })
> +
> +#define write_gicreg(v,r)\
> + do {\
> + u64 __val = (v);\
> + asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
> + } while (0)
> +
> +/* vcpu is already in the HYP VA space */
> +void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
> +{
> + struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
> + u64 val;
> + u32 nr_lr, nr_pri;
> +
> + /*
> +  * Make sure stores to the GIC via the memory mapped interface
> +  * are now visible to the system register interface.
> +  */
> + dsb(st);
> +
> + cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
> + cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
> + cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
> + cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
> +
> + write_gicreg(0, ICH_HCR_EL2);
> + val = read_gicreg(ICH_VTR_EL2);
> + nr_lr = val & 0xf;
> + nr_pri = ((u32)val >> 29) + 1;
> +
> + switch (nr_lr) {
> + case 15:
> + cpu_if->vgic_lr[LR_OFFSET(15)] = read_gicreg(ICH_LR15_EL2);
> + case 14:
> + cpu_if->vgic_lr[LR_OFFSET(14)] = read_gicreg(ICH_LR14_EL2);
> + case 13:
> + cpu_if->vgic_lr[LR_OFFSET(13)] = read_gicreg(ICH_LR13_EL2);
> + case 12:
> + cpu_if->vgic_lr[LR_OFFSET(12)] = read_gicreg(ICH_LR12_EL2);
> + case 11:
> + cpu_if->vgic_lr[LR_OFFSET(11)] = read_gicreg(ICH_LR11_EL2);
> + case 10:
> + cpu_if->vgic_lr[LR_OFFSET(19)] = read_gicreg(ICH_LR10_EL2);

LR_OFFSET(10) ?

> + case 9:
> + cpu_if->vgic_lr[LR_OFFSET(9)] = read_gicreg(ICH_LR9_EL2);
> + case 8:
> + cpu_if->vgic_lr[LR_OFFSET(8)] = read_gicreg(ICH_LR8_EL2);
> + case 7:
> + cpu_if->vgic_lr[LR_OFFSET(7)] = read_gicreg(ICH_LR7_EL2);
> + case 6:
> + cpu_if->vgic_lr[LR_OFFSET(6)] = read_gicreg(ICH_LR6_EL2);
> + case 5:
> + cpu_if->vgic_lr[LR_OFFSET(5)] = read_gicreg(ICH_LR5_EL2);
> + case 4:
> +

Re: [PATCH v9 6/6] tests/guest-debug: introduce basic gdbstub tests

2015-11-20 Thread Peter Maydell
On 12 November 2015 at 16:20, Alex Bennée  wrote:
> From: Alex Bennée 
>
> The aim of these tests is to combine with an appropriate kernel
> image (with symbol-file vmlinux) and check it behaves as it should.
> Given a kernel it checks:
>
>   - single step
>   - software breakpoint
>   - hardware breakpoint
>   - access, read and write watchpoints
>
> On success it returns 0 to the calling process.
>
> I've not plumbed this into the "make check" logic though as we need a
> solution for providing non-host binaries to the tests. However the test
> is structured to work with pretty much any Linux kernel image as it
> uses the basic kernel_init code which is common across architectures.

Do these tests pass if you run them on the TCG QEMU, just out
of interest?

I'm not a great fan of tests that aren't in 'make check'
because IME they just bitrot, but as you say we have no
sensible approach for handling tests that need to run real
guest code :-(

thanks
-- PMM


Re: [PATCH v9 5/6] target-arm: kvm - re-inject guest debug exceptions

2015-11-20 Thread Peter Maydell
On 12 November 2015 at 16:20, Alex Bennée  wrote:
> From: Alex Bennée 
>
> If we can't find details for the debug exception in our debug state
> then we can assume the exception is due to debugging inside the guest.
> To inject the exception into the guest state we re-use the TCG exception
> code (do_interupt).

"do_interrupt".

>
> However while guest debugging is in effect we currently can't handle the
> guest using single step which is heavily used by GDB.

Can you expand this to be clearer about what the problem is here?
Is this a thing fixed by this commit or a remaining issue after it?

> Signed-off-by: Alex Bennée 
>
> ---
> v5:
>   - new for v5
> ---
>  target-arm/helper-a64.c | 12 ++--
>  target-arm/kvm.c| 27 +++
>  2 files changed, 29 insertions(+), 10 deletions(-)
>
> diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
> index deb8dbe..fc3ccdf 100644
> --- a/target-arm/helper-a64.c
> +++ b/target-arm/helper-a64.c
> @@ -25,6 +25,7 @@
>  #include "qemu/bitops.h"
>  #include "internals.h"
>  #include "qemu/crc32c.h"
> +#include "sysemu/kvm.h"
>  #include  /* For crc32 */
>
>  /* C2.4.7 Multiply and divide */
> @@ -469,7 +470,8 @@ void aarch64_cpu_do_interrupt(CPUState *cs)
>new_el);
>  if (qemu_loglevel_mask(CPU_LOG_INT)
>  && !excp_is_internal(cs->exception_index)) {
> -qemu_log_mask(CPU_LOG_INT, "...with ESR 0x%" PRIx32 "\n",
> +qemu_log_mask(CPU_LOG_INT, "...with ESR %x/0x%" PRIx32 "\n",
> +  env->exception.syndrome >> ARM_EL_EC_SHIFT,
>env->exception.syndrome);
>  }
>
> @@ -535,6 +537,12 @@ void aarch64_cpu_do_interrupt(CPUState *cs)
>  aarch64_restore_sp(env, new_el);
>
>  env->pc = addr;
> -cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
> +
> +qemu_log_mask(CPU_LOG_INT, "...to EL%d PC 0x%" PRIx64 " PSTATE 0x%x\n",
> +  new_el, env->pc, pstate_read(env));
> +
> +if (!kvm_enabled()) {
> +cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
> +}
>  }
>  #endif
> diff --git a/target-arm/kvm.c b/target-arm/kvm.c
> index 1f57e92..4ac177a 100644
> --- a/target-arm/kvm.c
> +++ b/target-arm/kvm.c
> @@ -529,9 +529,10 @@ static int kvm_handle_debug(CPUState *cs, struct kvm_run *run)
>  struct kvm_debug_exit_arch *arch_info = &run->debug.arch;
>  int hsr_ec = arch_info->hsr >> ARM_EL_EC_SHIFT;
>  ARMCPU *cpu = ARM_CPU(cs);
> +CPUClass *cc = CPU_GET_CLASS(cs);
>  CPUARMState *env = &cpu->env;
>
> -/* Ensure PC is synchronised */
> +/* Ensure all state is synchronised */

You might as well have just written the comment like that to start with :-)

>  kvm_cpu_synchronize_state(cs);
>
>  switch (hsr_ec) {
> @@ -539,7 +540,14 @@ static int kvm_handle_debug(CPUState *cs, struct kvm_run *run)
>  if (cs->singlestep_enabled) {
>  return true;
>  } else {
> -error_report("Came out of SINGLE STEP when not enabled");
> +/*
> + * The kernel should have supressed the guests ability to

"suppressed". "guest's".

> + * single step at this point so something has gone wrong.
> + */
> +error_report("%s: guest single-step while debugging unsupported"
> + " (%"PRIx64", %"PRIx32")\n",
> + __func__, env->pc, arch_info->hsr);
> +return false;

Why didn't we just write the error_report this way to start with?

>  }
>  break;
>  case EC_AA64_BKPT:
> @@ -564,14 +572,17 @@ static int kvm_handle_debug(CPUState *cs, struct kvm_run *run)
>  default:
>  error_report("%s: unhandled debug exit (%"PRIx32", %"PRIx64")\n",
>   __func__, arch_info->hsr, env->pc);
> +return false;

Might as well put this "return false;" in the original code that
adds the default case, rather than changing the control flow in this
patch.

>  }
>
> -/* If we don't handle this it could be it really is for the
> -   guest to handle */
> -qemu_log_mask(LOG_UNIMP,
> -  "%s: re-injecting exception not yet implemented"
> -  " (0x%"PRIx32", %"PRIx64")\n",
> -  __func__, hsr_ec, env->pc);
> +/* If we are not handling the debug exception it must belong to
> + * the guest. Let's re-use the existing TCG interrupt code to set
> + * everything up properly

Missing trailing ".".

> + */
> +cs->exception_index = EXCP_BKPT;
> +env->exception.syndrome = arch_info->hsr;
> +env->exception.vaddress = arch_info->far;

You need to set env->exception.target_el to 1 as well.

> +cc->do_interrupt(cs);
>
>  return false;
>  }
> --
> 2.6.3
>

thanks
-- PMM


Re: [PATCH V3 00/10] KVM: x86: MMU: Clean up x86's mmu code for future work

2015-11-20 Thread Paolo Bonzini


On 20/11/2015 09:40, Takuya Yoshikawa wrote:
>   About patch 03: There was a comment on the usage of braces for a single line
>   else-if statement from Xiao. As I answered, checkpatch did not complain about
>   this, and when the corresponding if block has multiple lines, some developers
>   prefer/recommend this style. Feel free to modify it if you don't like it.

I wonder if we can remove the "else if" completely.  Will take a look
after applying, early next week.

> For these three, I'm not sure what we should do now, still RFC?
> We can also consider other approaches, e.g. moving link_shadow_page() into
> kvm_mmu_get_page() as Paolo suggested before.

I think I will include them in kvm/next.

Thanks,

Paolo


Re: [PATCH 03/21] arm64: KVM: Implement vgic-v2 save/restore

2015-11-20 Thread Marc Zyngier
Hi Steve,

On 20/11/15 15:22, Steve Capper wrote:
> On Mon, Nov 16, 2015 at 01:11:41PM +, Marc Zyngier wrote:
>> > Implement the vgic-v2 save restore as a direct translation of
>> > the assembly code version.
> Hi Marc,
> I have one comment below:
> 
> Cheers,
> -- Steve
>> > 
>> > Signed-off-by: Marc Zyngier 
>> > ---
>> >  arch/arm64/kvm/Makefile |  1 +
>> >  arch/arm64/kvm/hyp/Makefile |  5 +++
>> >  arch/arm64/kvm/hyp/hyp.h|  3 ++
>> >  arch/arm64/kvm/hyp/vgic-v2-sr.c | 85 
>> > +
>> >  4 files changed, 94 insertions(+)
>> >  create mode 100644 arch/arm64/kvm/hyp/Makefile
>> >  create mode 100644 arch/arm64/kvm/hyp/vgic-v2-sr.c
>> > 
>> > diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>> > index 1949fe5..d31e4e5 100644
>> > --- a/arch/arm64/kvm/Makefile
>> > +++ b/arch/arm64/kvm/Makefile
>> > @@ -10,6 +10,7 @@ KVM=../../../virt/kvm
>> >  ARM=../../../arch/arm/kvm
>> >  
>> >  obj-$(CONFIG_KVM_ARM_HOST) += kvm.o
>> > +obj-$(CONFIG_KVM_ARM_HOST) += hyp/
>> >  
>> >  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o
>> >  kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/arm.o $(ARM)/mmu.o $(ARM)/mmio.o
>> > diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>> > new file mode 100644
>> > index 000..d8d5968
>> > --- /dev/null
>> > +++ b/arch/arm64/kvm/hyp/Makefile
>> > @@ -0,0 +1,5 @@
>> > +#
>> > +# Makefile for Kernel-based Virtual Machine module, HYP part
>> > +#
>> > +
>> > +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>> > diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
>> > index dac843e..78f25c4 100644
>> > --- a/arch/arm64/kvm/hyp/hyp.h
>> > +++ b/arch/arm64/kvm/hyp/hyp.h
>> > @@ -27,5 +27,8 @@
>> >  
>> >  #define kern_hyp_va(v) (typeof(v))((unsigned long)v & HYP_PAGE_OFFSET_MASK)
>> >  
>> > +void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>> > +void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>> > +
>> >  #endif /* __ARM64_KVM_HYP_H__ */
>> >  
>> > diff --git a/arch/arm64/kvm/hyp/vgic-v2-sr.c b/arch/arm64/kvm/hyp/vgic-v2-sr.c
>> > new file mode 100644
>> > index 000..1382d2e
>> > --- /dev/null
>> > +++ b/arch/arm64/kvm/hyp/vgic-v2-sr.c
>> > @@ -0,0 +1,85 @@
>> > +/*
>> > + * Copyright (C) 2012-2015 - ARM Ltd
>> > + * Author: Marc Zyngier 
>> > + *
>> > + * This program is free software; you can redistribute it and/or modify
>> > + * it under the terms of the GNU General Public License version 2 as
>> > + * published by the Free Software Foundation.
>> > + *
>> > + * This program is distributed in the hope that it will be useful,
>> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> > + * GNU General Public License for more details.
>> > + *
>> > + * You should have received a copy of the GNU General Public License
>> > + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> > + */
>> > +
>> > +#include 
>> > +#include 
>> > +#include 
>> > +
>> > +#include 
>> > +
>> > +#include "hyp.h"
>> > +
>> > +/* vcpu is already in the HYP VA space */
>> > +void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu)
>> > +{
>> > +  struct kvm *kvm = kern_hyp_va(vcpu->kvm);
>> > +  struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
>> > +  struct vgic_dist *vgic = &kvm->arch.vgic;
>> > +  void __iomem *base = kern_hyp_va(vgic->vctrl_base);
>> > +  u32 __iomem *lr_base;
>> > +  u32 eisr0, eisr1, elrsr0, elrsr1;
>> > +  int i = 0, nr_lr;
>> > +
>> > +  if (!base)
>> > +  return;
>> > +
>> > +  cpu_if->vgic_vmcr = readl_relaxed(base + GICH_VMCR);
>> > +  cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR);
>> > +  eisr0  = readl_relaxed(base + GICH_EISR0);
>> > +  eisr1  = readl_relaxed(base + GICH_EISR0);
> Not sure what this would affect, but should the RHS be:
> "base + GICH_EISR1"?

-ECOPYPASTE. We're just lucky that we only have 4 list registers, and
thus never look past the first word.

Thanks a lot for noticing this!

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v9 4/6] target-arm: kvm - add support for HW assisted debug

2015-11-20 Thread Peter Maydell
On 12 November 2015 at 16:20, Alex Bennée  wrote:
> This adds basic support for HW assisted debug. The ioctl interface to
> KVM allows us to pass an implementation defined number of break and
> watch point registers. When KVM_GUESTDBG_USE_HW is specified these
> debug registers will be installed in place on the world switch into the
> guest.
>
> The hardware is actually capable of more advanced matching but it is
> unclear if this expressiveness is available via the gdbstub protocol.
>
> Signed-off-by: Alex Bennée 
>
> ---
> v2
>   - correct setting of PMC/BAS/MASK
>   - improved commentary
>   - added helper function to check watchpoint in range
>   - fix find/deletion of watchpoints
> v3
>   - use internals.h definitions
> v6
>   - KVM_GUESTDBG_USE_HW_BP->KVM_GUESTDBG_USE_HW
>   - renamed some helper functions to avoid confusion
> v9
>   - fix merge conflicts on re-base
>   - rm asm/ptrace.h include
>   - add additional commentry for hw breakpoints
>   - explain gdb's model for HW bkpts
>   - fix up spacing, formatting as per checkpatch
>   - better PAC values
>   - use is_power_of_2
>   - use _arm_ fn naming and add docs
>   - add a CPUWatchpoint structure for reporting
>   - replace manual array manipulation with g_array abstraction
> ---
>  target-arm/kvm.c |  38 +++---
>  target-arm/kvm64.c   | 352 
> ++-
>  target-arm/kvm_arm.h |  38 ++
>  3 files changed, 406 insertions(+), 22 deletions(-)
>
> diff --git a/target-arm/kvm.c b/target-arm/kvm.c
> index d505a7e..1f57e92 100644
> --- a/target-arm/kvm.c
> +++ b/target-arm/kvm.c
> @@ -547,6 +547,20 @@ static int kvm_handle_debug(CPUState *cs, struct kvm_run *run)
>  return true;
>  }
>  break;
> +case EC_BREAKPOINT:
> +if (kvm_arm_find_hw_breakpoint(cs, env->pc)) {
> +return true;
> +}
> +break;
> +case EC_WATCHPOINT:
> +{
> +CPUWatchpoint *wp = kvm_arm_find_hw_watchpoint(cs, arch_info->far);

This won't compile for 32-bit ARM.

> +if (wp) {
> +cs->watchpoint_hit = wp;
> +return true;
> +}
> +break;
> +}
>  default:
>  error_report("%s: unhandled debug exit (%"PRIx32", %"PRIx64")\n",
>   __func__, arch_info->hsr, env->pc);
> @@ -608,6 +622,10 @@ void kvm_arch_update_guest_debug(CPUState *cs, struct kvm_guest_debug *dbg)
>  if (kvm_sw_breakpoints_active(cs)) {
>  dbg->control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP;
>  }
> +if (kvm_arm_hw_debug_active(cs)) {
> +dbg->control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_HW;
> +kvm_arm_copy_hw_debug_data(&dbg->arch);
> +}
>  }
>
>  /* C6.6.29 BRK instruction */
> @@ -635,26 +653,6 @@ int kvm_arch_remove_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
>  return 0;
>  }
>
> -int kvm_arch_insert_hw_breakpoint(target_ulong addr,
> -  target_ulong len, int type)
> -{
> -qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
> -return -EINVAL;
> -}

You've moved these functions into kvm64.c but haven't provided
a stub version in kvm32.c...

> -
> -int kvm_arch_remove_hw_breakpoint(target_ulong addr,
> -  target_ulong len, int type)
> -{
> -qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
> -return -EINVAL;
> -}
> -
> -
> -void kvm_arch_remove_all_hw_breakpoints(void)
> -{
> -qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
> -}
> -
>  void kvm_arch_init_irq_routing(KVMState *s)
>  {
>  }
> diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
> index d087794..c468324 100644
> --- a/target-arm/kvm64.c
> +++ b/target-arm/kvm64.c
> @@ -2,6 +2,7 @@
>   * ARM implementation of KVM hooks, 64 bit specific code
>   *
>   * Copyright Mian-M. Hamayun 2013, Virtual Open Systems
> + * Copyright Alex Bennée 2014, Linaro
>   *
>   * This work is licensed under the terms of the GNU GPL, version 2 or later.
>   * See the COPYING file in the top-level directory.
> @@ -12,12 +13,17 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
> +#include 
>  #include 
>
>  #include "config-host.h"
>  #include "qemu-common.h"
>  #include "qemu/timer.h"
> +#include "qemu/host-utils.h"
> +#include "qemu/error-report.h"
> +#include "exec/gdbstub.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/kvm.h"
>  #include "kvm_arm.h"
> @@ -27,20 +33,362 @@
>
>  static bool have_guest_debug;
>
> +/*
> + * Although the ARM implementation of hardware assisted debugging
> + * allows for different breakpoints per-core the current GDB interface

Comma after "per-core".

> + * treats them as a global pool of registers (which seems to be the
> + * case for x86, ppc and s390). As a result we store one copy of
> + * registers which is used for all active cores.
> + *
> + * Write access is serialised by virtue of the GDB protocol which
> + * updates things. Rea

Re: [PATCH] KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8

2015-11-20 Thread Laurent Vivier


On 20/11/2015 09:11, Thomas Huth wrote:
> In the old DABR register, the BT (Breakpoint Translation) bit
> is bit number 61. In the new DAWRX register, the WT (Watchpoint
> Translation) bit is bit number 59. So to move the DABR-BT bit
> into the position of the DAWRX-WT bit, it has to be shifted by
> two, not only by one. This fixes hardware watchpoints in gdb of
> older guests that only use the H_SET_DABR/X interface instead
> of the new H_SET_MODE interface.
> 
> Signed-off-by: Thomas Huth 
> ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index b98889e..3983b87 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -2143,7 +2143,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  
>   /* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */
>  2:   rlwimi  r5, r4, 5, DAWRX_DR | DAWRX_DW
> - rlwimi  r5, r4, 1, DAWRX_WT
> + rlwimi  r5, r4, 2, DAWRX_WT
>   clrrdi  r4, r4, 3
>   std r4, VCPU_DAWR(r3)
>   std r5, VCPU_DAWRX(r3)
> 

Nice catch.
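
To spell out the bit arithmetic (IBM numbering, where bit 0 is the MSB of the
64-bit register):

	DABR  BT bit 61: 1ULL << (63 - 61) == 0x04
	DAWRX WT bit 59: 1ULL << (63 - 59) == 0x10

0x04 has to move left by two positions to become 0x10, so the rotate amount
in the rlwimi must be 2; a rotate of 1 would land the bit at position 60.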

Reviewed-by: Laurent Vivier 


Re: [PATCH v9 3/6] target-arm: kvm - support for single step

2015-11-20 Thread Peter Maydell
On 12 November 2015 at 16:20, Alex Bennée  wrote:
> This adds support for single-step. There isn't much to do on the QEMU
> side as after we set-up the request for single step via the debug ioctl
> it is all handled within the kernel.
>
> Signed-off-by: Alex Bennée 
>
> ---
> v2
>   - convert to using HSR_EC
> v3
>   - use internals.h definitions
> ---
>  target-arm/kvm.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/target-arm/kvm.c b/target-arm/kvm.c
> index 50f70ef..d505a7e 100644
> --- a/target-arm/kvm.c
> +++ b/target-arm/kvm.c
> @@ -535,6 +535,13 @@ static int kvm_handle_debug(CPUState *cs, struct kvm_run *run)
>  kvm_cpu_synchronize_state(cs);
>
>  switch (hsr_ec) {
> +case EC_SOFTWARESTEP:
> +if (cs->singlestep_enabled) {
> +return true;
> +} else {
> +error_report("Came out of SINGLE STEP when not enabled");
> +}
> +break;
>  case EC_AA64_BKPT:
>  if (kvm_find_sw_breakpoint(cs, env->pc)) {
>  return true;
> @@ -595,6 +602,9 @@ int kvm_arch_on_sigbus(int code, void *addr)
>
>  void kvm_arch_update_guest_debug(CPUState *cs, struct kvm_guest_debug *dbg)
>  {
> +if (cs->singlestep_enabled) {
> +dbg->control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
> +}

Doesn't kvm_update_guest_debug() already set these bits, or am
I misreading it?

>  if (kvm_sw_breakpoints_active(cs)) {
>  dbg->control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP;
>  }
> --
> 2.6.3

thanks
-- PMM


Re: [PATCH v9 2/6] target-arm: kvm - implement software breakpoints

2015-11-20 Thread Peter Maydell
On 12 November 2015 at 16:20, Alex Bennée  wrote:
> These don't involve messing around with debug registers, just setting
> the breakpoint instruction in memory. GDB will not use this mechanism if
> it can't access the memory to write the breakpoint.
>
> All the kernel has to do is ensure the hypervisor traps the breakpoint
> exceptions and returns to userspace.
>
> Signed-off-by: Alex Bennée 
>
> --
> v2
>   - handle debug exit with new hsr exception info
>   - add verbosity to UNIMP message
> v3
>   - sync with kvm_cpu_synchronize_state() before checking PC.
>   - use internals.h defines
>   - use env->pc
>   - use proper format types
> v9
>   - add include for error_report
>   - define a brk_insn constant
> ---
>  target-arm/kvm.c | 90 
> 
>  1 file changed, 78 insertions(+), 12 deletions(-)
>
> diff --git a/target-arm/kvm.c b/target-arm/kvm.c
> index 79ef4c6..50f70ef 100644
> --- a/target-arm/kvm.c
> +++ b/target-arm/kvm.c
> @@ -17,6 +17,7 @@
>
>  #include "qemu-common.h"
>  #include "qemu/timer.h"
> +#include "qemu/error-report.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/kvm.h"
>  #include "kvm_arm.h"
> @@ -516,9 +517,60 @@ MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
>  return MEMTXATTRS_UNSPECIFIED;
>  }
>
> +/* See v8 ARM ARM D7.2.27 ESR_ELx, Exception Syndrome Register
> + *
> + * To minimise translating between kernel and user-space the kernel
> + * ABI just provides user-space with the full exception syndrome
> + * register value to be decoded in QEMU.
> + */
> +
> +static int kvm_handle_debug(CPUState *cs, struct kvm_run *run)
> +{
> +struct kvm_debug_exit_arch *arch_info = &run->debug.arch;
> +int hsr_ec = arch_info->hsr >> ARM_EL_EC_SHIFT;

This doesn't build for 32-bit ARM:

target-arm/kvm.c:530:27: error: ‘struct kvm_debug_exit_arch’ has no
member named ‘hsr’
 int hsr_ec = arch_info->hsr >> ARM_EL_EC_SHIFT;

> +ARMCPU *cpu = ARM_CPU(cs);
> +CPUARMState *env = &cpu->env;
> +
> +/* Ensure PC is synchronised */
> +kvm_cpu_synchronize_state(cs);
> +
> +switch (hsr_ec) {
> +case EC_AA64_BKPT:
> +if (kvm_find_sw_breakpoint(cs, env->pc)) {
> +return true;
> +}
> +break;
> +default:
> +error_report("%s: unhandled debug exit (%"PRIx32", %"PRIx64")\n",
> + __func__, arch_info->hsr, env->pc);
> +}
> +
> +/* If we don't handle this it could be it really is for the
> +   guest to handle */
> +qemu_log_mask(LOG_UNIMP,
> +  "%s: re-injecting exception not yet implemented"
> +  " (0x%"PRIx32", %"PRIx64")\n",
> +  __func__, hsr_ec, env->pc);
> +
> +return false;
> +}
> +
>  int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>  {
> -return 0;
> +int ret = 0;
> +
> +switch (run->exit_reason) {
> +case KVM_EXIT_DEBUG:
> +if (kvm_handle_debug(cs, run)) {
> +ret = EXCP_DEBUG;
> +} /* otherwise return to guest */
> +break;
> +default:
> +qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
> +  __func__, run->exit_reason);
> +break;
> +}
> +return ret;
>  }
>
>  bool kvm_arch_stop_on_emulation_error(CPUState *cs)
> @@ -543,14 +595,34 @@ int kvm_arch_on_sigbus(int code, void *addr)
>
>  void kvm_arch_update_guest_debug(CPUState *cs, struct kvm_guest_debug *dbg)
>  {
> -qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
> +if (kvm_sw_breakpoints_active(cs)) {
> +dbg->control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP;

This won't compile on 32-bit ARM either:

target-arm/kvm.c:634:38: error: ‘KVM_GUESTDBG_USE_SW_BP’ undeclared
(first use in this function)
 dbg->control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP;

> +}
>  }
>
> -int kvm_arch_insert_sw_breakpoint(CPUState *cs,
> -  struct kvm_sw_breakpoint *bp)
> +/* C6.6.29 BRK instruction */
> +static const uint32_t brk_insn = 0xd420;

This is the A64 breakpoint instruction, so why is it in the common-to-32-and-64
source file? How about the A32 and T16 breakpoint insns?

> +
> +int kvm_arch_insert_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
>  {
> -qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
> -return -EINVAL;
> +
> +if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 4, 0) ||
> +cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&brk_insn, 4, 1)) {
> +return -EINVAL;
> +}

Should we allow insertion of sw breakpoint insns if the kernel doesn't
implement KVM_EXIT_DEBUG and reporting the ESR to us?

> +return 0;
> +}
> +
> +int kvm_arch_remove_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
> +{
> +static uint32_t brk;
> +
> +if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&brk, 4, 0) ||
> +brk != brk_insn ||
> +  

Re: [PATCH v9 1/6] target-arm: kvm64 - introduce kvm_arm_init_debug()

2015-11-20 Thread Alex Bennée

Peter Maydell  writes:

> On 20 November 2015 at 15:05, Peter Maydell  wrote:
>> On 12 November 2015 at 16:20, Alex Bennée  wrote:
>>> As we haven't always had guest debug support we need to probe for it.
>>> Additionally we don't do this in the start-up capability code so we
>>> don't fall over on old kernels.
>>>
>>> Signed-off-by: Alex Bennée 
>>> ---
>>>  target-arm/kvm64.c | 18 ++
>>>  1 file changed, 18 insertions(+)
>>>
>>> diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
>>> index ceebfeb..d087794 100644
>>> --- a/target-arm/kvm64.c
>>> +++ b/target-arm/kvm64.c
>>> @@ -25,6 +25,22 @@
>>>  #include "internals.h"
>>>  #include "hw/arm/arm.h"
>>>
>>> +static bool have_guest_debug;
>>> +
>>> +/**
>>> + * kvm_arm_init_debug()
>>> + * @cs: CPUState
>>> + *
>>> + * Check for guest debug capabilities.
>>> + *
>>> + */
>>> +static void kvm_arm_init_debug(CPUState *cs)
>>> +{
>>> +have_guest_debug = kvm_check_extension(cs->kvm_state,
>>> +   KVM_CAP_SET_GUEST_DEBUG);
>>> +return;
>>> +}
>>> +
>>>  static inline void set_feature(uint64_t *features, int feature)
>>>  {
>>>  *features |= 1ULL << feature;
>>> @@ -121,6 +137,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
>>>  }
>>>  cpu->mp_affinity = mpidr & ARM64_AFFINITY_MASK;
>>>
>>> +kvm_arm_init_debug(cs);
>>> +
>>>  return kvm_arm_init_cpreg_list(cpu);
>>>  }
>>
>> I assume in practice the kernel guarantees that either all
>> CPUs have the SET_GUEST_DEBUG cap, or none do :-)
>>
>> Reviewed-by: Peter Maydell 
>
> ...except I've just noticed that nothing else in this patchset
> ever reads the have_guest_debug bool we just set, so what is
> the purpose of this patch?

Oops, maybe to point out my stupidity ;-)

But yes the SET_GUEST_DEBUG cap is kernel wide.

>
> thanks
> -- PMM


-- 
Alex Bennée


Re: [PATCH 03/21] arm64: KVM: Implement vgic-v2 save/restore

2015-11-20 Thread Steve Capper
On Mon, Nov 16, 2015 at 01:11:41PM +, Marc Zyngier wrote:
> Implement the vgic-v2 save restore as a direct translation of
> the assembly code version.

Hi Marc,
I have one comment below:

Cheers,
-- 
Steve


> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/Makefile |  1 +
>  arch/arm64/kvm/hyp/Makefile |  5 +++
>  arch/arm64/kvm/hyp/hyp.h|  3 ++
>  arch/arm64/kvm/hyp/vgic-v2-sr.c | 85 
> +
>  4 files changed, 94 insertions(+)
>  create mode 100644 arch/arm64/kvm/hyp/Makefile
>  create mode 100644 arch/arm64/kvm/hyp/vgic-v2-sr.c
> 
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 1949fe5..d31e4e5 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -10,6 +10,7 @@ KVM=../../../virt/kvm
>  ARM=../../../arch/arm/kvm
>  
>  obj-$(CONFIG_KVM_ARM_HOST) += kvm.o
> +obj-$(CONFIG_KVM_ARM_HOST) += hyp/
>  
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/arm.o $(ARM)/mmu.o $(ARM)/mmio.o
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> new file mode 100644
> index 000..d8d5968
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -0,0 +1,5 @@
> +#
> +# Makefile for Kernel-based Virtual Machine module, HYP part
> +#
> +
> +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index dac843e..78f25c4 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -27,5 +27,8 @@
>  
>  #define kern_hyp_va(v) (typeof(v))((unsigned long)v & HYP_PAGE_OFFSET_MASK)
>  
> +void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
> +void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
> +
>  #endif /* __ARM64_KVM_HYP_H__ */
>  
> diff --git a/arch/arm64/kvm/hyp/vgic-v2-sr.c b/arch/arm64/kvm/hyp/vgic-v2-sr.c
> new file mode 100644
> index 000..1382d2e
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/vgic-v2-sr.c
> @@ -0,0 +1,85 @@
> +/*
> + * Copyright (C) 2012-2015 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#include "hyp.h"
> +
> +/* vcpu is already in the HYP VA space */
> +void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu)
> +{
> + struct kvm *kvm = kern_hyp_va(vcpu->kvm);
> + struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
> + struct vgic_dist *vgic = &kvm->arch.vgic;
> + void __iomem *base = kern_hyp_va(vgic->vctrl_base);
> + u32 __iomem *lr_base;
> + u32 eisr0, eisr1, elrsr0, elrsr1;
> + int i = 0, nr_lr;
> +
> + if (!base)
> + return;
> +
> + cpu_if->vgic_vmcr = readl_relaxed(base + GICH_VMCR);
> + cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR);
> + eisr0  = readl_relaxed(base + GICH_EISR0);
> + eisr1  = readl_relaxed(base + GICH_EISR0);

Not sure what this would affect, but should the RHS be:
"base + GICH_EISR1"?

> + elrsr0 = readl_relaxed(base + GICH_ELRSR0);
> + elrsr1 = readl_relaxed(base + GICH_ELRSR1);
> +#ifdef CONFIG_CPU_BIG_ENDIAN
> + cpu_if->vgic_eisr  = ((u64)eisr0 << 32) | eisr1;
> + cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1;
> +#else
> + cpu_if->vgic_eisr  = ((u64)eisr1 << 32) | eisr0;
> + cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
> +#endif
> + cpu_if->vgic_apr= readl_relaxed(base + GICH_APR);
> +
> + writel_relaxed(0, base + GICH_HCR);
> +
> + lr_base = base + GICH_LR0;
> + nr_lr = vcpu->arch.vgic_cpu.nr_lr;
> + do {
> + cpu_if->vgic_lr[i++] = readl_relaxed(lr_base++);
> + } while (--nr_lr);
> +}
> +
> +void __hyp_text __vgic_v2_restore_state(struct kvm_vcpu *vcpu)
> +{
> + struct kvm *kvm = kern_hyp_va(vcpu->kvm);
> + struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
> + struct vgic_dist *vgic = &kvm->arch.vgic;
> + void __iomem *base = kern_hyp_va(vgic->vctrl_base);
> + u32 __iomem *lr_base;
> + unsigned int i = 0, nr_lr;
> +
> + if (!base)
> + return;
> +
> + writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
> + writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR);
> + writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
> +
> + lr_base = base + GICH_LR0;
> + nr_lr = vcpu->arch.vgic_cp

Re: [PATCH v9 1/6] target-arm: kvm64 - introduce kvm_arm_init_debug()

2015-11-20 Thread Peter Maydell
On 20 November 2015 at 15:05, Peter Maydell  wrote:
> On 12 November 2015 at 16:20, Alex Bennée  wrote:
>> As we haven't always had guest debug support we need to probe for it.
>> Additionally we don't do this in the start-up capability code so we
>> don't fall over on old kernels.
>>
>> Signed-off-by: Alex Bennée 
>> ---
>>  target-arm/kvm64.c | 18 ++
>>  1 file changed, 18 insertions(+)
>>
>> diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
>> index ceebfeb..d087794 100644
>> --- a/target-arm/kvm64.c
>> +++ b/target-arm/kvm64.c
>> @@ -25,6 +25,22 @@
>>  #include "internals.h"
>>  #include "hw/arm/arm.h"
>>
>> +static bool have_guest_debug;
>> +
>> +/**
>> + * kvm_arm_init_debug()
>> + * @cs: CPUState
>> + *
>> + * Check for guest debug capabilities.
>> + *
>> + */
>> +static void kvm_arm_init_debug(CPUState *cs)
>> +{
>> +have_guest_debug = kvm_check_extension(cs->kvm_state,
>> +   KVM_CAP_SET_GUEST_DEBUG);
>> +return;
>> +}
>> +
>>  static inline void set_feature(uint64_t *features, int feature)
>>  {
>>  *features |= 1ULL << feature;
>> @@ -121,6 +137,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
>>  }
>>  cpu->mp_affinity = mpidr & ARM64_AFFINITY_MASK;
>>
>> +kvm_arm_init_debug(cs);
>> +
>>  return kvm_arm_init_cpreg_list(cpu);
>>  }
>
> I assume in practice the kernel guarantees that either all
> CPUs have the SET_GUEST_DEBUG cap, or none do :-)
>
> Reviewed-by: Peter Maydell 

...except I've just noticed that nothing else in this patchset
ever reads the have_guest_debug bool we just set, so what is
the purpose of this patch?

thanks
-- PMM


Re: [PATCH v9 1/6] target-arm: kvm64 - introduce kvm_arm_init_debug()

2015-11-20 Thread Peter Maydell
On 12 November 2015 at 16:20, Alex Bennée  wrote:
> As we haven't always had guest debug support we need to probe for it.
> Additionally we don't do this in the start-up capability code so we
> don't fall over on old kernels.
>
> Signed-off-by: Alex Bennée 
> ---
>  target-arm/kvm64.c | 18 ++
>  1 file changed, 18 insertions(+)
>
> diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
> index ceebfeb..d087794 100644
> --- a/target-arm/kvm64.c
> +++ b/target-arm/kvm64.c
> @@ -25,6 +25,22 @@
>  #include "internals.h"
>  #include "hw/arm/arm.h"
>
> +static bool have_guest_debug;
> +
> +/**
> + * kvm_arm_init_debug()
> + * @cs: CPUState
> + *
> + * Check for guest debug capabilities.
> + *
> + */
> +static void kvm_arm_init_debug(CPUState *cs)
> +{
> +have_guest_debug = kvm_check_extension(cs->kvm_state,
> +   KVM_CAP_SET_GUEST_DEBUG);
> +return;
> +}
> +
>  static inline void set_feature(uint64_t *features, int feature)
>  {
>  *features |= 1ULL << feature;
> @@ -121,6 +137,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  }
>  cpu->mp_affinity = mpidr & ARM64_AFFINITY_MASK;
>
> +kvm_arm_init_debug(cs);
> +
>  return kvm_arm_init_cpreg_list(cpu);
>  }

I assume in practice the kernel guarantees that either all
CPUs have the SET_GUEST_DEBUG cap, or none do :-)

Reviewed-by: Peter Maydell 

thanks
-- PMM


Re: linux-next: manual merge of the kvms390 tree with the kvm tree

2015-11-20 Thread Christian Borntraeger
On 11/20/2015 02:47 AM, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the kvms390 tree got a conflict in:
> 
>   include/linux/kvm_host.h
>   arch/s390/kvm/interrupt.c
>   arch/s390/kvm/sigp.c
> 
> between commits:
> 
>   db27a7a37aa0 ("KVM: Provide function for VCPU lookup by id")
>   b85de33a1a34 ("KVM: s390: avoid memory overwrites on emergency signal injection")
>   152e9f65d66f ("KVM: s390: fix wrong lookup of VCPUs by array index")
> 
> from the kvm tree and commit:
> 
>   a8ab7a5fd13b ("KVM: Provide function for VCPU lookup by id")
>   e896d1d08efc ("KVM: s390: avoid memory overwrites on emergency signal injection")
>   32d7bd98e799 ("KVM: s390: fix wrong lookup of VCPUs by array index")
> 
> from the kvms390 tree.
> 
> These patches used different names for the added function.  I used the
> name from the kvm tree (kvm_get_vcpu_by_id).

Yes, correct.


> This should probably all be cleaned up in the kvms390 tree.

Absolutely. Sorry for the hassle. I had fixed that up but forgot to
push it out. Should be fixed now.

Christian





Re: [PATCH V3 00/10] KVM: x86: MMU: Clean up x86's mmu code for future work

2015-11-20 Thread Xiao Guangrong



On 11/20/2015 04:40 PM, Takuya Yoshikawa wrote:

It seems like you are all busy now, so I've arranged this patch set so that
the mechanical and trivial changes come first.

V2->V3:
Patch 01: Rebased and moved here. Updated stale comments.
   We may also want to use a union inside the struct in the future, to
   eliminate the cast to (u64 *) when the spte is at the head.
Patch 02-05: No change.
   About patch 03: There was a comment on the usage of braces for a single line
   else-if statement from Xiao. As I answered, checkpatch did not complain about
   this, and when the corresponding if block has multiple lines, some developers
   prefer/recommend this style. Feel free to modify it if you don't like it.
Patch 06: Changed WARN_ON to BUG_ON as Marcelo suggested.
Patch 07: Removed unnecessary zero-initialization of sp->parent_ptes as Xiao
   suggested.



For patch 01 - patch 07:

Reviewed-by: Xiao Guangrong 


Re: [PATCH 08/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()

2015-11-20 Thread Takuya Yoshikawa

On 2015/11/20 17:46, Xiao Guangrong wrote:


You just ignored my comment on the previous version...


I'm sorry but please read the explanation in patch 00.
I've read your comments and I'm not ignoring you.

Since this patch set has become larger than expected, I'm sending
this version so that patches 01-07 can be applied first.

For patch 08-10, I think we need to check more because there seems
to be some confusion between us.  You can also read other discussions
between Marcelo, Paolo and me.

Anyway, since these three patches have been placed at the end of the
series now, I hope we can concentrate on them more easily than before.

Thanks,
  Takuya



Re: [PATCH 09/10] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()

2015-11-20 Thread Xiao Guangrong



You can move this patch to the front of
[PATCH 08/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of 
pte_list_walk()

By moving kvm_mmu_mark_parents_unsync() to after mmu_spte_set() (at that
point the parent spte is present), you can directly clean up
for_each_rmap_spte().


On 11/20/2015 04:48 PM, Takuya Yoshikawa wrote:

Every time kvm_mmu_get_page() is called with a non-NULL parent_pte
argument, link_shadow_page() follows that to set the parent entry so
that the new mapping will point to the returned page table.

Moving parent_pte handling there allows us to clean up the code because
parent_pte is passed to kvm_mmu_get_page() just for mark_unsync() and
mmu_page_add_parent_pte().

Signed-off-by: Takuya Yoshikawa 
---
  arch/x86/kvm/mmu.c | 22 --
  arch/x86/kvm/paging_tmpl.h |  6 ++
  2 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4e29d9a..b020323 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2107,14 +2107,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
if (sp->unsync_children) {
kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
kvm_mmu_mark_parents_unsync(sp);
-   if (parent_pte)
-   mark_unsync(parent_pte);
} else if (sp->unsync) {
kvm_mmu_mark_parents_unsync(sp);
-   if (parent_pte)
-   mark_unsync(parent_pte);
}
-   mmu_page_add_parent_pte(vcpu, sp, parent_pte);

__clear_sp_write_flooding_count(sp);
trace_kvm_mmu_get_page(sp, false);
@@ -2125,8 +2120,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,

sp = kvm_mmu_alloc_page(vcpu, direct);

-   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
-
sp->gfn = gfn;
sp->role = role;
hlist_add_head(&sp->hash_link,
@@ -2194,7 +2187,8 @@ static void shadow_walk_next(struct 
kvm_shadow_walk_iterator *iterator)
return __shadow_walk_next(iterator, *iterator->sptep);
  }

-static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
+static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
+struct kvm_mmu_page *sp)
  {
u64 spte;

@@ -2205,6 +2199,11 @@ static void link_shadow_page(u64 *sptep, struct 
kvm_mmu_page *sp)
   shadow_user_mask | shadow_x_mask | shadow_accessed_mask;

mmu_spte_set(sptep, spte);
+
+   if (sp->unsync_children || sp->unsync)
+   mark_unsync(sptep);
+
+   mmu_page_add_parent_pte(vcpu, sp, sptep);
  }

  static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
@@ -2263,11 +2262,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
mmu_page_zap_pte(kvm, sp, sp->spt + i);
  }

-static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
-{
-   mmu_page_remove_parent_pte(sp, parent_pte);
-}
-
  static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
  {
u64 *sptep;
@@ -2733,7 +2727,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, 
int map_writable,
  iterator.level - 1,
  1, ACC_ALL, iterator.sptep);

-   link_shadow_page(iterator.sptep, sp);
+   link_shadow_page(vcpu, iterator.sptep, sp);
}
}
return emulate;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 11650ea..0dcf9c8 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
goto out_gpte_changed;

if (sp)
-   link_shadow_page(it.sptep, sp);
+   link_shadow_page(vcpu, it.sptep, sp);
}

for (;
@@ -618,7 +618,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,

sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
  true, direct_access, it.sptep);
-   link_shadow_page(it.sptep, sp);
+   link_shadow_page(vcpu, it.sptep, sp);
}

clear_sp_write_flooding_count(it.sptep);
@@ -629,8 +629,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
return emulate;

  out_gpte_changed:
-   if (sp)
-   kvm_mmu_put_page(sp, it.sptep);
kvm_release_pfn_clean(pfn);
return 0;
  }




Re: [PATCH 08/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()

2015-11-20 Thread Xiao Guangrong


You just ignored my comment on the previous version...

On 11/20/2015 04:47 PM, Takuya Yoshikawa wrote:

kvm_mmu_mark_parents_unsync() alone uses pte_list_walk(), which does
nearly the same as the for_each_rmap_spte macro.  The only difference
is that the is_shadow_present_pte() check cannot be placed there because
kvm_mmu_mark_parents_unsync() can be called with a new parent pointer
whose entry is not set yet.

By calling mark_unsync() separately for the parent and adding the parent
pointer to the parent_ptes chain later in kvm_mmu_get_page(), the macro
works with no problem.

Signed-off-by: Takuya Yoshikawa 
---
  arch/x86/kvm/mmu.c | 36 +---
  1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7f46e3e..4e29d9a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, struct 
kvm_rmap_head *rmap_head)
}
  }

-typedef void (*pte_list_walk_fn) (u64 *spte);
-static void pte_list_walk(struct kvm_rmap_head *rmap_head, pte_list_walk_fn fn)
-{
-   struct pte_list_desc *desc;
-   int i;
-
-   if (!rmap_head->val)
-   return;
-
-   if (!(rmap_head->val & 1))
-   return fn((u64 *)rmap_head->val);
-
-   desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
-   while (desc) {
-   for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
-   fn(desc->sptes[i]);
-   desc = desc->more;
-   }
-}
-
  static struct kvm_rmap_head *__gfn_to_rmap(gfn_t gfn, int level,
   struct kvm_memory_slot *slot)
  {
@@ -1749,7 +1729,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct 
kvm_vcpu *vcpu, int direct
  static void mark_unsync(u64 *spte);
  static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
  {
-   pte_list_walk(&sp->parent_ptes, mark_unsync);
+   u64 *sptep;
+   struct rmap_iterator iter;
+
+   for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
+   mark_unsync(sptep);
+   }
  }

  static void mark_unsync(u64 *spte)
@@ -2119,12 +2104,17 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
break;

-   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
if (sp->unsync_children) {
kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
kvm_mmu_mark_parents_unsync(sp);
-   } else if (sp->unsync)
+   if (parent_pte)
+   mark_unsync(parent_pte);
+   } else if (sp->unsync) {
kvm_mmu_mark_parents_unsync(sp);
+   if (parent_pte)
+   mark_unsync(parent_pte);
+   }
+   mmu_page_add_parent_pte(vcpu, sp, parent_pte);

__clear_sp_write_flooding_count(sp);
trace_kvm_mmu_get_page(sp, false);




[PATCH 10/10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()

2015-11-20 Thread Takuya Yoshikawa
Signed-off-by: Takuya Yoshikawa 
---
 arch/x86/kvm/mmu.c | 20 +++-
 arch/x86/kvm/paging_tmpl.h |  4 ++--
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b020323..9baf884 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2071,8 +2071,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
 gva_t gaddr,
 unsigned level,
 int direct,
-unsigned access,
-u64 *parent_pte)
+unsigned access)
 {
union kvm_mmu_page_role role;
unsigned quadrant;
@@ -2724,8 +2723,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, 
int map_writable,
base_addr &= PT64_LVL_ADDR_MASK(iterator.level);
pseudo_gfn = base_addr >> PAGE_SHIFT;
sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
- iterator.level - 1,
- 1, ACC_ALL, iterator.sptep);
+ iterator.level - 1, 1, ACC_ALL);
 
link_shadow_page(vcpu, iterator.sptep, sp);
}
@@ -3082,8 +3080,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
spin_lock(&vcpu->kvm->mmu_lock);
make_mmu_pages_available(vcpu);
-   sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL,
- 1, ACC_ALL, NULL);
+   sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL, 1, ACC_ALL);
++sp->root_count;
spin_unlock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu.root_hpa = __pa(sp->spt);
@@ -3095,9 +3092,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
spin_lock(&vcpu->kvm->mmu_lock);
make_mmu_pages_available(vcpu);
sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT),
- i << 30,
- PT32_ROOT_LEVEL, 1, ACC_ALL,
- NULL);
+   i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3134,7 +3129,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
spin_lock(&vcpu->kvm->mmu_lock);
make_mmu_pages_available(vcpu);
sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL,
- 0, ACC_ALL, NULL);
+ 0, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3167,9 +3162,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
}
spin_lock(&vcpu->kvm->mmu_lock);
make_mmu_pages_available(vcpu);
-   sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
- PT32_ROOT_LEVEL, 0,
- ACC_ALL, NULL);
+   sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, PT32_ROOT_LEVEL,
+ 0, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
spin_unlock(&vcpu->kvm->mmu_lock);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 0dcf9c8..91e939b 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -587,7 +587,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
if (!is_shadow_present_pte(*it.sptep)) {
table_gfn = gw->table_gfn[it.level - 2];
sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
- false, access, it.sptep);
+ false, access);
}
 
/*
@@ -617,7 +617,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
direct_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
 
sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
- true, direct_access, it.sptep);
+ true, direct_access);
link_shadow_page(vcpu, it.sptep, sp);
}
 
-- 
2.1.0


[PATCH 08/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()

2015-11-20 Thread Takuya Yoshikawa
kvm_mmu_mark_parents_unsync() alone uses pte_list_walk(), which does
nearly the same as the for_each_rmap_spte macro.  The only difference
is that the is_shadow_present_pte() check cannot be placed there because
kvm_mmu_mark_parents_unsync() can be called with a new parent pointer
whose entry is not set yet.

By calling mark_unsync() separately for the parent and adding the parent
pointer to the parent_ptes chain later in kvm_mmu_get_page(), the macro
works with no problem.
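
To see why the check cannot live in the walker itself: after patch 06,
rmap_get_first() and rmap_get_next() BUG_ON() non-present sptes, and the
macro reduces to

  #define for_each_rmap_spte(_rmap_head_, _iter_, _spte_)	\
	for (_spte_ = rmap_get_first(_rmap_head_, _iter_);	\
	     _spte_; _spte_ = rmap_get_next(_iter_))

so a parent pointer whose entry is not set yet must stay off the
parent_ptes chain while kvm_mmu_mark_parents_unsync() walks it.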

Signed-off-by: Takuya Yoshikawa 
---
 arch/x86/kvm/mmu.c | 36 +---
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7f46e3e..4e29d9a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, struct 
kvm_rmap_head *rmap_head)
}
 }
 
-typedef void (*pte_list_walk_fn) (u64 *spte);
-static void pte_list_walk(struct kvm_rmap_head *rmap_head, pte_list_walk_fn fn)
-{
-   struct pte_list_desc *desc;
-   int i;
-
-   if (!rmap_head->val)
-   return;
-
-   if (!(rmap_head->val & 1))
-   return fn((u64 *)rmap_head->val);
-
-   desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
-   while (desc) {
-   for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
-   fn(desc->sptes[i]);
-   desc = desc->more;
-   }
-}
-
 static struct kvm_rmap_head *__gfn_to_rmap(gfn_t gfn, int level,
   struct kvm_memory_slot *slot)
 {
@@ -1749,7 +1729,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct 
kvm_vcpu *vcpu, int direct
 static void mark_unsync(u64 *spte);
 static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
 {
-   pte_list_walk(&sp->parent_ptes, mark_unsync);
+   u64 *sptep;
+   struct rmap_iterator iter;
+
+   for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
+   mark_unsync(sptep);
+   }
 }
 
 static void mark_unsync(u64 *spte)
@@ -2119,12 +2104,17 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
break;
 
-   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
if (sp->unsync_children) {
kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
kvm_mmu_mark_parents_unsync(sp);
-   } else if (sp->unsync)
+   if (parent_pte)
+   mark_unsync(parent_pte);
+   } else if (sp->unsync) {
kvm_mmu_mark_parents_unsync(sp);
+   if (parent_pte)
+   mark_unsync(parent_pte);
+   }
+   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 
__clear_sp_write_flooding_count(sp);
trace_kvm_mmu_get_page(sp, false);
-- 
2.1.0



[PATCH 09/10] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()

2015-11-20 Thread Takuya Yoshikawa
Every time kvm_mmu_get_page() is called with a non-NULL parent_pte
argument, link_shadow_page() follows that to set the parent entry so
that the new mapping will point to the returned page table.

Moving parent_pte handling there allows us to clean up the code because
parent_pte is passed to kvm_mmu_get_page() just for mark_unsync() and
mmu_page_add_parent_pte().

Signed-off-by: Takuya Yoshikawa 
---
 arch/x86/kvm/mmu.c | 22 --
 arch/x86/kvm/paging_tmpl.h |  6 ++
 2 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4e29d9a..b020323 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2107,14 +2107,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
if (sp->unsync_children) {
kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
kvm_mmu_mark_parents_unsync(sp);
-   if (parent_pte)
-   mark_unsync(parent_pte);
} else if (sp->unsync) {
kvm_mmu_mark_parents_unsync(sp);
-   if (parent_pte)
-   mark_unsync(parent_pte);
}
-   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 
__clear_sp_write_flooding_count(sp);
trace_kvm_mmu_get_page(sp, false);
@@ -2125,8 +2120,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
 
sp = kvm_mmu_alloc_page(vcpu, direct);
 
-   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
-
sp->gfn = gfn;
sp->role = role;
hlist_add_head(&sp->hash_link,
@@ -2194,7 +2187,8 @@ static void shadow_walk_next(struct 
kvm_shadow_walk_iterator *iterator)
return __shadow_walk_next(iterator, *iterator->sptep);
 }
 
-static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
+static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
+struct kvm_mmu_page *sp)
 {
u64 spte;
 
@@ -2205,6 +2199,11 @@ static void link_shadow_page(u64 *sptep, struct 
kvm_mmu_page *sp)
   shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
 
mmu_spte_set(sptep, spte);
+
+   if (sp->unsync_children || sp->unsync)
+   mark_unsync(sptep);
+
+   mmu_page_add_parent_pte(vcpu, sp, sptep);
 }
 
 static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
@@ -2263,11 +2262,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
mmu_page_zap_pte(kvm, sp, sp->spt + i);
 }
 
-static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
-{
-   mmu_page_remove_parent_pte(sp, parent_pte);
-}
-
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
u64 *sptep;
@@ -2733,7 +2727,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, 
int map_writable,
  iterator.level - 1,
  1, ACC_ALL, iterator.sptep);
 
-   link_shadow_page(iterator.sptep, sp);
+   link_shadow_page(vcpu, iterator.sptep, sp);
}
}
return emulate;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 11650ea..0dcf9c8 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
goto out_gpte_changed;
 
if (sp)
-   link_shadow_page(it.sptep, sp);
+   link_shadow_page(vcpu, it.sptep, sp);
}
 
for (;
@@ -618,7 +618,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 
sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
  true, direct_access, it.sptep);
-   link_shadow_page(it.sptep, sp);
+   link_shadow_page(vcpu, it.sptep, sp);
}
 
clear_sp_write_flooding_count(it.sptep);
@@ -629,8 +629,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
return emulate;
 
 out_gpte_changed:
-   if (sp)
-   kvm_mmu_put_page(sp, it.sptep);
kvm_release_pfn_clean(pfn);
return 0;
 }
-- 
2.1.0



Re: [PATCH v3 0/3] virtio DMA API core stuff

2015-11-20 Thread Michael S. Tsirkin
On Fri, Nov 20, 2015 at 01:56:39PM +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2015-11-19 at 23:38 +, David Woodhouse wrote:
> > 
> > I understand that POWER and other platforms don't currently have a
> > clean way to indicate that certain device don't have translation. And I
> > understand that we may end up with a *quirk* which ensures that the DMA
> > API does the right thing (i.e. nothing) in certain cases.
> > 
> > But we should *NOT* be involving the virtio device drivers in that
> > quirk, in any way. And putting a feature bit in the virtio device
> > itself doesn't seem at all sane either.
> > 
> > Bear in mind that qemu-system-x86_64 currently has the *same* problem
> > with assigned physical devices. It's claiming they're translated, and
> > they're not.
> 
> It's not that clear but yeah ... as I mentioned, I can't find a
> way to do that quirk that won't break when we want to actually use
> the iommu... 
> 
> Ben.

Yes, I am not at all sure we need a quirk for assigned devices.
Better teach QEMU to make iommu work for them.


-- 
MST


[PATCH 07/10] KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page()

2015-11-20 Thread Takuya Yoshikawa
Make kvm_mmu_alloc_page() do just what its name says, and remove
the extra allocation error check and zero-initialization of parent_ptes:
shadow page headers allocated by kmem_cache_zalloc() are always in the
per-VCPU pools.

Signed-off-by: Takuya Yoshikawa 
---
 arch/x86/kvm/mmu.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5b249d4..7f46e3e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1726,8 +1726,7 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
mmu_spte_clear_no_track(parent_pte);
 }
 
-static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
-  u64 *parent_pte, int direct)
+static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int 
direct)
 {
struct kvm_mmu_page *sp;
 
@@ -1743,8 +1742,6 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct 
kvm_vcpu *vcpu,
 * this feature. See the comments in kvm_zap_obsolete_pages().
 */
list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
-   sp->parent_ptes.val = 0;
-   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
kvm_mod_used_mmu_pages(vcpu->kvm, +1);
return sp;
 }
@@ -2133,10 +2130,13 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
trace_kvm_mmu_get_page(sp, false);
return sp;
}
+
++vcpu->kvm->stat.mmu_cache_miss;
-   sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct);
-   if (!sp)
-   return sp;
+
+   sp = kvm_mmu_alloc_page(vcpu, direct);
+
+   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
+
sp->gfn = gfn;
sp->role = role;
hlist_add_head(&sp->hash_link,
-- 
2.1.0



[PATCH 06/10] KVM: x86: MMU: Consolidate BUG_ON checks for reverse-mapped sptes

2015-11-20 Thread Takuya Yoshikawa
At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is
placed right after the call to detect unrelated sptes which must not be
found in the reverse-mapping list.

Move this check into rmap_get_first/next() so that all call sites, not
just the users of the for_each_rmap_spte() macro, are checked the same
way.

One thing to keep in mind is that kvm_mmu_unlink_parents() also uses
rmap_get_first() to handle parent sptes.  The change will not break it
because parent sptes are present, at least until drop_parent_pte()
actually unlinks them, and are never mmio-sptes.
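
For reference, that caller looks roughly like this (with the
kvm_rmap_head type from patch 01; shown only as a sketch):

static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
{
	u64 *sptep;
	struct rmap_iterator iter;

	/* parent sptes stay present until drop_parent_pte() unlinks them */
	while ((sptep = rmap_get_first(&sp->parent_ptes, &iter)))
		drop_parent_pte(sp, sptep);
}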

Signed-off-by: Takuya Yoshikawa 
---
 Documentation/virtual/kvm/mmu.txt |  4 ++--
 arch/x86/kvm/mmu.c| 26 +-
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt 
b/Documentation/virtual/kvm/mmu.txt
index 3a4d681..daf9c0f 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -203,10 +203,10 @@ Shadow pages contain the following information:
 page cannot be destroyed.  See role.invalid.
   parent_ptes:
 The reverse mapping for the pte/ptes pointing at this page's spt. If
-parent_ptes bit 0 is zero, only one spte points at this pages and
+parent_ptes bit 0 is zero, only one spte points at this page and
 parent_ptes points at this single spte, otherwise, there exists multiple
 sptes pointing at this page and (parent_ptes & ~0x1) points at a data
-structure with a list of parent_ptes.
+structure with a list of parent sptes.
   unsync:
 If true, then the translations in this page may not match the guest's
 translation.  This is equivalent to the state of the tlb when a pte is
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3104748..5b249d4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1098,17 +1098,23 @@ struct rmap_iterator {
 static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head,
   struct rmap_iterator *iter)
 {
+   u64 *sptep;
+
if (!rmap_head->val)
return NULL;
 
if (!(rmap_head->val & 1)) {
iter->desc = NULL;
-   return (u64 *)rmap_head->val;
+   sptep = (u64 *)rmap_head->val;
+   goto out;
}
 
iter->desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
iter->pos = 0;
-   return iter->desc->sptes[iter->pos];
+   sptep = iter->desc->sptes[iter->pos];
+out:
+   BUG_ON(!is_shadow_present_pte(*sptep));
+   return sptep;
 }
 
 /*
@@ -1118,14 +1124,14 @@ static u64 *rmap_get_first(struct kvm_rmap_head 
*rmap_head,
  */
 static u64 *rmap_get_next(struct rmap_iterator *iter)
 {
+   u64 *sptep;
+
if (iter->desc) {
if (iter->pos < PTE_LIST_EXT - 1) {
-   u64 *sptep;
-
++iter->pos;
sptep = iter->desc->sptes[iter->pos];
if (sptep)
-   return sptep;
+   goto out;
}
 
iter->desc = iter->desc->more;
@@ -1133,17 +1139,20 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
if (iter->desc) {
iter->pos = 0;
/* desc->sptes[0] cannot be NULL */
-   return iter->desc->sptes[iter->pos];
+   sptep = iter->desc->sptes[iter->pos];
+   goto out;
}
}
 
return NULL;
+out:
+   BUG_ON(!is_shadow_present_pte(*sptep));
+   return sptep;
 }
 
 #define for_each_rmap_spte(_rmap_head_, _iter_, _spte_)
\
for (_spte_ = rmap_get_first(_rmap_head_, _iter_);  \
-_spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;}); \
-_spte_ = rmap_get_next(_iter_))
+_spte_; _spte_ = rmap_get_next(_iter_))
 
 static void drop_spte(struct kvm *kvm, u64 *sptep)
 {
@@ -1358,7 +1367,6 @@ static bool kvm_zap_rmapp(struct kvm *kvm, struct 
kvm_rmap_head *rmap_head)
bool flush = false;
 
while ((sptep = rmap_get_first(rmap_head, &iter))) {
-   BUG_ON(!(*sptep & PT_PRESENT_MASK));
rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep);
 
drop_spte(kvm, sptep);
-- 
2.1.0



[PATCH 05/10] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()

2015-11-20 Thread Takuya Yoshikawa
is_rmap_spte(), originally named is_rmap_pte(), was introduced when the
simple reverse mapping was implemented by commit cd4a4e5374110444
("[PATCH] KVM: MMU: Implement simple reverse mapping").  At that point,
its role was clear and only rmap_add() and rmap_remove() were using it
to select sptes that need to be reverse-mapped.

Independently of that, is_shadow_present_pte() was first introduced by
commit c7addb902054195b ("KVM: Allow not-present guest page faults to
bypass kvm") to do bypass_guest_pf optimization, which does not exist
any more.

These two seem to have changed their roles somewhat, and is_rmap_spte()
just calls is_shadow_present_pte() now.

Since using both of them without clear distinction just makes the code
confusing, remove is_rmap_spte().

Signed-off-by: Takuya Yoshikawa 
---
 arch/x86/kvm/mmu.c   | 13 -
 arch/x86/kvm/mmu_audit.c |  2 +-
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 74c120c..3104748 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -311,11 +311,6 @@ static int is_large_pte(u64 pte)
return pte & PT_PAGE_SIZE_MASK;
 }
 
-static int is_rmap_spte(u64 pte)
-{
-   return is_shadow_present_pte(pte);
-}
-
 static int is_last_spte(u64 pte, int level)
 {
if (level == PT_PAGE_TABLE_LEVEL)
@@ -540,7 +535,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
u64 old_spte = *sptep;
bool ret = false;
 
-   WARN_ON(!is_rmap_spte(new_spte));
+   WARN_ON(!is_shadow_present_pte(new_spte));
 
if (!is_shadow_present_pte(old_spte)) {
mmu_spte_set(sptep, new_spte);
@@ -595,7 +590,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
else
old_spte = __update_clear_spte_slow(sptep, 0ull);
 
-   if (!is_rmap_spte(old_spte))
+   if (!is_shadow_present_pte(old_spte))
return 0;
 
pfn = spte_to_pfn(old_spte);
@@ -2575,7 +2570,7 @@ static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep, unsigned pte_access,
pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 *sptep, write_fault, gfn);
 
-   if (is_rmap_spte(*sptep)) {
+   if (is_shadow_present_pte(*sptep)) {
/*
 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
 * the parent of the now unreachable PTE.
@@ -2919,7 +2914,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t 
gva, int level,
 * If the mapping has been changed, let the vcpu fault on the
 * same address again.
 */
-   if (!is_rmap_spte(spte)) {
+   if (!is_shadow_present_pte(spte)) {
ret = true;
goto exit;
}
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index f7b0488..1cee3ec 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -183,7 +183,7 @@ static void check_mappings_rmap(struct kvm *kvm, struct 
kvm_mmu_page *sp)
return;
 
for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
-   if (!is_rmap_spte(sp->spt[i]))
+   if (!is_shadow_present_pte(sp->spt[i]))
continue;
 
inspect_spte_has_rmap(kvm, sp->spt + i);
-- 
2.1.0



[PATCH 04/10] KVM: x86: MMU: Make mmu_set_spte() return emulate value

2015-11-20 Thread Takuya Yoshikawa
mmu_set_spte()'s code is based on the assumption that the emulate
parameter has a valid pointer value if set_spte() returns true and
write_fault is not zero.  In other cases, emulate may be NULL, so a
NULL-check is needed.

Stop passing emulate pointer and make mmu_set_spte() return the emulate
value instead to clean up this complex interface.  Prefetch functions
can just throw away the return value.

Signed-off-by: Takuya Yoshikawa 
---
 arch/x86/kvm/mmu.c | 27 ++-
 arch/x86/kvm/paging_tmpl.h | 10 +-
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9832bc9..74c120c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2564,13 +2564,13 @@ done:
return ret;
 }
 
-static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-unsigned pte_access, int write_fault, int *emulate,
-int level, gfn_t gfn, pfn_t pfn, bool speculative,
-bool host_writable)
+static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned 
pte_access,
+int write_fault, int level, gfn_t gfn, pfn_t pfn,
+bool speculative, bool host_writable)
 {
int was_rmapped = 0;
int rmap_count;
+   bool emulate = false;
 
pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 *sptep, write_fault, gfn);
@@ -2600,12 +2600,12 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep,
if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative,
  true, host_writable)) {
if (write_fault)
-   *emulate = 1;
+   emulate = true;
kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
}
 
-   if (unlikely(is_mmio_spte(*sptep) && emulate))
-   *emulate = 1;
+   if (unlikely(is_mmio_spte(*sptep)))
+   emulate = true;
 
pgprintk("%s: setting spte %llx\n", __func__, *sptep);
pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n",
@@ -2624,6 +2624,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep,
}
 
kvm_release_pfn_clean(pfn);
+
+   return emulate;
 }
 
 static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
@@ -2658,9 +2660,8 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
return -1;
 
for (i = 0; i < ret; i++, gfn++, start++)
-   mmu_set_spte(vcpu, start, access, 0, NULL,
-sp->role.level, gfn, page_to_pfn(pages[i]),
-true, true);
+   mmu_set_spte(vcpu, start, access, 0, sp->role.level, gfn,
+page_to_pfn(pages[i]), true, true);
 
return 0;
 }
@@ -2721,9 +2722,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, 
int map_writable,
 
for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
if (iterator.level == level) {
-   mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
-write, &emulate, level, gfn, pfn,
-prefault, map_writable);
+   emulate = mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
+  write, level, gfn, pfn, prefault,
+  map_writable);
direct_pte_prefetch(vcpu, iterator.sptep);
++vcpu->stat.pf_fixed;
break;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index d8fdc5c..11650ea 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -475,8 +475,8 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *sp,
 * we call mmu_set_spte() with host_writable = true because
 * pte_prefetch_gfn_to_pfn always gets a writable pfn.
 */
-   mmu_set_spte(vcpu, spte, pte_access, 0, NULL, PT_PAGE_TABLE_LEVEL,
-gfn, pfn, true, true);
+   mmu_set_spte(vcpu, spte, pte_access, 0, PT_PAGE_TABLE_LEVEL, gfn, pfn,
+true, true);
 
return true;
 }
@@ -556,7 +556,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
unsigned direct_access, access = gw->pt_access;
-   int top_level, emulate = 0;
+   int top_level, emulate;
 
direct_access = gw->pte_access;
 
@@ -622,8 +622,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
}
 
clear_sp_write_flooding_count(it.sptep);
-   mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, &emulate,
-it.level, gw->gfn, pfn, prefault, map_writable);
+   emulate = mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault,
+  it.level, gw->gfn, pfn, prefault, map_writable);

[PATCH 03/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap

2015-11-20 Thread Takuya Yoshikawa
Both __mmu_unsync_walk() and mmu_pages_clear_parents() contain a
three-line sequence that clears a bit in the unsync child bitmap; the
former places it inside a loop block and uses a few goto statements to
jump to it.

A new helper function, clear_unsync_child_bit(), makes the code cleaner.

Signed-off-by: Takuya Yoshikawa 
---
 arch/x86/kvm/mmu.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8a1593f..9832bc9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1809,6 +1809,13 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, 
struct kvm_mmu_page *sp,
return (pvec->nr == KVM_PAGE_ARRAY_NR);
 }
 
+static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
+{
+   --sp->unsync_children;
+   WARN_ON((int)sp->unsync_children < 0);
+   __clear_bit(idx, sp->unsync_child_bitmap);
+}
+
 static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
   struct kvm_mmu_pages *pvec)
 {
@@ -1818,8 +1825,10 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
struct kvm_mmu_page *child;
u64 ent = sp->spt[i];
 
-   if (!is_shadow_present_pte(ent) || is_large_pte(ent))
-   goto clear_child_bitmap;
+   if (!is_shadow_present_pte(ent) || is_large_pte(ent)) {
+   clear_unsync_child_bit(sp, i);
+   continue;
+   }
 
child = page_header(ent & PT64_BASE_ADDR_MASK);
 
@@ -1828,28 +1837,21 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
return -ENOSPC;
 
ret = __mmu_unsync_walk(child, pvec);
-   if (!ret)
-   goto clear_child_bitmap;
-   else if (ret > 0)
+   if (!ret) {
+   clear_unsync_child_bit(sp, i);
+   continue;
+   } else if (ret > 0) {
nr_unsync_leaf += ret;
-   else
+   } else
return ret;
} else if (child->unsync) {
nr_unsync_leaf++;
if (mmu_pages_add(pvec, child, i))
return -ENOSPC;
} else
-goto clear_child_bitmap;
-
-   continue;
-
-clear_child_bitmap:
-   __clear_bit(i, sp->unsync_child_bitmap);
-   sp->unsync_children--;
-   WARN_ON((int)sp->unsync_children < 0);
+   clear_unsync_child_bit(sp, i);
}
 
-
return nr_unsync_leaf;
 }
 
@@ -2012,9 +2014,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path 
*parents)
if (!sp)
return;
 
-   --sp->unsync_children;
-   WARN_ON((int)sp->unsync_children < 0);
-   __clear_bit(idx, sp->unsync_child_bitmap);
+   clear_unsync_child_bit(sp, idx);
level++;
} while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children);
 }
-- 
2.1.0



[PATCH 02/10] KVM: x86: MMU: Remove unused parameter of __direct_map()

2015-11-20 Thread Takuya Yoshikawa
Signed-off-by: Takuya Yoshikawa 
---
 arch/x86/kvm/mmu.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d9a6801..8a1593f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2708,9 +2708,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, 
u64 *sptep)
__direct_pte_prefetch(vcpu, sp, sptep);
 }
 
-static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
-   int map_writable, int level, gfn_t gfn, pfn_t pfn,
-   bool prefault)
+static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
+   int level, gfn_t gfn, pfn_t pfn, bool prefault)
 {
struct kvm_shadow_walk_iterator iterator;
struct kvm_mmu_page *sp;
@@ -3018,11 +3017,9 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, 
u32 error_code,
make_mmu_pages_available(vcpu);
if (likely(!force_pt_level))
transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-   r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn,
-prefault);
+   r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
spin_unlock(&vcpu->kvm->mmu_lock);
 
-
return r;
 
 out_unlock:
@@ -3531,8 +3528,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
make_mmu_pages_available(vcpu);
if (likely(!force_pt_level))
transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-   r = __direct_map(vcpu, gpa, write, map_writable,
-level, gfn, pfn, prefault);
+   r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
spin_unlock(&vcpu->kvm->mmu_lock);
 
return r;
-- 
2.1.0



[PATCH 01/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct

2015-11-20 Thread Takuya Yoshikawa
New struct kvm_rmap_head makes the code type-safe to some extent.

Signed-off-by: Takuya Yoshikawa 
---
 arch/x86/include/asm/kvm_host.h |   8 +-
 arch/x86/kvm/mmu.c  | 196 
 arch/x86/kvm/mmu_audit.c|  13 +--
 3 files changed, 113 insertions(+), 104 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f608e17..8140077 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -214,6 +214,10 @@ union kvm_mmu_page_role {
};
 };
 
+struct kvm_rmap_head {
+   unsigned long val;
+};
+
 struct kvm_mmu_page {
struct list_head link;
struct hlist_node hash_link;
@@ -231,7 +235,7 @@ struct kvm_mmu_page {
bool unsync;
int root_count;  /* Currently serving as active root */
unsigned int unsync_children;
-   unsigned long parent_ptes;  /* Reverse mapping for parent_pte */
+   struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
 
/* The page is obsolete if mmu_valid_gen != kvm->arch.mmu_valid_gen.  */
unsigned long mmu_valid_gen;
@@ -606,7 +610,7 @@ struct kvm_lpage_info {
 };
 
 struct kvm_arch_memory_slot {
-   unsigned long *rmap[KVM_NR_PAGE_SIZES];
+   struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
 };
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 276d2f2..d9a6801 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -909,36 +909,35 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t 
large_gfn,
 }
 
 /*
- * Pte mapping structures:
+ * About rmap_head encoding:
  *
- * If pte_list bit zero is zero, then pte_list point to the spte.
- *
- * If pte_list bit zero is one, (then pte_list & ~1) points to a struct
+ * If the bit zero of rmap_head->val is clear, then it points to the only spte
+ * in this rmap chain. Otherwise, (rmap_head->val & ~1) points to a struct
  * pte_list_desc containing more mappings.
- *
- * Returns the number of pte entries before the spte was added or zero if
- * the spte was not added.
- *
+ */
+
+/*
+ * Returns the number of pointers in the rmap chain, not counting the new one.
  */
 static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
-   unsigned long *pte_list)
+   struct kvm_rmap_head *rmap_head)
 {
struct pte_list_desc *desc;
int i, count = 0;
 
-   if (!*pte_list) {
+   if (!rmap_head->val) {
rmap_printk("pte_list_add: %p %llx 0->1\n", spte, *spte);
-   *pte_list = (unsigned long)spte;
-   } else if (!(*pte_list & 1)) {
+   rmap_head->val = (unsigned long)spte;
+   } else if (!(rmap_head->val & 1)) {
rmap_printk("pte_list_add: %p %llx 1->many\n", spte, *spte);
desc = mmu_alloc_pte_list_desc(vcpu);
-   desc->sptes[0] = (u64 *)*pte_list;
+   desc->sptes[0] = (u64 *)rmap_head->val;
desc->sptes[1] = spte;
-   *pte_list = (unsigned long)desc | 1;
+   rmap_head->val = (unsigned long)desc | 1;
++count;
} else {
rmap_printk("pte_list_add: %p %llx many->many\n", spte, *spte);
-   desc = (struct pte_list_desc *)(*pte_list & ~1ul);
+   desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
while (desc->sptes[PTE_LIST_EXT-1] && desc->more) {
desc = desc->more;
count += PTE_LIST_EXT;
@@ -955,8 +954,9 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
 }
 
 static void
-pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc,
-  int i, struct pte_list_desc *prev_desc)
+pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
+  struct pte_list_desc *desc, int i,
+  struct pte_list_desc *prev_desc)
 {
int j;
 
@@ -967,43 +967,43 @@ pte_list_desc_remove_entry(unsigned long *pte_list, 
struct pte_list_desc *desc,
if (j != 0)
return;
if (!prev_desc && !desc->more)
-   *pte_list = (unsigned long)desc->sptes[0];
+   rmap_head->val = (unsigned long)desc->sptes[0];
else
if (prev_desc)
prev_desc->more = desc->more;
else
-   *pte_list = (unsigned long)desc->more | 1;
+   rmap_head->val = (unsigned long)desc->more | 1;
mmu_free_pte_list_desc(desc);
 }
 
-static void pte_list_remove(u64 *spte, unsigned long *pte_list)
+static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
 {
struct pte_list_desc *desc;
struct pte_list_desc *prev_desc;
int i;
 
-   if (!*pte_list) {
+   if (!rmap_head->val) {
printk(KERN_ERR "pte_list_remove: %p 0->BUG\n", spte);

[PATCH V3 00/10] KVM: x86: MMU: Clean up x86's mmu code for future work

2015-11-20 Thread Takuya Yoshikawa
It seems like you all are busy now, so I've made this patch set so that
the mechanical and trivial changes come first.

V2->V3:
Patch 01: Rebased and moved here. Updated stale comments.
  We may also want to use a union, inside the struct, to eliminate casting to
  (u64 *) type when spte is in the head in the future (see the sketch after
  this list).
Patch 02-05: No change.
  About patch 03: There was a comment on the usage of braces for a single line
  else-if statement from Xiao. As I answered, checkpatch did not complain about
  this, and when the corresponding if block has multiple lines, some developers
  prefer/recommend this style. Feel free to modify it if you don't like it.
Patch 06: Changed WARN_ON to BUG_ON as Marcelo suggested.
Patch 07: Removed unnecessary zero-initialization of sp->parent_ptes as Xiao
  suggested.
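
(A minimal sketch of the union idea for patch 01, purely illustrative
and not part of this series:

struct kvm_rmap_head {
	union {
		unsigned long val;
		u64 *spte;	/* valid only when bit zero of val is clear */
	};
};

This would let the single-spte case be read without the (u64 *) cast.)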

I think these seven patches are ready for inclusion.

Patch 08-10: No change now, though there were a few comments.
  This patch set is not intended to optimize anything, so these patches try to
  preserve the way mark_unsync() gets called as much as possible: the only
  changes are when it gets called for the new parent_pte and when
  mmu_page_add_parent_pte() gets called.

For these three, I'm not sure what we should do now, still RFC?
We can also consider other approaches, e.g. moving link_shadow_page() into
kvm_mmu_get_page() as Paolo suggested before.

  Takuya

Takuya Yoshikawa (10):
  [01] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
  [02] KVM: x86: MMU: Remove unused parameter of __direct_map()
  [03] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  [04] KVM: x86: MMU: Make mmu_set_spte() return emulate value
  [05] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()
  [06] KVM: x86: MMU: Consolidate BUG_ON checks for reverse-mapped sptes
  [07] KVM: x86: MMU: Move initialization of parent_ptes out from 
kvm_mmu_alloc_page()
  [08] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  [09] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to 
link_shadow_page()
  [10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()

 Documentation/virtual/kvm/mmu.txt |   4 +-
 arch/x86/include/asm/kvm_host.h   |   8 +-
 arch/x86/kvm/mmu.c| 370 ++
 arch/x86/kvm/mmu_audit.c  |  15 +-
 arch/x86/kvm/paging_tmpl.h|  20 +--
 5 files changed, 201 insertions(+), 216 deletions(-)

-- 
2.1.0



Re: [PATCH v3 0/3] virtio DMA API core stuff

2015-11-20 Thread Michael S. Tsirkin
On Thu, Nov 19, 2015 at 11:38:06PM +, David Woodhouse wrote:
> On Thu, 2015-11-19 at 13:59 -0800, Andy Lutomirski wrote:
> > 
> > >
> > > So thinking hard about it, I don't see any real drawbacks to making this
> > > conditional on a new feature bit, that Xen can then set..
> > 
> > Can you elaborate?  If I run QEMU, hosting Xen, hosting Linux, and the
> > virtio device is provided by QEMU, then how does Xen set the bit?
> > Similarly, how would Xen set the bit for a real physical device?
> 
> Right. This is *not* a fundamental characteristic of the device. This
> is all about how your *particular* hypervisor (in the set of turtles-
> all-the-way-down) happened to expose the thing to you.
> 
> This is why it lives in the DMAR table, in the Intel world, which
> *tells* you which devices are behind which IOMMU (and which are not).

David, there are two things a hypervisor needs to tell the guest.
1. The actual device is behind an IOMMU. This is what you
   are suggesting we use DMAR for.
2. Using IOMMU from kernel (as opposed to from userspace with VFIO)
   actually adds security. For existing virtio devices on KVM,
   the answer is no. And DMAR has no way to reflect that.

Question 2 only makes sense if you answer yes to question 1, if the user
wants protection from malicious devices with iommu=on, and
if you care about getting good performance from *other*
devices.  And what guest would do is use 1:1 for the
devices where answer 2 is "no".

Maybe for now I should just give up and say "don't use iommu=on within
VMs if you want any performance".  But the point is, if we just fix QEMU
to actually obey IOMMU mappings for assigned devices, then there's
already a kind of answer with virtio being trusted since it's part of
hypervisor, all this without guest changes. Seems kind of sad to let
performance regress.

So a (yet another) feature bit would be a possible solution there, but
we don't seem to be able to even agree on using a feature bit for a
quirk.


> And why I keep repeating myself that it has nothing to do with the
> actual device or the virtio drivers.
>
> I understand that POWER and other platforms don't currently have a
> clean way to indicate that certain device don't have translation. And I
> understand that we may end up with a *quirk* which ensures that the DMA
> API does the right thing (i.e. nothing) in certain cases.

So assuming we forget about 2 above for now, then yes, all we need
is a quirk, using some logic to detect these systems.

> But we should *NOT* be involving the virtio device drivers in that
> quirk, in any way. And putting a feature bit in the virtio device
> itself doesn't seem at all sane either.

Only if there's some other device that benefits from all this work.  If
virtio is the only one that benefits, then why do we want to
spread the quirk rules around so much? A feature bit gives us
a single, portable rule that the quirk can use on all platforms.

> Bear in mind that qemu-system-x86_64 currently has the *same* problem
> with assigned physical devices. It's claiming they're translated, and
> they're not.
> 
> -- 
> dwmw2
> 

Presumably people either don't assign
devices or don't have an iommu (otherwise things won't work for them),
but if they do have an iommu and don't assign devices, then Andy's
patch will break them.

This is not QEMU specific unfortunately, we don't know who
might have implemented virtio.





-- 
MST


[PATCH] KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8

2015-11-20 Thread Thomas Huth
In the old DABR register, the BT (Breakpoint Translation) bit
is bit number 61. In the new DAWRX register, the WT (Watchpoint
Translation) bit is bit number 59. So to move the DABR-BT bit
into the position of the DAWRX-WT bit, it has to be shifted by
two, not only by one. This fixes hardware watchpoints in gdb of
older guests that only use the H_SET_DABR/X interface instead
of the new H_SET_MODE interface.
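
For reference, the bit arithmetic in mask terms (IBM MSB-0 numbering in
a 64-bit register, assuming the usual DABR_TRANSLATION and DAWRX_WT
definitions):

	DABR[BT]  = bit 61  ->  mask 0x4
	DAWRX[WT] = bit 59  ->  mask 0x10

and 0x4 << 2 == 0x10, hence a shift of two.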

Signed-off-by: Thomas Huth 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b98889e..3983b87 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2143,7 +2143,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
/* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */
 2: rlwimi  r5, r4, 5, DAWRX_DR | DAWRX_DW
-   rlwimi  r5, r4, 1, DAWRX_WT
+   rlwimi  r5, r4, 2, DAWRX_WT
clrrdi  r4, r4, 3
std r4, VCPU_DAWR(r3)
std r5, VCPU_DAWRX(r3)
-- 
1.8.3.1
