from:"Christoffer Dall"

Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop

2019-03-15 Thread Christoffer Dall

Hi Steve,

On Wed, Mar 13, 2019 at 10:11:30AM +, Steven Price wrote:
> 
> Personally I think what we need is:
> 
> * Either a patch like the one from Heyi Guo (save/restore CNTVCT_EL0) or
> alternatively hooking up KVM_KVMCLOCK_CTRL to prevent the watchdog
> firing when user space explicitly stops scheduling the guest for a while.

If we save/restore CNTVCT_EL0 and the warning goes away, does the guest
wall clock timekeeping get all confused and does it figure this out
automagically somehow?

Does KVM_KVMCLOCK_CTRL solve that problem?

> 
> * KVM itself saving/restoring CNTVCT_EL0 during suspend/resume so the
> guest doesn't see time pass during a suspend.

This smells like policy to me so I'd much prefer keeping as much
functionality in user space as possible.  If we already have the APIs we
need from KVM, let's use them.

> 
> * Something equivalent to MSR_KVM_WALL_CLOCK_NEW for arm which allows
> the guest to query the wall clock time from the host and provides an
> offset between CNTVCT_EL0 to wall clock time which the KVM can update
> during suspend/resume. This means that during a suspend/resume the guest
> can observe that wall clock time has passed, without having to be
> bothered about CNTVCT_EL0 jumping forwards.
> 

Isn't the proper Arm architectural solution for this to read the
physical counter for wall clock time keeping ?

(Yes that will require a trap on physical counter reads after migration
on current systems, but migration sucks in terms of timekeeping
already.)

Thanks,

Christoffer

Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop

2019-03-12 Thread Christoffer Dall

[Adding Steven Price, who has recently looked at this, in cc]

On Tue, Mar 12, 2019 at 10:08:47AM +, Peter Maydell wrote:
> On Tue, 12 Mar 2019 at 06:10, Heyi Guo  wrote:
> >
> > When we stop a VM for more than 30 seconds and then resume it, by qemu
> > monitor command "stop" and "cont", Linux on VM will complain of "soft
> > lockup - CPU#x stuck for xxs!" as below:
> >
> > [ 2783.809517] watchdog: BUG: soft lockup - CPU#3 stuck for 2395s!
> > [ 2783.809559] watchdog: BUG: soft lockup - CPU#2 stuck for 2395s!
> > [ 2783.809561] watchdog: BUG: soft lockup - CPU#1 stuck for 2395s!
> > [ 2783.809563] Modules linked in...
> >
> > This is because Guest Linux uses generic timer virtual counter as
> > a software watchdog, and CNTVCT_EL0 does not stop when VM is stopped
> > by qemu.
> >
> > This patch is to fix this issue by saving the value of CNTVCT_EL0 when
> > stopping and restoring it when resuming.
> 
> Hi -- I know we have issues with the passage of time in Arm VMs
> running under KVM when the VM is suspended, but the topic is
> a tricky one, and it's not clear to me that this is the correct
> way to fix it. I would prefer to see us start with a discussion
> on the kvm-arm mailing list about the best approach to the problem.
> 
> I've cc'd that list and a couple of the Arm KVM maintainers
> for their opinion.
> 
> QEMU patch left below for context -- the brief summary is that
> it uses KVM_GET_ONE_REG/KVM_SET_ONE_REG on the timer CNT register
> to save it on VM pause and write that value back on VM resume.
> 
> thanks
> -- PMM
> 
> 
> 
> > Cc: Peter Maydell 
> > Signed-off-by: Heyi Guo 
> > ---
> >  target/arm/cpu.c | 65 
> > 
> >  1 file changed, 65 insertions(+)
> >
> > diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> > index 96f0ff0..7bbba3d 100644
> > --- a/target/arm/cpu.c
> > +++ b/target/arm/cpu.c
> > @@ -896,6 +896,60 @@ static void arm_cpu_finalizefn(Object *obj)
> >  #endif
> >  }
> >
> > +static int get_vcpu_timer_tick(CPUState *cs, uint64_t *tick_at_pause)
> > +{
> > +int err;
> > +struct kvm_one_reg reg;
> > +
> > +reg.id = KVM_REG_ARM_TIMER_CNT;
> > +reg.addr = (uintptr_t) tick_at_pause;
> > +
> > +err = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, );
> > +return err;
> > +}
> > +
> > +static int set_vcpu_timer_tick(CPUState *cs, uint64_t tick_at_pause)
> > +{
> > +int err;
> > +struct kvm_one_reg reg;
> > +
> > +reg.id = KVM_REG_ARM_TIMER_CNT;
> > +reg.addr = (uintptr_t) _at_pause;
> > +
> > +err = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, );
> > +return err;
> > +}
> > +
> > +static void arch_timer_change_state_handler(void *opaque, int running,
> > +RunState state)
> > +{
> > +static uint64_t hw_ticks_at_paused;
> > +static RunState pre_state = RUN_STATE__MAX;
> > +int err;
> > +CPUState *cs = (CPUState *)opaque;
> > +
> > +switch (state) {
> > +case RUN_STATE_PAUSED:
> > +err = get_vcpu_timer_tick(cs, _ticks_at_paused);
> > +if (err) {
> > +error_report("Get vcpu timer tick failed: %d", err);
> > +}
> > +break;
> > +case RUN_STATE_RUNNING:
> > +if (pre_state == RUN_STATE_PAUSED) {
> > +err = set_vcpu_timer_tick(cs, hw_ticks_at_paused);
> > +if (err) {
> > +error_report("Resume vcpu timer tick failed: %d", err);
> > +}
> > +}
> > +break;
> > +default:
> > +break;
> > +}
> > +
> > +pre_state = state;
> > +}
> > +
> >  static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
> >  {
> >  CPUState *cs = CPU(dev);
> > @@ -906,6 +960,12 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
> > **errp)
> >  Error *local_err = NULL;
> >  bool no_aa32 = false;
> >
> > +/*
> > + * Only add change state handler for arch timer once, for KVM will 
> > help to
> > + * synchronize virtual timer of all VCPUs.
> > + */
> > +static bool arch_timer_change_state_handler_added;
> > +
> >  /* If we needed to query the host kernel for the CPU features
> >   * then it's possible that might have failed in the initfn, but
> >   * this is the first point where we can report it.
> > @@ -1181,6 +1241,11 @@ static void arm_cpu_realizefn(DeviceState *dev, 
> > Error **errp)
> >
> >  init_cpreg_list(cpu);
> >
> > +if (!arch_timer_change_state_handler_added && kvm_enabled()) {
> > +qemu_add_vm_change_state_handler(arch_timer_change_state_handler, 
> > cs);
> > +arch_timer_change_state_handler_added = true;
> > +}
> > +
> >  #ifndef CONFIG_USER_ONLY
> >  if (cpu->has_el3 || arm_feature(env, ARM_FEATURE_M_SECURITY)) {
> >  cs->num_ases = 2;
> > --
> > 1.8.3.1

Re: [Qemu-devel] [RFC] [PATCH] kvm: arm: Introduce error code KVM_EINVARIANT

2018-11-12 Thread Christoffer Dall

Hi Manish,

On Sat, Nov 10, 2018 at 10:18:47PM +, Manish Jaggi wrote:
> 
> CCing a larger audience.
> Please review.
> 
> On 10/23/2018 03:51 PM, Jaggi, Manish wrote:
> > From: Manish Jaggi 
> >
> > This patch introduces an error code KVM_EINVARIANT which is returned
> > by KVM when userland tries to set an invariant register.
> >
> > The need for this error code is in VM Migration for arm64.
> > ARM64 systems use mainly -machine virt -cpu host as parameter to qemu.
> > Migration requires both Source and destination machines to have same
> > physical cpu. There are cases where the overall architecture of CPU is
> > same but the next version of the chip with some bug fixes which have no
> > effect on qemu operation. In such cases invariant registers like MIDR
> > have a different value.
> > Currently Migration fails in such cases.
> >
> > Rather than sending a EINVAL, a specifc error code will help
> > userland program the guest invariant register by querying the migrated
> > host machines invariant registers.
> >
> > Qemu will have a parameter -hostinvariant along with checking of this
> > error code. So it can be safely assumed that the feature is opt-in
> >
> > Corresponding Qemu patchset can be found at :
> > https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg05048.html
> >
> >
> > Signed-off-by: Manish Jaggi 
> > ---
> >   arch/arm64/kvm/sys_regs.c | 9 -
> >   include/uapi/linux/kvm_para.h | 1 +
> >   2 files changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index 22fbbdb..78ffc02 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -,7 +,7 @@ static int __set_id_reg(const struct sys_reg_desc 
> > *rd, void __user *uaddr,
> >   
> > /* This is what we mean by invariant: you can't change it. */
> > if (val != read_id_reg(rd, raz))
> > -   return -EINVAL;
> > +   return -KVM_EINVARIANT;
> >   
> > return 0;
> >   }
> > @@ -2254,9 +2254,8 @@ static int set_invariant_sys_reg(u64 id, void __user 
> > *uaddr)
> > return err;
> >   
> > /* This is what we mean by invariant: you can't change it. */
> > -   if (r->val != val)
> > -   return -EINVAL;
> > -
> > +   if (r->val != val)  
> > +   return -KVM_EINVARIANT;
> > return 0;
> >   }
> >   
> > @@ -2335,7 +2334,7 @@ static int demux_c15_set(u64 id, void __user *uaddr)
> >   
> > /* This is also invariant: you can't change it. */
> > if (newval != get_ccsidr(val))
> > -   return -EINVAL;
> > +   return -KVM_EINVARIANT;
> > return 0;
> > default:
> > return -ENOENT;
> > diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> > index 6c0ce49..4358669 100644
> > --- a/include/uapi/linux/kvm_para.h
> > +++ b/include/uapi/linux/kvm_para.h
> > @@ -17,6 +17,7 @@
> >   #define KVM_E2BIG E2BIG
> >   #define KVM_EPERM EPERM
> >   #define KVM_EOPNOTSUPP95
> > +#define KVM_EINVARIANT  96
> >   
> >   #define KVM_HC_VAPIC_POLL_IRQ 1
> >   #define KVM_HC_MMU_OP 2
> 

Eh, unless I'm severely missing something, I think you're confusing how
these error codes are used.

The error codes in include/uapi/linux/kvm_para.h are for hypercalls from
guests, not return values to userspace.  For the latter, we don't
have a habit of inventing new constants for something subsystem
specific, but instead we use the existing error codes which can be
understood by userspace software.

What's more, changing an error code is an ABI change and not something
you can just do at will.  If we are overloading EINVAL with the
KVM_SET_ONE_REG syscall (are we?), then you need to find some way to
communicate the additional information you need to userspace without
interfering with legacy userspace software's interaction with the
kernel.


Thanks,

Christoffer

Re: [Qemu-devel] [PATCH v2 0/5] target/arm: KVM vs ARMISARegisters

2018-11-02 Thread Christoffer Dall

On Fri, Nov 02, 2018 at 04:36:35PM +, Peter Maydell wrote:
> On 2 November 2018 at 14:54, Richard Henderson
>  wrote:
> > My previous patch set for replacing feature bits with id registers
> > failed to consider that these id registers are beginning to control
> > migration, and thus we must fill them in for KVM as well.
> >
> > Thus, we want to initialize these values within CPU from the host.
> >
> > Finally, re-send the T32EE conversion patch, fixing the build
> > failure on an arm32 host in kvm32.c.
> >
> > Changes, v1->v2:
> >   * Remove assert that AArch32 sysreg <= UINT32_MAX.
> >   * Remove unused local variable.
> >   * Add commentary for AArch32 sysregs vs missing AArch32 support.
> 
> As noted on IRC, on my admittedly pretty ancient 4.8.0 kernel some
> of these ID register reads via KVM_GET_ONE_REG fail ENOENT.
> strace says:
> 
> openat(AT_FDCWD, "/dev/kvm", O_RDWR|O_CLOEXEC) = 18
> ioctl(18, KVM_CREATE_VM or LOGGER_GET_LOG_BUF_SIZE, 0) = 19
> ioctl(19, KVM_CREATE_VCPU, 0)   = 20
> ioctl(19, KVM_ARM_PREFERRED_TARGET, 0xcfeb4e88) = 0
> ioctl(20, KVM_ARM_VCPU_INIT, 0xcfeb4e88) = 0
> ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28)
> = -1 ENOENT (No such file or directory)
> ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28)
> = -1 ENOENT (No such file or directory)
> ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28)
> = -1 ENOENT (No such file or directory)
> ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28)
> = -1 ENOENT (No such file or directory)
> ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) = 0
> ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) = 0
> ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) = 0
> ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) = 0
> ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) = 0
> ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) = 0
> ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28)
> = -1 ENOENT (No such file or directory)
> ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28)
> = -1 ENOENT (No such file or directory)
> ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28)
> = -1 ENOENT (No such file or directory)
> ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28)
> = -1 ENOENT (No such file or directory)
> 
> 
> I added a bit of extra tracing, since strace doesn't
> print the ID field for the ioctl:
> 
> peter.maydell@mustang-maydell:~/qemu$
> ~/test-images/virtv8-for-nesting/runme-kvm
> ./build/for-kvm/aarch64-softmmu/qemu-system-aarch64 -enable-kvm -cpu
> max -machine gic-version=max
> read_sys_reg64: reading ID 0x60300013c030...-1
> read_sys_reg64: reading ID 0x60300013c031...-1
> read_sys_reg64: reading ID 0x60300013c020...-1
> read_sys_reg64: reading ID 0x60300013c021...-1
> read_sys_reg32: reading ID 0x60300013c010...0
> read_sys_reg32: reading ID 0x60300013c011...0
> read_sys_reg32: reading ID 0x60300013c012...0
> read_sys_reg32: reading ID 0x60300013c013...0
> read_sys_reg32: reading ID 0x60300013c014...0
> read_sys_reg32: reading ID 0x60300013c015...0
> read_sys_reg32: reading ID 0x60300013c017...-1
> read_sys_reg32: reading ID 0x60300013c018...-1
> read_sys_reg32: reading ID 0x60300013c019...-1
> read_sys_reg32: reading ID 0x60300013c01a...-1
> qemu-system-aarch64: Failed to retrieve host CPU features
> 
> It looks like the kernel can handle reads of ID_ISAR0_EL1
> through ID_ISAR5_EL1, but not ID_ISAR6_EL1, any of the
> MVFR*_EL1 or ID_AA64_ISAR* or ID_AA64PFR*.
> 
> This is probably because the kernel is way too old to be
> interestingly supportable for KVM, but we did previously
> manage to boot on this setup.

I'm a little confused. v4.8 used to work (although it was perhaps not
the most stable at that time).  What changed?  Is this attempting to
restore a VM from a newer kernel, or has QEMU been updated to detect
this?

> 
> We should probably at least figure out which version of
> the kernel fixed this bug and made the ID registers available
> to userspace... if it's sufficiently ancient we could
> likely say "not supported", but if it's more recent we
> need a workaround somehow. I have cc'd a couple of kernel
> folks who might be able to help with the "which version"
> question.
> 

It appears the support for exposing a bunch of ID registers was
introduced with:

93390c0a1b20 (arm64: KVM: Hide unsupported AArch64 CPU features from guests, 
2017-10-31)

Which Dave (cc'ed) wrote and which was introduced in v4.15.

As per my question above, I'm not exactly sure what (if anything) we
need to fix on the kernel side?


Thanks,

Christoffer

Re: [Qemu-devel] [PATCH] target/arm/kvm: gic: Prevent creating userspace GICv3 with KVM

2018-02-06 Thread Christoffer Dall

Hi Philippe,

On Mon, Feb 05, 2018 at 09:10:48PM -0300, Philippe Mathieu-Daudé wrote:
> 
> On 02/01/2018 05:53 PM, Christoffer Dall wrote:
> > KVM doesn't support emulating a GICv3 in userspace, only GICv2.  We
> > currently attempt this anyway, and as a result a KVM guest doesn't
> > receive interrupts and the user is left wondering why.  Report an error
> > to the user if this particular combination is requested.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org>
> > ---
> >  target/arm/kvm_arm.h | 4 
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
> > index ff53e9fafb..cfb7e5af72 100644
> > --- a/target/arm/kvm_arm.h
> > +++ b/target/arm/kvm_arm.h
> > @@ -234,6 +234,10 @@ static inline const char *gicv3_class_name(void)
> 
> Any reason why these functions are inlined in this include?

Not really sure what you mean?  What do you think would be a better
approach?

> 
> Also - not related to your patch - while gicv3_class_name() is
> self-explicit, gic_class_name() isn't (to me at least).

I think I tend to see 'gic' as GICv2 and 'gicv3' as GICv3 in QEMU, but I
don't remember the details of how you make the GIC be a proper GICv2 vs.
the 11mpcore version thingy.

> 
> There are many check for gic_version 1/2/3 in virt.c, which we could
> clean using more generic functions such gic_class_name(int gic_version)
> and Co.

Possibly.  Do note, that checking "which version of the GIC does this
board contain" is a separate thing from "which particular mechanism do I
use to emulate the GIC".

> 
> >  exit(1);
> >  #endif
> >  } else {
> > +if (kvm_enabled()) {
> > +error_report("Userspace GICv3 is not supported with KVM");
> > +exit(1);
> 
> Reviewed-by: Philippe Mathieu-Daudé <f4...@amsat.org>
> 
> > +}
> >  return "arm-gicv3";
> >  }
> >  }
> > 
> 
Thanks!
-Christoffer


signature.asc
Description: PGP signature

[Qemu-devel] [PATCH] target/arm/kvm: gic: Prevent creating userspace GICv3 with KVM

2018-02-01 Thread Christoffer Dall

KVM doesn't support emulating a GICv3 in userspace, only GICv2.  We
currently attempt this anyway, and as a result a KVM guest doesn't
receive interrupts and the user is left wondering why.  Report an error
to the user if this particular combination is requested.

Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org>
---
 target/arm/kvm_arm.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index ff53e9fafb..cfb7e5af72 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -234,6 +234,10 @@ static inline const char *gicv3_class_name(void)
 exit(1);
 #endif
 } else {
+if (kvm_enabled()) {
+error_report("Userspace GICv3 is not supported with KVM");
+exit(1);
+}
 return "arm-gicv3";
 }
 }
-- 
2.14.2

Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64

2018-02-01 Thread Christoffer Dall

On Thu, Feb 01, 2018 at 01:25:20PM +0100, Andrew Jones wrote:
> On Thu, Feb 01, 2018 at 11:48:31AM +0100, Christoffer Dall wrote:
> > On Thu, Feb 01, 2018 at 11:42:22AM +0100, Andrew Jones wrote:
> > > On Thu, Feb 01, 2018 at 10:59:54AM +0100, Christoffer Dall wrote:
> > > > On Thu, Feb 1, 2018 at 10:33 AM, Ard Biesheuvel
> > > > > If you could use dirty page tracking to only perform the cache
> > > > > invalidation when the framebuffer memory has been updated, you can at
> > > > > least limit the impact to cases where the framebuffer is actually
> > > > > used, rather than sitting idle with a nice wallpaper image.
> > > 
> > > Yes, this is the exact approach I took back when I experimented with
> > > this. I must have screwed up my PoC in some way (like using the gcc
> > > builtin), because it wasn't working for me...
> > > 
> > Does that mean you have some code you feel like reviving and use to send
> > out an RFC based on? ;)
> >
> 
> I don't usually delete things, but I do do some sort of copy on use
> from old homedirs to new ones from time to time, letting stuff that
> gets really old completely drop at some point. I might be able to
> find the old work, or just hack a quick and ugly version from my
> memory to share - hopefully whatever I did to cause it to fail last
> time has been forgotten :-)
> 

If you have bandwidth to look at this that's great.  Otherwise let us
know, and I'll see if I can devote some time to it.

Thanks,
-Christoffer

Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64

2018-02-01 Thread Christoffer Dall

On Thu, Feb 01, 2018 at 11:42:22AM +0100, Andrew Jones wrote:
> On Thu, Feb 01, 2018 at 10:59:54AM +0100, Christoffer Dall wrote:
> > On Thu, Feb 1, 2018 at 10:33 AM, Ard Biesheuvel
> > <ard.biesheu...@linaro.org> wrote:
> > > On 1 February 2018 at 09:17, Christoffer Dall
> > > <christoffer.d...@linaro.org> wrote:
> > >> On Wed, Jan 31, 2018 at 9:15 PM, Ard Biesheuvel
> > >> <ard.biesheu...@linaro.org> wrote:
> > >>> On 31 January 2018 at 19:12, Christoffer Dall
> > >>> <christoffer.d...@linaro.org> wrote:
> > >>>> On Wed, Jan 31, 2018 at 7:00 PM, Ard Biesheuvel
> > >>>> <ard.biesheu...@linaro.org> wrote:
> > >>>>> On 31 January 2018 at 17:39, Christoffer Dall
> > >>>>> <christoffer.d...@linaro.org> wrote:
> > >>>>>> On Wed, Jan 31, 2018 at 5:59 PM, Ard Biesheuvel
> > >>>>>> <ard.biesheu...@linaro.org> wrote:
> > >>>>>>> On 31 January 2018 at 16:53, Christoffer Dall
> > >>>>>>> <christoffer.d...@linaro.org> wrote:
> > >>>>>>>> On Wed, Jan 31, 2018 at 4:18 PM, Ard Biesheuvel
> > >>>>>>>> <ard.biesheu...@linaro.org> wrote:
> > >>>>>>>>> On 31 January 2018 at 09:53, Christoffer Dall
> > >>>>>>>>> <christoffer.d...@linaro.org> wrote:
> > >>>>>>>>>> On Mon, Jan 29, 2018 at 10:32:12AM +, Marc Zyngier wrote:
> > >>>>>>>>>>> On 29/01/18 10:04, Peter Maydell wrote:
> > >>>>>>>>>>> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert 
> > >>>>>>>>>>> > <dgilb...@redhat.com> wrote:
> > >>>>>>>>>>> >> * Peter Maydell (peter.mayd...@linaro.org) wrote:
> > >>>>>>>>>>> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert 
> > >>>>>>>>>>> >>> <dgilb...@redhat.com> wrote:
> > >>>>>>>>>>> >>>> * Peter Maydell (peter.mayd...@linaro.org) wrote:
> > >>>>>>>>>>> >>>>> I think the correct fix here is that your test code 
> > >>>>>>>>>>> >>>>> should turn
> > >>>>>>>>>>> >>>>> its MMU on. Trying to treat guest RAM as uncacheable 
> > >>>>>>>>>>> >>>>> doesn't work
> > >>>>>>>>>>> >>>>> for Arm KVM guests (for the same reason that VGA device 
> > >>>>>>>>>>> >>>>> video memory
> > >>>>>>>>>>> >>>>> doesn't work). If it's RAM your guest has to arrange to 
> > >>>>>>>>>>> >>>>> map it as
> > >>>>>>>>>>> >>>>> Normal Cacheable, and then everything should work fine.
> > >>>>>>>>>>> >>>>
> > >>>>>>>>>>> >>>> Does this cause problems with migrating at just the wrong 
> > >>>>>>>>>>> >>>> point during
> > >>>>>>>>>>> >>>> a VM boot?
> > >>>>>>>>>>> >>>
> > >>>>>>>>>>> >>> It wouldn't surprise me if it did, but I don't think I've 
> > >>>>>>>>>>> >>> ever
> > >>>>>>>>>>> >>> tried to provoke that problem...
> > >>>>>>>>>>> >>
> > >>>>>>>>>>> >> If you think it'll get the RAM contents wrong, it might be 
> > >>>>>>>>>>> >> best to fail
> > >>>>>>>>>>> >> the migration if you can detect the cache is disabled in the 
> > >>>>>>>>>>> >> guest.
> > >>>>>>>>>>> >
> > >>>>>>>>>>> > I guess QEMU could look at the value of the "MMU 
> > >>>>>>>>>>> > disabled/enabled" bit
> > >>>>>>>>&

Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64

2018-02-01 Thread Christoffer Dall

On Thu, Feb 1, 2018 at 10:33 AM, Ard Biesheuvel
<ard.biesheu...@linaro.org> wrote:
> On 1 February 2018 at 09:17, Christoffer Dall
> <christoffer.d...@linaro.org> wrote:
>> On Wed, Jan 31, 2018 at 9:15 PM, Ard Biesheuvel
>> <ard.biesheu...@linaro.org> wrote:
>>> On 31 January 2018 at 19:12, Christoffer Dall
>>> <christoffer.d...@linaro.org> wrote:
>>>> On Wed, Jan 31, 2018 at 7:00 PM, Ard Biesheuvel
>>>> <ard.biesheu...@linaro.org> wrote:
>>>>> On 31 January 2018 at 17:39, Christoffer Dall
>>>>> <christoffer.d...@linaro.org> wrote:
>>>>>> On Wed, Jan 31, 2018 at 5:59 PM, Ard Biesheuvel
>>>>>> <ard.biesheu...@linaro.org> wrote:
>>>>>>> On 31 January 2018 at 16:53, Christoffer Dall
>>>>>>> <christoffer.d...@linaro.org> wrote:
>>>>>>>> On Wed, Jan 31, 2018 at 4:18 PM, Ard Biesheuvel
>>>>>>>> <ard.biesheu...@linaro.org> wrote:
>>>>>>>>> On 31 January 2018 at 09:53, Christoffer Dall
>>>>>>>>> <christoffer.d...@linaro.org> wrote:
>>>>>>>>>> On Mon, Jan 29, 2018 at 10:32:12AM +, Marc Zyngier wrote:
>>>>>>>>>>> On 29/01/18 10:04, Peter Maydell wrote:
>>>>>>>>>>> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert 
>>>>>>>>>>> > <dgilb...@redhat.com> wrote:
>>>>>>>>>>> >> * Peter Maydell (peter.mayd...@linaro.org) wrote:
>>>>>>>>>>> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert 
>>>>>>>>>>> >>> <dgilb...@redhat.com> wrote:
>>>>>>>>>>> >>>> * Peter Maydell (peter.mayd...@linaro.org) wrote:
>>>>>>>>>>> >>>>> I think the correct fix here is that your test code should 
>>>>>>>>>>> >>>>> turn
>>>>>>>>>>> >>>>> its MMU on. Trying to treat guest RAM as uncacheable doesn't 
>>>>>>>>>>> >>>>> work
>>>>>>>>>>> >>>>> for Arm KVM guests (for the same reason that VGA device video 
>>>>>>>>>>> >>>>> memory
>>>>>>>>>>> >>>>> doesn't work). If it's RAM your guest has to arrange to map 
>>>>>>>>>>> >>>>> it as
>>>>>>>>>>> >>>>> Normal Cacheable, and then everything should work fine.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> Does this cause problems with migrating at just the wrong 
>>>>>>>>>>> >>>> point during
>>>>>>>>>>> >>>> a VM boot?
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> It wouldn't surprise me if it did, but I don't think I've ever
>>>>>>>>>>> >>> tried to provoke that problem...
>>>>>>>>>>> >>
>>>>>>>>>>> >> If you think it'll get the RAM contents wrong, it might be best 
>>>>>>>>>>> >> to fail
>>>>>>>>>>> >> the migration if you can detect the cache is disabled in the 
>>>>>>>>>>> >> guest.
>>>>>>>>>>> >
>>>>>>>>>>> > I guess QEMU could look at the value of the "MMU 
>>>>>>>>>>> > disabled/enabled" bit
>>>>>>>>>>> > in the guest's system registers, and refuse migration if it's 
>>>>>>>>>>> > off...
>>>>>>>>>>> >
>>>>>>>>>>> > (cc'd Marc, Christoffer to check that I don't have the wrong end
>>>>>>>>>>> > of the stick about how thin the ice is in the period before the
>>>>>>>>>>> > guest turns on its MMU...)
>>>>>>>>>>>
>>>>>>>>>>> Once MMU and caches are on, we should be in a reasonable place for 
>>>>>>>>>>> QEMU
>>>>>>>>>

Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64

2018-02-01 Thread Christoffer Dall

On Wed, Jan 31, 2018 at 9:15 PM, Ard Biesheuvel
<ard.biesheu...@linaro.org> wrote:
> On 31 January 2018 at 19:12, Christoffer Dall
> <christoffer.d...@linaro.org> wrote:
>> On Wed, Jan 31, 2018 at 7:00 PM, Ard Biesheuvel
>> <ard.biesheu...@linaro.org> wrote:
>>> On 31 January 2018 at 17:39, Christoffer Dall
>>> <christoffer.d...@linaro.org> wrote:
>>>> On Wed, Jan 31, 2018 at 5:59 PM, Ard Biesheuvel
>>>> <ard.biesheu...@linaro.org> wrote:
>>>>> On 31 January 2018 at 16:53, Christoffer Dall
>>>>> <christoffer.d...@linaro.org> wrote:
>>>>>> On Wed, Jan 31, 2018 at 4:18 PM, Ard Biesheuvel
>>>>>> <ard.biesheu...@linaro.org> wrote:
>>>>>>> On 31 January 2018 at 09:53, Christoffer Dall
>>>>>>> <christoffer.d...@linaro.org> wrote:
>>>>>>>> On Mon, Jan 29, 2018 at 10:32:12AM +, Marc Zyngier wrote:
>>>>>>>>> On 29/01/18 10:04, Peter Maydell wrote:
>>>>>>>>> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert 
>>>>>>>>> > <dgilb...@redhat.com> wrote:
>>>>>>>>> >> * Peter Maydell (peter.mayd...@linaro.org) wrote:
>>>>>>>>> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert 
>>>>>>>>> >>> <dgilb...@redhat.com> wrote:
>>>>>>>>> >>>> * Peter Maydell (peter.mayd...@linaro.org) wrote:
>>>>>>>>> >>>>> I think the correct fix here is that your test code should turn
>>>>>>>>> >>>>> its MMU on. Trying to treat guest RAM as uncacheable doesn't 
>>>>>>>>> >>>>> work
>>>>>>>>> >>>>> for Arm KVM guests (for the same reason that VGA device video 
>>>>>>>>> >>>>> memory
>>>>>>>>> >>>>> doesn't work). If it's RAM your guest has to arrange to map it 
>>>>>>>>> >>>>> as
>>>>>>>>> >>>>> Normal Cacheable, and then everything should work fine.
>>>>>>>>> >>>>
>>>>>>>>> >>>> Does this cause problems with migrating at just the wrong point 
>>>>>>>>> >>>> during
>>>>>>>>> >>>> a VM boot?
>>>>>>>>> >>>
>>>>>>>>> >>> It wouldn't surprise me if it did, but I don't think I've ever
>>>>>>>>> >>> tried to provoke that problem...
>>>>>>>>> >>
>>>>>>>>> >> If you think it'll get the RAM contents wrong, it might be best to 
>>>>>>>>> >> fail
>>>>>>>>> >> the migration if you can detect the cache is disabled in the guest.
>>>>>>>>> >
>>>>>>>>> > I guess QEMU could look at the value of the "MMU disabled/enabled" 
>>>>>>>>> > bit
>>>>>>>>> > in the guest's system registers, and refuse migration if it's off...
>>>>>>>>> >
>>>>>>>>> > (cc'd Marc, Christoffer to check that I don't have the wrong end
>>>>>>>>> > of the stick about how thin the ice is in the period before the
>>>>>>>>> > guest turns on its MMU...)
>>>>>>>>>
>>>>>>>>> Once MMU and caches are on, we should be in a reasonable place for 
>>>>>>>>> QEMU
>>>>>>>>> to have a consistent view of the memory. The trick is to prevent the
>>>>>>>>> vcpus from changing that. A guest could perfectly turn off its MMU at
>>>>>>>>> any given time if it needs to (and it is actually required on some HW 
>>>>>>>>> if
>>>>>>>>> you want to mitigate headlining CVEs), and KVM won't know about that.
>>>>>>>>>
>>>>>>>>
>>>>>>>> (Clarification: KVM can detect this is it bother to check the VCPU's
>>>>>>>> system registers, but we don't trap to KVM when the VCPU turns off its
>>>>>>>> caches, right?)
>>>>>>>>
&g

Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64

2018-01-31 Thread Christoffer Dall

On Wed, Jan 31, 2018 at 7:00 PM, Ard Biesheuvel
<ard.biesheu...@linaro.org> wrote:
> On 31 January 2018 at 17:39, Christoffer Dall
> <christoffer.d...@linaro.org> wrote:
>> On Wed, Jan 31, 2018 at 5:59 PM, Ard Biesheuvel
>> <ard.biesheu...@linaro.org> wrote:
>>> On 31 January 2018 at 16:53, Christoffer Dall
>>> <christoffer.d...@linaro.org> wrote:
>>>> On Wed, Jan 31, 2018 at 4:18 PM, Ard Biesheuvel
>>>> <ard.biesheu...@linaro.org> wrote:
>>>>> On 31 January 2018 at 09:53, Christoffer Dall
>>>>> <christoffer.d...@linaro.org> wrote:
>>>>>> On Mon, Jan 29, 2018 at 10:32:12AM +, Marc Zyngier wrote:
>>>>>>> On 29/01/18 10:04, Peter Maydell wrote:
>>>>>>> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert 
>>>>>>> > <dgilb...@redhat.com> wrote:
>>>>>>> >> * Peter Maydell (peter.mayd...@linaro.org) wrote:
>>>>>>> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert 
>>>>>>> >>> <dgilb...@redhat.com> wrote:
>>>>>>> >>>> * Peter Maydell (peter.mayd...@linaro.org) wrote:
>>>>>>> >>>>> I think the correct fix here is that your test code should turn
>>>>>>> >>>>> its MMU on. Trying to treat guest RAM as uncacheable doesn't work
>>>>>>> >>>>> for Arm KVM guests (for the same reason that VGA device video 
>>>>>>> >>>>> memory
>>>>>>> >>>>> doesn't work). If it's RAM your guest has to arrange to map it as
>>>>>>> >>>>> Normal Cacheable, and then everything should work fine.
>>>>>>> >>>>
>>>>>>> >>>> Does this cause problems with migrating at just the wrong point 
>>>>>>> >>>> during
>>>>>>> >>>> a VM boot?
>>>>>>> >>>
>>>>>>> >>> It wouldn't surprise me if it did, but I don't think I've ever
>>>>>>> >>> tried to provoke that problem...
>>>>>>> >>
>>>>>>> >> If you think it'll get the RAM contents wrong, it might be best to 
>>>>>>> >> fail
>>>>>>> >> the migration if you can detect the cache is disabled in the guest.
>>>>>>> >
>>>>>>> > I guess QEMU could look at the value of the "MMU disabled/enabled" bit
>>>>>>> > in the guest's system registers, and refuse migration if it's off...
>>>>>>> >
>>>>>>> > (cc'd Marc, Christoffer to check that I don't have the wrong end
>>>>>>> > of the stick about how thin the ice is in the period before the
>>>>>>> > guest turns on its MMU...)
>>>>>>>
>>>>>>> Once MMU and caches are on, we should be in a reasonable place for QEMU
>>>>>>> to have a consistent view of the memory. The trick is to prevent the
>>>>>>> vcpus from changing that. A guest could perfectly turn off its MMU at
>>>>>>> any given time if it needs to (and it is actually required on some HW if
>>>>>>> you want to mitigate headlining CVEs), and KVM won't know about that.
>>>>>>>
>>>>>>
>>>>>> (Clarification: KVM can detect this is it bother to check the VCPU's
>>>>>> system registers, but we don't trap to KVM when the VCPU turns off its
>>>>>> caches, right?)
>>>>>>
>>>>>>> You may have to pause the vcpus before starting the migration, or
>>>>>>> introduce a new KVM feature that would automatically pause a vcpu that
>>>>>>> is trying to disable its MMU while the migration is on. This would
>>>>>>> involve trapping all the virtual memory related system registers, with
>>>>>>> an obvious cost. But that cost would be limited to the time it takes to
>>>>>>> migrate the memory, so maybe that's acceptable.
>>>>>>>
>>>>>> Is that even sufficient?
>>>>>>
>>>>>> What if the following happened. (1) guest turns off MMU, (2) guest
>>>>>> writes some data directly to ram (3) qemu stops the vcpu (4) qemu reads
>>>>&g

Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64

2018-01-31 Thread Christoffer Dall

On Wed, Jan 31, 2018 at 5:59 PM, Ard Biesheuvel
<ard.biesheu...@linaro.org> wrote:
> On 31 January 2018 at 16:53, Christoffer Dall
> <christoffer.d...@linaro.org> wrote:
>> On Wed, Jan 31, 2018 at 4:18 PM, Ard Biesheuvel
>> <ard.biesheu...@linaro.org> wrote:
>>> On 31 January 2018 at 09:53, Christoffer Dall
>>> <christoffer.d...@linaro.org> wrote:
>>>> On Mon, Jan 29, 2018 at 10:32:12AM +, Marc Zyngier wrote:
>>>>> On 29/01/18 10:04, Peter Maydell wrote:
>>>>> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert 
>>>>> > <dgilb...@redhat.com> wrote:
>>>>> >> * Peter Maydell (peter.mayd...@linaro.org) wrote:
>>>>> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert 
>>>>> >>> <dgilb...@redhat.com> wrote:
>>>>> >>>> * Peter Maydell (peter.mayd...@linaro.org) wrote:
>>>>> >>>>> I think the correct fix here is that your test code should turn
>>>>> >>>>> its MMU on. Trying to treat guest RAM as uncacheable doesn't work
>>>>> >>>>> for Arm KVM guests (for the same reason that VGA device video memory
>>>>> >>>>> doesn't work). If it's RAM your guest has to arrange to map it as
>>>>> >>>>> Normal Cacheable, and then everything should work fine.
>>>>> >>>>
>>>>> >>>> Does this cause problems with migrating at just the wrong point 
>>>>> >>>> during
>>>>> >>>> a VM boot?
>>>>> >>>
>>>>> >>> It wouldn't surprise me if it did, but I don't think I've ever
>>>>> >>> tried to provoke that problem...
>>>>> >>
>>>>> >> If you think it'll get the RAM contents wrong, it might be best to fail
>>>>> >> the migration if you can detect the cache is disabled in the guest.
>>>>> >
>>>>> > I guess QEMU could look at the value of the "MMU disabled/enabled" bit
>>>>> > in the guest's system registers, and refuse migration if it's off...
>>>>> >
>>>>> > (cc'd Marc, Christoffer to check that I don't have the wrong end
>>>>> > of the stick about how thin the ice is in the period before the
>>>>> > guest turns on its MMU...)
>>>>>
>>>>> Once MMU and caches are on, we should be in a reasonable place for QEMU
>>>>> to have a consistent view of the memory. The trick is to prevent the
>>>>> vcpus from changing that. A guest could perfectly turn off its MMU at
>>>>> any given time if it needs to (and it is actually required on some HW if
>>>>> you want to mitigate headlining CVEs), and KVM won't know about that.
>>>>>
>>>>
>>>> (Clarification: KVM can detect this is it bother to check the VCPU's
>>>> system registers, but we don't trap to KVM when the VCPU turns off its
>>>> caches, right?)
>>>>
>>>>> You may have to pause the vcpus before starting the migration, or
>>>>> introduce a new KVM feature that would automatically pause a vcpu that
>>>>> is trying to disable its MMU while the migration is on. This would
>>>>> involve trapping all the virtual memory related system registers, with
>>>>> an obvious cost. But that cost would be limited to the time it takes to
>>>>> migrate the memory, so maybe that's acceptable.
>>>>>
>>>> Is that even sufficient?
>>>>
>>>> What if the following happened. (1) guest turns off MMU, (2) guest
>>>> writes some data directly to ram (3) qemu stops the vcpu (4) qemu reads
>>>> guest ram.  QEMU's view of guest ram is now incorrect (stale,
>>>> incoherent, ...).
>>>>
>>>> I'm also not really sure if pausing one VCPU because it turned off its
>>>> MMU will go very well when trying to migrate a large VM (wouldn't this
>>>> ask for all the other VCPUs beginning to complain that the stopped VCPU
>>>> appears to be dead?).  As a short-term 'fix' it's probably better to
>>>> refuse migration if you detect that a VCPU had begun turning off its
>>>> MMU.
>>>>
>>>> On the larger scale of thins; this appears to me to be another case of
>>>> us really needing some way to coherently access memo

Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64

2018-01-31 Thread Christoffer Dall

On Wed, Jan 31, 2018 at 4:18 PM, Ard Biesheuvel
<ard.biesheu...@linaro.org> wrote:
> On 31 January 2018 at 09:53, Christoffer Dall
> <christoffer.d...@linaro.org> wrote:
>> On Mon, Jan 29, 2018 at 10:32:12AM +, Marc Zyngier wrote:
>>> On 29/01/18 10:04, Peter Maydell wrote:
>>> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert <dgilb...@redhat.com> 
>>> > wrote:
>>> >> * Peter Maydell (peter.mayd...@linaro.org) wrote:
>>> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert 
>>> >>> <dgilb...@redhat.com> wrote:
>>> >>>> * Peter Maydell (peter.mayd...@linaro.org) wrote:
>>> >>>>> I think the correct fix here is that your test code should turn
>>> >>>>> its MMU on. Trying to treat guest RAM as uncacheable doesn't work
>>> >>>>> for Arm KVM guests (for the same reason that VGA device video memory
>>> >>>>> doesn't work). If it's RAM your guest has to arrange to map it as
>>> >>>>> Normal Cacheable, and then everything should work fine.
>>> >>>>
>>> >>>> Does this cause problems with migrating at just the wrong point during
>>> >>>> a VM boot?
>>> >>>
>>> >>> It wouldn't surprise me if it did, but I don't think I've ever
>>> >>> tried to provoke that problem...
>>> >>
>>> >> If you think it'll get the RAM contents wrong, it might be best to fail
>>> >> the migration if you can detect the cache is disabled in the guest.
>>> >
>>> > I guess QEMU could look at the value of the "MMU disabled/enabled" bit
>>> > in the guest's system registers, and refuse migration if it's off...
>>> >
>>> > (cc'd Marc, Christoffer to check that I don't have the wrong end
>>> > of the stick about how thin the ice is in the period before the
>>> > guest turns on its MMU...)
>>>
>>> Once MMU and caches are on, we should be in a reasonable place for QEMU
>>> to have a consistent view of the memory. The trick is to prevent the
>>> vcpus from changing that. A guest could perfectly turn off its MMU at
>>> any given time if it needs to (and it is actually required on some HW if
>>> you want to mitigate headlining CVEs), and KVM won't know about that.
>>>
>>
>> (Clarification: KVM can detect this is it bother to check the VCPU's
>> system registers, but we don't trap to KVM when the VCPU turns off its
>> caches, right?)
>>
>>> You may have to pause the vcpus before starting the migration, or
>>> introduce a new KVM feature that would automatically pause a vcpu that
>>> is trying to disable its MMU while the migration is on. This would
>>> involve trapping all the virtual memory related system registers, with
>>> an obvious cost. But that cost would be limited to the time it takes to
>>> migrate the memory, so maybe that's acceptable.
>>>
>> Is that even sufficient?
>>
>> What if the following happened. (1) guest turns off MMU, (2) guest
>> writes some data directly to ram (3) qemu stops the vcpu (4) qemu reads
>> guest ram.  QEMU's view of guest ram is now incorrect (stale,
>> incoherent, ...).
>>
>> I'm also not really sure if pausing one VCPU because it turned off its
>> MMU will go very well when trying to migrate a large VM (wouldn't this
>> ask for all the other VCPUs beginning to complain that the stopped VCPU
>> appears to be dead?).  As a short-term 'fix' it's probably better to
>> refuse migration if you detect that a VCPU had begun turning off its
>> MMU.
>>
>> On the larger scale of thins; this appears to me to be another case of
>> us really needing some way to coherently access memory between QEMU and
>> the VM, but in the case of the VCPU turning off the MMU prior to
>> migration, we don't even know where it may have written data, and I'm
>> therefore not really sure what the 'proper' solution would be.
>>
>> (cc'ing Ard who has has thought about this problem before in the context
>> of UEFI and VGA.)
>>
>
> Actually, the VGA case is much simpler because the host is not
> expected to write to the framebuffer, only read from it, and the guest
> is not expected to create a cacheable mapping for it, so any
> incoherency can be trivially solved by cache invalidation on the host
> side. (Note that this has nothing to do with DMA coherency, but only
> with PCI MMIO BARs that are backed by DRAM in t

Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64

2018-01-31 Thread Christoffer Dall

On Mon, Jan 29, 2018 at 10:32:12AM +, Marc Zyngier wrote:
> On 29/01/18 10:04, Peter Maydell wrote:
> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert  
> > wrote:
> >> * Peter Maydell (peter.mayd...@linaro.org) wrote:
> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert  
> >>> wrote:
>  * Peter Maydell (peter.mayd...@linaro.org) wrote:
> > I think the correct fix here is that your test code should turn
> > its MMU on. Trying to treat guest RAM as uncacheable doesn't work
> > for Arm KVM guests (for the same reason that VGA device video memory
> > doesn't work). If it's RAM your guest has to arrange to map it as
> > Normal Cacheable, and then everything should work fine.
> 
>  Does this cause problems with migrating at just the wrong point during
>  a VM boot?
> >>>
> >>> It wouldn't surprise me if it did, but I don't think I've ever
> >>> tried to provoke that problem...
> >>
> >> If you think it'll get the RAM contents wrong, it might be best to fail
> >> the migration if you can detect the cache is disabled in the guest.
> > 
> > I guess QEMU could look at the value of the "MMU disabled/enabled" bit
> > in the guest's system registers, and refuse migration if it's off...
> > 
> > (cc'd Marc, Christoffer to check that I don't have the wrong end
> > of the stick about how thin the ice is in the period before the
> > guest turns on its MMU...)
> 
> Once MMU and caches are on, we should be in a reasonable place for QEMU
> to have a consistent view of the memory. The trick is to prevent the
> vcpus from changing that. A guest could perfectly turn off its MMU at
> any given time if it needs to (and it is actually required on some HW if
> you want to mitigate headlining CVEs), and KVM won't know about that.
> 

(Clarification: KVM can detect this is it bother to check the VCPU's
system registers, but we don't trap to KVM when the VCPU turns off its
caches, right?)

> You may have to pause the vcpus before starting the migration, or
> introduce a new KVM feature that would automatically pause a vcpu that
> is trying to disable its MMU while the migration is on. This would
> involve trapping all the virtual memory related system registers, with
> an obvious cost. But that cost would be limited to the time it takes to
> migrate the memory, so maybe that's acceptable.
> 
Is that even sufficient?

What if the following happened. (1) guest turns off MMU, (2) guest
writes some data directly to ram (3) qemu stops the vcpu (4) qemu reads
guest ram.  QEMU's view of guest ram is now incorrect (stale,
incoherent, ...).

I'm also not really sure if pausing one VCPU because it turned off its
MMU will go very well when trying to migrate a large VM (wouldn't this
ask for all the other VCPUs beginning to complain that the stopped VCPU
appears to be dead?).  As a short-term 'fix' it's probably better to
refuse migration if you detect that a VCPU had begun turning off its
MMU.

On the larger scale of thins; this appears to me to be another case of
us really needing some way to coherently access memory between QEMU and
the VM, but in the case of the VCPU turning off the MMU prior to
migration, we don't even know where it may have written data, and I'm
therefore not really sure what the 'proper' solution would be.

(cc'ing Ard who has has thought about this problem before in the context
of UEFI and VGA.)

Thanks,
-Christoffer

Re: [Qemu-devel] [PATCH v2 1/4] hw/arm/virt: add pmu interrupt state

2017-07-21 Thread Christoffer Dall

On Fri, Jul 21, 2017 at 01:35:46PM +0200, Andrew Jones wrote:
> On Fri, Jul 21, 2017 at 01:16:07PM +0200, Christoffer Dall wrote:
> > On Wed, Jul 19, 2017 at 09:39:54AM -0400, Andrew Jones wrote:
> > > Mimicking gicv3-maintenance-interrupt, add the PMU's interrupt to
> > > CPU state.
> > > 
> > > Signed-off-by: Andrew Jones <drjo...@redhat.com>
> > > Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>
> > > ---
> > >  hw/arm/virt.c| 3 +++
> > >  target/arm/cpu.c | 2 ++
> > >  target/arm/cpu.h | 2 ++
> > >  3 files changed, 7 insertions(+)
> > > 
> > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > > index 31739d75a3e0..ea26f0c473c2 100644
> > > --- a/hw/arm/virt.c
> > > +++ b/hw/arm/virt.c
> > > @@ -610,6 +610,9 @@ static void create_gic(VirtMachineState *vms, 
> > > qemu_irq *pic)
> > >  qdev_connect_gpio_out_named(cpudev, 
> > > "gicv3-maintenance-interrupt", 0,
> > >  qdev_get_gpio_in(gicdev, ppibase
> > >   + 
> > > ARCH_GICV3_MAINT_IRQ));
> > > +qdev_connect_gpio_out_named(cpudev, "pmu-interrupt", 0,
> > > +qdev_get_gpio_in(gicdev, ppibase
> > > + + VIRTUAL_PMU_IRQ));
> > 
> > I know Peter reviewed this, but isn't it a bit strange to create the
> > pmu-interrupt when creating the gic (as this isn't an output from the
> > GIC like the maintenance interrupt is) ?
> >
> 
> Above the gic maintenance interrupt connection the timer irqs are also
> connected. So, while the function name implies we only create the gic,
> its function appears to be both its creation and the wiring up of CPU
> inputs and outputs.
> 

ok, I didn't see that.

Otherwise this patch looks good to me.

-Christoffer

Re: [Qemu-devel] [PATCH v2 4/4] target/arm/kvm: pmu: improve error handling

2017-07-21 Thread Christoffer Dall

On Wed, Jul 19, 2017 at 09:39:57AM -0400, Andrew Jones wrote:
> If a KVM PMU init or set-irq attr call fails we just silently stop
> the PMU DT node generation. The only way they could fail, though,
> is if the attr's respective KVM has-attr call fails. But that should
> never happen if KVM advertises the PMU capability, because both
> attrs have been available since the capability was introduced. Let's
> just abort if this should-never-happen stuff does happen, because,
> if it does, then something is obviously horribly wrong.
> 
> Signed-off-by: Andrew Jones <drjo...@redhat.com>

Reviewed-by: Christoffer Dall <cd...@linaro.org>

> ---
>  hw/arm/virt.c|  9 +++--
>  target/arm/kvm32.c   |  3 +--
>  target/arm/kvm64.c   | 28 
>  target/arm/kvm_arm.h | 15 ---
>  4 files changed, 28 insertions(+), 27 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index a215330444da..4bc50964d52b 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -496,13 +496,10 @@ static void fdt_add_pmu_nodes(const VirtMachineState 
> *vms)
>  return;
>  }
>  if (kvm_enabled()) {
> -if (kvm_irqchip_in_kernel() &&
> -!kvm_arm_pmu_set_irq(cpu, PPI(VIRTUAL_PMU_IRQ))) {
> -return;
> -}
> -if (!kvm_arm_pmu_init(cpu)) {
> -return;
> +if (kvm_irqchip_in_kernel()) {
> +kvm_arm_pmu_set_irq(cpu, PPI(VIRTUAL_PMU_IRQ));
>  }
> +kvm_arm_pmu_init(cpu);
>  }
>  }
>  
> diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
> index e3aab89a1a94..717a2562670b 100644
> --- a/target/arm/kvm32.c
> +++ b/target/arm/kvm32.c
> @@ -522,10 +522,9 @@ bool kvm_arm_hw_debug_active(CPUState *cs)
>  return false;
>  }
>  
> -int kvm_arm_pmu_set_irq(CPUState *cs, int irq)
> +void kvm_arm_pmu_set_irq(CPUState *cs, int irq)
>  {
>  qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
> -return 0;
>  }
>  
>  int kvm_arm_pmu_init(CPUState *cs)
> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> index ec7d85314acc..6554c30007a4 100644
> --- a/target/arm/kvm64.c
> +++ b/target/arm/kvm64.c
> @@ -387,30 +387,36 @@ static bool kvm_arm_pmu_set_attr(CPUState *cs, struct 
> kvm_device_attr *attr)
>  
>  err = kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, attr);
>  if (err != 0) {
> +error_report("PMU: KVM_HAS_DEVICE_ATTR: %s", strerror(-err));
>  return false;
>  }
>  
>  err = kvm_vcpu_ioctl(cs, KVM_SET_DEVICE_ATTR, attr);
> -if (err < 0) {
> -fprintf(stderr, "KVM_SET_DEVICE_ATTR failed: %s\n",
> -strerror(-err));
> -abort();
> +if (err != 0) {
> +error_report("PMU: KVM_SET_DEVICE_ATTR: %s", strerror(-err));
> +return false;
>  }
>  
>  return true;
>  }
>  
> -int kvm_arm_pmu_init(CPUState *cs)
> +void kvm_arm_pmu_init(CPUState *cs)
>  {
>  struct kvm_device_attr attr = {
>  .group = KVM_ARM_VCPU_PMU_V3_CTRL,
>  .attr = KVM_ARM_VCPU_PMU_V3_INIT,
>  };
>  
> -return kvm_arm_pmu_set_attr(cs, );
> +if (!ARM_CPU(cs)->has_pmu) {
> +return;
> +}
> +if (!kvm_arm_pmu_set_attr(cs, )) {
> +error_report("failed to init PMU");
> +abort();
> +}
>  }
>  
> -int kvm_arm_pmu_set_irq(CPUState *cs, int irq)
> +void kvm_arm_pmu_set_irq(CPUState *cs, int irq)
>  {
>  struct kvm_device_attr attr = {
>  .group = KVM_ARM_VCPU_PMU_V3_CTRL,
> @@ -418,7 +424,13 @@ int kvm_arm_pmu_set_irq(CPUState *cs, int irq)
>  .attr = KVM_ARM_VCPU_PMU_V3_IRQ,
>  };
>  
> -return kvm_arm_pmu_set_attr(cs, );
> +if (!ARM_CPU(cs)->has_pmu) {
> +return;
> +}
> +if (!kvm_arm_pmu_set_attr(cs, )) {
> +error_report("failed to set irq for PMU");
> +abort();
> +}
>  }
>  
>  static inline void set_feature(uint64_t *features, int feature)
> diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
> index cab5ea9be55c..ff53e9fafb7a 100644
> --- a/target/arm/kvm_arm.h
> +++ b/target/arm/kvm_arm.h
> @@ -195,8 +195,8 @@ int kvm_arm_sync_mpstate_to_qemu(ARMCPU *cpu);
>  
>  int kvm_arm_vgic_probe(void);
>  
> -int kvm_arm_pmu_set_irq(CPUState *cs, int irq);
> -int kvm_arm_pmu_init(CPUState *cs);
> +void kvm_arm_pmu_set_irq(CPUState *cs, int irq);
> +void kvm_arm_pmu_init(CPUState *cs);
>  
>  #else
>  
> @@ -205,15 +205,8 @@ static inline int kvm_arm_vgic_probe(void)
>  return 0;
>  }
>  
> -static inline int kvm_arm_pmu_set_irq(CPUState *cs, int irq)
> -{
> -return 0;
> -}
> -
> -static inline int kvm_arm_pmu_init(CPUState *cs)
> -{
> -return 0;
> -}
> +static inline void kvm_arm_pmu_set_irq(CPUState *cs, int irq) {}
> +static inline void kvm_arm_pmu_init(CPUState *cs) {}
>  
>  #endif
>  
> -- 
> 1.8.3.1
>

Re: [Qemu-devel] [PATCH v2 3/4] hw/arm/virt: allow pmu instantiation with userspace irqchip

2017-07-21 Thread Christoffer Dall

On Wed, Jul 19, 2017 at 09:39:56AM -0400, Andrew Jones wrote:
> Move the in-kernel-irqchip test to only guard the set-irq
> stage, not the init stage of the PMU.  Also add the PMU to
> the KVM device irq line synchronization to enable its use.
> 
> Signed-off-by: Andrew Jones <drjo...@redhat.com>

Reviewed-by: Christoffer Dall <cd...@linaro.org>

> ---
>  hw/arm/virt.c  | 3 ++-
>  target/arm/kvm.c   | 6 +-
>  target/arm/kvm64.c | 3 +--
>  3 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 7157a028adce..a215330444da 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -496,7 +496,8 @@ static void fdt_add_pmu_nodes(const VirtMachineState *vms)
>  return;
>  }
>  if (kvm_enabled()) {
> -if (!kvm_arm_pmu_set_irq(cpu, PPI(VIRTUAL_PMU_IRQ))) {
> +if (kvm_irqchip_in_kernel() &&
> +!kvm_arm_pmu_set_irq(cpu, PPI(VIRTUAL_PMU_IRQ))) {
>  return;
>  }
>  if (!kvm_arm_pmu_init(cpu)) {
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index 7c17f0d629d7..211a7bf7befd 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -567,7 +567,11 @@ MemTxAttrs kvm_arch_post_run(CPUState *cs, struct 
> kvm_run *run)
>  switched_level &= ~KVM_ARM_DEV_EL1_PTIMER;
>  }
>  
> -/* XXX PMU IRQ is missing */
> +if (switched_level & KVM_ARM_DEV_PMU) {
> +qemu_set_irq(cpu->pmu_interrupt,
> + !!(run->s.regs.device_irq_level & KVM_ARM_DEV_PMU));
> +switched_level &= ~KVM_ARM_DEV_PMU;
> +}
>  
>  if (switched_level) {
>  qemu_log_mask(LOG_UNIMP, "%s: unhandled in-kernel device IRQ 
> %x\n",
> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> index e26638a6fac1..ec7d85314acc 100644
> --- a/target/arm/kvm64.c
> +++ b/target/arm/kvm64.c
> @@ -506,8 +506,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  if (!arm_feature(>env, ARM_FEATURE_AARCH64)) {
>  cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_EL1_32BIT;
>  }
> -if (!kvm_irqchip_in_kernel() ||
> -!kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_PMU_V3)) {
> +if (!kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_PMU_V3)) {
>  cpu->has_pmu = false;
>  }
>  if (cpu->has_pmu) {
> -- 
> 1.8.3.1
>

Re: [Qemu-devel] [PATCH v2 2/4] target/arm/kvm: pmu: split init and set-irq stages

2017-07-21 Thread Christoffer Dall

On Wed, Jul 19, 2017 at 09:39:55AM -0400, Andrew Jones wrote:
> When adding a PMU with a userspace irqchip we skip the set-irq
> stage of device creation. Split the 'create' function into two
> functions 'init' and 'set-irq' so they may be called separately.
> 
> Signed-off-by: Andrew Jones <drjo...@redhat.com>

Reviewed-by: Christoffer Dall <cd...@linaro.org>

> ---
>  hw/arm/virt.c| 11 +--
>  target/arm/kvm32.c   |  8 +++-
>  target/arm/kvm64.c   | 52 
> +---
>  target/arm/kvm_arm.h | 10 --
>  4 files changed, 49 insertions(+), 32 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index ea26f0c473c2..7157a028adce 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -492,10 +492,17 @@ static void fdt_add_pmu_nodes(const VirtMachineState 
> *vms)
>  
>  CPU_FOREACH(cpu) {
>  armcpu = ARM_CPU(cpu);
> -if (!arm_feature(>env, ARM_FEATURE_PMU) ||
> -(kvm_enabled() && !kvm_arm_pmu_create(cpu, 
> PPI(VIRTUAL_PMU_IRQ {
> +if (!arm_feature(>env, ARM_FEATURE_PMU)) {
>  return;
>  }
> +if (kvm_enabled()) {
> +if (!kvm_arm_pmu_set_irq(cpu, PPI(VIRTUAL_PMU_IRQ))) {
> +return;
> +}
> +if (!kvm_arm_pmu_init(cpu)) {
> +return;
> +}
> +}
>  }
>  
>  if (vms->gic_version == 2) {
> diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
> index 069da0c5fd10..e3aab89a1a94 100644
> --- a/target/arm/kvm32.c
> +++ b/target/arm/kvm32.c
> @@ -522,7 +522,13 @@ bool kvm_arm_hw_debug_active(CPUState *cs)
>  return false;
>  }
>  
> -int kvm_arm_pmu_create(CPUState *cs, int irq)
> +int kvm_arm_pmu_set_irq(CPUState *cs, int irq)
> +{
> +qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
> +return 0;
> +}
> +
> +int kvm_arm_pmu_init(CPUState *cs)
>  {
>  qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
>  return 0;
> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> index a16abc8d129e..e26638a6fac1 100644
> --- a/target/arm/kvm64.c
> +++ b/target/arm/kvm64.c
> @@ -381,46 +381,44 @@ static CPUWatchpoint *find_hw_watchpoint(CPUState *cpu, 
> target_ulong addr)
>  return NULL;
>  }
>  
> -static bool kvm_arm_pmu_support_ctrl(CPUState *cs, struct kvm_device_attr 
> *attr)
> -{
> -return kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, attr) == 0;
> -}
> -
> -int kvm_arm_pmu_create(CPUState *cs, int irq)
> +static bool kvm_arm_pmu_set_attr(CPUState *cs, struct kvm_device_attr *attr)
>  {
>  int err;
>  
> -struct kvm_device_attr attr = {
> -.group = KVM_ARM_VCPU_PMU_V3_CTRL,
> -.addr = (intptr_t),
> -.attr = KVM_ARM_VCPU_PMU_V3_IRQ,
> -.flags = 0,
> -};
> -
> -if (!kvm_arm_pmu_support_ctrl(cs, )) {
> -return 0;
> +err = kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, attr);
> +if (err != 0) {
> +return false;
>  }
>  
> -err = kvm_vcpu_ioctl(cs, KVM_SET_DEVICE_ATTR, );
> +err = kvm_vcpu_ioctl(cs, KVM_SET_DEVICE_ATTR, attr);
>  if (err < 0) {
>  fprintf(stderr, "KVM_SET_DEVICE_ATTR failed: %s\n",
>  strerror(-err));
>  abort();
>  }
>  
> -attr.group = KVM_ARM_VCPU_PMU_V3_CTRL;
> -attr.attr = KVM_ARM_VCPU_PMU_V3_INIT;
> -attr.addr = 0;
> -attr.flags = 0;
> +return true;
> +}
>  
> -err = kvm_vcpu_ioctl(cs, KVM_SET_DEVICE_ATTR, );
> -if (err < 0) {
> -fprintf(stderr, "KVM_SET_DEVICE_ATTR failed: %s\n",
> -strerror(-err));
> -abort();
> -}
> +int kvm_arm_pmu_init(CPUState *cs)
> +{
> +struct kvm_device_attr attr = {
> +.group = KVM_ARM_VCPU_PMU_V3_CTRL,
> +.attr = KVM_ARM_VCPU_PMU_V3_INIT,
> +};
> +
> +return kvm_arm_pmu_set_attr(cs, );
> +}
> +
> +int kvm_arm_pmu_set_irq(CPUState *cs, int irq)
> +{
> +struct kvm_device_attr attr = {
> +.group = KVM_ARM_VCPU_PMU_V3_CTRL,
> +.addr = (intptr_t),
> +.attr = KVM_ARM_VCPU_PMU_V3_IRQ,
> +};
>  
> -return 1;
> +return kvm_arm_pmu_set_attr(cs, );
>  }
>  
>  static inline void set_feature(uint64_t *features, int feature)
> diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
> index 633d08828a5d..cab5ea9be55c 100644
> --- a/target/arm/kvm_arm.h
> +++ b/target/arm/kvm_arm.h
> @@ -195,7 +195,8 @@ int kvm_arm_sync_mpstate_to_qemu(ARMCPU *cpu);
>  
>  int kvm_a

Re: [Qemu-devel] [PATCH v2 1/4] hw/arm/virt: add pmu interrupt state

2017-07-21 Thread Christoffer Dall

On Wed, Jul 19, 2017 at 09:39:54AM -0400, Andrew Jones wrote:
> Mimicking gicv3-maintenance-interrupt, add the PMU's interrupt to
> CPU state.
> 
> Signed-off-by: Andrew Jones 
> Reviewed-by: Peter Maydell 
> ---
>  hw/arm/virt.c| 3 +++
>  target/arm/cpu.c | 2 ++
>  target/arm/cpu.h | 2 ++
>  3 files changed, 7 insertions(+)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 31739d75a3e0..ea26f0c473c2 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -610,6 +610,9 @@ static void create_gic(VirtMachineState *vms, qemu_irq 
> *pic)
>  qdev_connect_gpio_out_named(cpudev, "gicv3-maintenance-interrupt", 0,
>  qdev_get_gpio_in(gicdev, ppibase
>   + 
> ARCH_GICV3_MAINT_IRQ));
> +qdev_connect_gpio_out_named(cpudev, "pmu-interrupt", 0,
> +qdev_get_gpio_in(gicdev, ppibase
> + + VIRTUAL_PMU_IRQ));

I know Peter reviewed this, but isn't it a bit strange to create the
pmu-interrupt when creating the gic (as this isn't an output from the
GIC like the maintenance interrupt is) ?

Thanks,
-Christoffer

>  
>  sysbus_connect_irq(gicbusdev, i, qdev_get_gpio_in(cpudev, 
> ARM_CPU_IRQ));
>  sysbus_connect_irq(gicbusdev, i + smp_cpus,
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 96d1f840301f..fd82c7944840 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -499,6 +499,8 @@ static void arm_cpu_initfn(Object *obj)
>  
>  qdev_init_gpio_out_named(DEVICE(cpu), >gicv3_maintenance_interrupt,
>   "gicv3-maintenance-interrupt", 1);
> +qdev_init_gpio_out_named(DEVICE(cpu), >pmu_interrupt,
> + "pmu-interrupt", 1);
>  #endif
>  
>  /* DTB consumers generally don't in fact care what the 'compatible'
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 102c58afac52..8d91166eb97b 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -584,6 +584,8 @@ struct ARMCPU {
>  qemu_irq gt_timer_outputs[NUM_GTIMERS];
>  /* GPIO output for GICv3 maintenance interrupt signal */
>  qemu_irq gicv3_maintenance_interrupt;
> +/* GPIO output for the PMU interrupt */
> +qemu_irq pmu_interrupt;
>  
>  /* MemoryRegion to use for secure physical accesses */
>  MemoryRegion *secure_memory;
> -- 
> 1.8.3.1
>

[Qemu-devel] [job ad] QEMU Engineering Position with Linaro

2017-05-29 Thread Christoffer Dall

Hi all,

Linaro is looking for a full-time engineer to work on QEMU, primarily
focused on emulating aspects of the ARM architecture.

Feel free to apply to the position or forward this to anyone you know
who has experience working with QEMU and experience with modeling
computer architecture.

The official job opening is posted here:
http://www.linaro.org/careers/#op-22576-qemu-engineer-

Thanks,
-Christoffer

Re: [Qemu-devel] [PATCH v3 2/3] arm64: kvm: inject SError with virtual syndrome

2017-05-02 Thread Christoffer Dall

On Sun, Apr 30, 2017 at 01:37:56PM +0800, Dongjiu Geng wrote:
> when SError happen, kvm notifies kvmtool to generate GHES table
> to record the error, then kvmtools inject the SError with specified

again, is this really specific to kvmtool?  Pleae try to explain this
mechanism in generic terms.

> virtual syndrome. when switch to guest, a virtual SError will happen with
> this specified syndrome.
> 
> Signed-off-by: Dongjiu Geng 
> ---
>  arch/arm64/include/asm/esr.h |  2 ++
>  arch/arm64/include/asm/kvm_emulate.h | 10 ++
>  arch/arm64/include/asm/kvm_host.h|  1 +
>  arch/arm64/include/asm/sysreg.h  |  3 +++
>  arch/arm64/kvm/handle_exit.c | 25 +++--
>  arch/arm64/kvm/hyp/switch.c  | 15 ++-
>  include/uapi/linux/kvm.h |  5 +
>  7 files changed, 54 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index 22f9c90..d009c99 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -127,6 +127,8 @@
>  #define ESR_ELx_WFx_ISS_WFE  (UL(1) << 0)
>  #define ESR_ELx_xVC_IMM_MASK ((1UL << 16) - 1)
>  
> +#define VSESR_ELx_IDS_ISS_MASK((1UL << 25) - 1)
> +
>  /* ESR value templates for specific events */
>  
>  /* BRK instruction trap from AArch64 state */
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index f5ea0ba..a3259a9 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -148,6 +148,16 @@ static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu 
> *vcpu)
>   return vcpu->arch.fault.esr_el2;
>  }
>  
> +static inline u32 kvm_vcpu_get_vsesr(const struct kvm_vcpu *vcpu)
> +{
> + return vcpu->arch.fault.vsesr_el2;
> +}
> +
> +static inline void kvm_vcpu_set_vsesr(struct kvm_vcpu *vcpu, unsigned long 
> val)
> +{
> + vcpu->arch.fault.vsesr_el2 = val;
> +}
> +
>  static inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
>  {
>   u32 esr = kvm_vcpu_get_hsr(vcpu);
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index e7705e7..84ed239 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -86,6 +86,7 @@ struct kvm_vcpu_fault_info {
>   u32 esr_el2;/* Hyp Syndrom Register */
>   u64 far_el2;/* Hyp Fault Address Register */
>   u64 hpfar_el2;  /* Hyp IPA Fault Address Register */
> + u32 vsesr_el2;  /* Virtual SError Exception Syndrome Register */
>  };
>  
>  /*
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 32964c7..b6afb7a 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -125,6 +125,9 @@
>  #define REG_PSTATE_PAN_IMM   sys_reg(0, 0, 4, 0, 4)
>  #define REG_PSTATE_UAO_IMM   sys_reg(0, 0, 4, 0, 3)
>  
> +#define VSESR_EL2sys_reg(3, 4, 5, 2, 3)
> +
> +
>  #define SET_PSTATE_PAN(x) __emit_inst(0xd500 | REG_PSTATE_PAN_IMM |  
> \
> (!!x)<<8 | 0x1f)
>  #define SET_PSTATE_UAO(x) __emit_inst(0xd500 | REG_PSTATE_UAO_IMM |  
> \
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index c89d83a..3d024a9 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -180,7 +180,11 @@ static exit_handle_fn kvm_get_exit_handler(struct 
> kvm_vcpu *vcpu)
>  
>  static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
> - unsigned long fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> + unsigned long hva, fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> + struct kvm_memory_slot *memslot;
> + int hsr, ret = 1;
> + bool writable;
> + gfn_t gfn;
>  
>   if (handle_guest_sei((unsigned long)fault_ipa,
>   kvm_vcpu_get_hsr(vcpu))) {
> @@ -190,9 +194,20 @@ static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, 
> struct kvm_run *run)
>   (unsigned long)kvm_vcpu_get_hsr(vcpu));
>  
>   kvm_inject_vabt(vcpu);
> + } else {
> + hsr = kvm_vcpu_get_hsr(vcpu);
> +
> + gfn = fault_ipa >> PAGE_SHIFT;
> + memslot = gfn_to_memslot(vcpu->kvm, gfn);
> + hva = gfn_to_hva_memslot_prot(memslot, gfn, );
> +
> + run->exit_reason = KVM_EXIT_INTR;
> + run->intr.syndrome_info = hsr;
> + run->intr.address = hva;
> + ret = 0;
>   }
>  
> - return 0;
> + return ret;
>  }
>  
>  /*
> @@ -218,8 +233,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run 
> *run,
>   *vcpu_pc(vcpu) -= adj;
>   }
>  
> - kvm_handle_guest_sei(vcpu, run);
> - return 1;
> + return kvm_handle_guest_sei(vcpu,

Re: [Qemu-devel] [PATCH v3 1/3] arm64: kvm: support kvmtool to detect RAS extension feature

2017-05-02 Thread Christoffer Dall

Hi Dongjiu,

Please send a cover letter for patch series with more than a single
patch.

The subject and description of these patches are also misleading.
Hopefully this is in no way tied to kvmtool, but to userspace
generically, for example also to be used by QEMU?

On Sun, Apr 30, 2017 at 01:37:55PM +0800, Dongjiu Geng wrote:
> Handle kvmtool's detection for RAS extension, because sometimes
> the APP needs to know the CPU's capacity

the APP ?

the CPU's capacity?

> 
> Signed-off-by: Dongjiu Geng 
> ---
>  arch/arm64/kvm/reset.c   | 11 +++
>  include/uapi/linux/kvm.h |  1 +
>  2 files changed, 12 insertions(+)
> 
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index d9e9697..1004039 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -64,6 +64,14 @@ static bool cpu_has_32bit_el1(void)
>   return !!(pfr0 & 0x20);
>  }
>  
> +static bool kvm_arm_support_ras_extension(void)
> +{
> + u64 pfr0;
> +
> + pfr0 = read_system_reg(SYS_ID_AA64PFR0_EL1);
> + return !!(pfr0 & 0x1000);
> +}
> +
>  /**
>   * kvm_arch_dev_ioctl_check_extension
>   *
> @@ -87,6 +95,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, 
> long ext)
>   case KVM_CAP_ARM_PMU_V3:
>   r = kvm_arm_support_pmu_v3();
>   break;
> + case KVM_CAP_ARM_RAS_EXTENSION:
> + r = kvm_arm_support_ras_extension();
> + break;

You need to document this capability and API in
Documentation/virtual/kvm/api.txt and explain how this works.



>   case KVM_CAP_SET_GUEST_DEBUG:
>   case KVM_CAP_VCPU_ATTRIBUTES:
>   r = 1;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index f51d508..27fe556 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -883,6 +883,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_PPC_MMU_RADIX 134
>  #define KVM_CAP_PPC_MMU_HASH_V3 135
>  #define KVM_CAP_IMMEDIATE_EXIT 136
> +#define KVM_CAP_ARM_RAS_EXTENSION 137
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> -- 
> 2.10.1
> 

Thanks,
-Christoffer

Re: [Qemu-devel] host stalls when qemu-system-aarch64 with kvm and pflash

2017-03-29 Thread Christoffer Dall

On Wed, Mar 29, 2017 at 01:51:19PM -0700, Radha Mohan wrote:
> On Wed, Mar 29, 2017 at 11:56 AM, Christoffer Dall <cd...@linaro.org> wrote:
> > On Tue, Mar 28, 2017 at 01:24:15PM -0700, Radha Mohan wrote:
> >> On Tue, Mar 28, 2017 at 1:16 PM, Christoffer Dall <cd...@linaro.org> wrote:
> >> > Hi Radha,
> >> >
> >> > On Tue, Mar 28, 2017 at 12:58:24PM -0700, Radha Mohan wrote:
> >> >> Hi,
> >> >> I am seeing an issue with qemu-system-aarch64 when using pflash
> >> >> (booting kernel via UEFI bios).
> >> >>
> >> >> Host kernel: 4.11.0-rc3-next-20170323
> >> >> Qemu version: v2.9.0-rc1
> >> >>
> >> >> Command used:
> >> >> ./aarch64-softmmu/qemu-system-aarch64 -cpu host -enable-kvm -M
> >> >> virt,gic_version=3 -nographic -smp 1 -m 2048 -drive
> >> >> if=none,id=hd0,file=/root/zesty-server-cloudimg-arm64.img,id=0 -device
> >> >> virtio-blk-device,drive=hd0 -pflash /root/flash0.img -pflash
> >> >> /root/flash1.img
> >> >>
> >> >>
> >> >> As soon as the guest kernel boots the host starts to stall and prints
> >> >> the below messages. And the system never recovers. I can neither
> >> >> poweroff the guest nor the host. So I have resort to external power
> >> >> reset of the host.
> >> >>
> >> >> ==
> >> >> [  116.199077] NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s!
> >> >> [kworker/25:1:454]
> >> >> [  116.206901] Modules linked in: binfmt_misc nls_iso8859_1 aes_ce_blk
> >> >> shpchp crypto_simd gpio_keys cryptd aes_ce_cipher ghash_ce sha2_ce
> >> >> sha1_ce uio_pdrv_genirq uio autofs4 btrfs raid10 rai
> >> >> d456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
> >> >> raid6_pq libcrc32c raid1 raid0 multipath linear ast i2c_algo_bit ttm
> >> >> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_s
> >> >> ys_fops drm nicvf ahci nicpf libahci thunder_bgx thunder_xcv
> >> >> mdio_thunder mdio_cavium
> >> >>
> >> >> [  116.206995] CPU: 25 PID: 454 Comm: kworker/25:1 Not tainted
> >> >> 4.11.0-rc3-next-20170323 #1
> >> >> [  116.206997] Hardware name: www.cavium.com crb-1s/crb-1s, BIOS 0.3 
> >> >> Feb 23 2017
> >> >> [  116.207010] Workqueue: events netstamp_clear
> >> >> [  116.207015] task: 801f906b5400 task.stack: 801f901a4000
> >> >> [  116.207020] PC is at smp_call_function_many+0x284/0x2e8
> >> >> [  116.207023] LR is at smp_call_function_many+0x244/0x2e8
> >> >> [  116.207026] pc : [] lr : []
> >> >> pstate: 8145
> >> >> [  116.207028] sp : 801f901a7be0
> >> >> [  116.207030] x29: 801f901a7be0 x28: 09139000
> >> >> [  116.207036] x27: 09139434 x26: 0080
> >> >> [  116.207041] x25:  x24: 081565d0
> >> >> [  116.207047] x23: 0001 x22: 08e11e00
> >> >> [  116.207052] x21: 801f6d5cff00 x20: 801f6d5cff08
> >> >> [  116.207057] x19: 09138e38 x18: 0a03
> >> >> [  116.207063] x17: b77c9028 x16: 082e81d8
> >> >> [  116.207068] x15: 3d0d6dd44d08 x14: 0036312196549b4a
> >> >> [  116.207073] x13: 58dabe4c x12: 0018
> >> >> [  116.207079] x11: 366e2f04 x10: 09f0
> >> >> [  116.207084] x9 : 801f901a7d30 x8 : 0002
> >> >> [  116.207089] x7 :  x6 : 
> >> >> [  116.207095] x5 :  x4 : 0020
> >> >> [  116.207100] x3 : 0020 x2 : 
> >> >> [  116.207105] x1 : 801f6d682578 x0 : 0003
> >> >>
> >> >> [  150.443116] INFO: rcu_sched self-detected stall on CPU
> >> >> [  150.448261]  25-...: (14997 ticks this GP)
> >> >> idle=47a/141/0 softirq=349/349 fqs=7495
> >> >> [  150.451115] INFO: rcu_sched detected stalls on CPUs/tasks:
> >> >> [  150.451123]  25-...: (14997 ticks this GP)
> >> >> idle=47a/141/0 softirq=349/349 fqs=7495
> >> >> [  150.451124]  (detected by 13, t=15002 jiffies, g=805, c=804, q=8384)
> >> >> [  150.

Re: [Qemu-devel] host stalls when qemu-system-aarch64 with kvm and pflash

2017-03-29 Thread Christoffer Dall

On Tue, Mar 28, 2017 at 01:24:15PM -0700, Radha Mohan wrote:
> On Tue, Mar 28, 2017 at 1:16 PM, Christoffer Dall <cd...@linaro.org> wrote:
> > Hi Radha,
> >
> > On Tue, Mar 28, 2017 at 12:58:24PM -0700, Radha Mohan wrote:
> >> Hi,
> >> I am seeing an issue with qemu-system-aarch64 when using pflash
> >> (booting kernel via UEFI bios).
> >>
> >> Host kernel: 4.11.0-rc3-next-20170323
> >> Qemu version: v2.9.0-rc1
> >>
> >> Command used:
> >> ./aarch64-softmmu/qemu-system-aarch64 -cpu host -enable-kvm -M
> >> virt,gic_version=3 -nographic -smp 1 -m 2048 -drive
> >> if=none,id=hd0,file=/root/zesty-server-cloudimg-arm64.img,id=0 -device
> >> virtio-blk-device,drive=hd0 -pflash /root/flash0.img -pflash
> >> /root/flash1.img
> >>
> >>
> >> As soon as the guest kernel boots the host starts to stall and prints
> >> the below messages. And the system never recovers. I can neither
> >> poweroff the guest nor the host. So I have resort to external power
> >> reset of the host.
> >>
> >> ==
> >> [  116.199077] NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s!
> >> [kworker/25:1:454]
> >> [  116.206901] Modules linked in: binfmt_misc nls_iso8859_1 aes_ce_blk
> >> shpchp crypto_simd gpio_keys cryptd aes_ce_cipher ghash_ce sha2_ce
> >> sha1_ce uio_pdrv_genirq uio autofs4 btrfs raid10 rai
> >> d456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
> >> raid6_pq libcrc32c raid1 raid0 multipath linear ast i2c_algo_bit ttm
> >> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_s
> >> ys_fops drm nicvf ahci nicpf libahci thunder_bgx thunder_xcv
> >> mdio_thunder mdio_cavium
> >>
> >> [  116.206995] CPU: 25 PID: 454 Comm: kworker/25:1 Not tainted
> >> 4.11.0-rc3-next-20170323 #1
> >> [  116.206997] Hardware name: www.cavium.com crb-1s/crb-1s, BIOS 0.3 Feb 
> >> 23 2017
> >> [  116.207010] Workqueue: events netstamp_clear
> >> [  116.207015] task: 801f906b5400 task.stack: 801f901a4000
> >> [  116.207020] PC is at smp_call_function_many+0x284/0x2e8
> >> [  116.207023] LR is at smp_call_function_many+0x244/0x2e8
> >> [  116.207026] pc : [] lr : []
> >> pstate: 8145
> >> [  116.207028] sp : 801f901a7be0
> >> [  116.207030] x29: 801f901a7be0 x28: 09139000
> >> [  116.207036] x27: 09139434 x26: 0080
> >> [  116.207041] x25:  x24: 081565d0
> >> [  116.207047] x23: 0001 x22: 08e11e00
> >> [  116.207052] x21: 801f6d5cff00 x20: 801f6d5cff08
> >> [  116.207057] x19: 09138e38 x18: 0a03
> >> [  116.207063] x17: b77c9028 x16: 082e81d8
> >> [  116.207068] x15: 3d0d6dd44d08 x14: 0036312196549b4a
> >> [  116.207073] x13: 58dabe4c x12: 0018
> >> [  116.207079] x11: 366e2f04 x10: 09f0
> >> [  116.207084] x9 : 801f901a7d30 x8 : 0002
> >> [  116.207089] x7 :  x6 : 
> >> [  116.207095] x5 :  x4 : 0020
> >> [  116.207100] x3 : 0020 x2 : 
> >> [  116.207105] x1 : 801f6d682578 x0 : 0003
> >>
> >> [  150.443116] INFO: rcu_sched self-detected stall on CPU
> >> [  150.448261]  25-...: (14997 ticks this GP)
> >> idle=47a/141/0 softirq=349/349 fqs=7495
> >> [  150.451115] INFO: rcu_sched detected stalls on CPUs/tasks:
> >> [  150.451123]  25-...: (14997 ticks this GP)
> >> idle=47a/141/0 softirq=349/349 fqs=7495
> >> [  150.451124]  (detected by 13, t=15002 jiffies, g=805, c=804, q=8384)
> >> [  150.451136] Task dump for CPU 25:
> >> [  150.451138] kworker/25:1R  running task0   454  2 
> >> 0x0002
> >> [  150.451155] Workqueue: events netstamp_clear
> >> [  150.451158] Call trace:
> >> [  150.451164] [] __switch_to+0x90/0xa8
> >> [  150.451172] [] static_key_slow_inc+0x128/0x138
> >> [  150.451175] [] static_key_enable+0x34/0x60
> >> [  150.451178] [] netstamp_clear+0x68/0x80
> >> [  150.451181] [] process_one_work+0x158/0x478
> >> [  150.451183] [] worker_thread+0x50/0x4a8
> >> [  150.451187] [] kthread+0x108/0x138
> >> [  150.451190] [] ret_from_fork+0x10/0x50
> >> [  150.477451]   (t=15008 jiffies g=805 c=804 q=8384)
> >> [  150.48224

Re: [Qemu-devel] [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-29 Thread Christoffer Dall

On Wed, Mar 29, 2017 at 05:37:49PM +0200, Laszlo Ersek wrote:
> On 03/29/17 16:48, Christoffer Dall wrote:
> > On Wed, Mar 29, 2017 at 10:36:51PM +0800, gengdongjiu wrote:
> >> 2017-03-29 18:36 GMT+08:00, Achin Gupta <achin.gu...@arm.com>:
> 
> >>> Qemu is essentially fulfilling the role of secure firmware at the
> >>> EL2/EL1 interface (as discussed with Christoffer below). So it
> >>> should generate the CPER before injecting the error.
> >>>
> >>> This is corresponds to (1) above apart from notifying UEFI (I am
> >>> assuming you mean guest UEFI). At this time, the guest OS already
> >>> knows where to pick up the CPER from through the HEST. Qemu has
> >>> to create the CPER and populate its address at the address
> >>> exported in the HEST. Guest UEFI should not be involved in this 
> >>> flow. Its job was to create the HEST at boot and that has been
> >>> done by this stage.
> >>
> >> Sorry,  As I understand it, after Qemu generate the CPER table, it
> >> should pass the CPER table to the guest UEFI, then Guest UEFI  place
> >> this CPER table to the guest OS memory. In this flow, the Guest UEFI
> >> should be involved, else the Guest OS can not see the CPER table.
> >>
> > 
> > I think you need to explain the "pass the CPER table to the guest UEFI"
> > concept in terms of what really happens, step by step, and when you say
> > "then Guest UEFI place the CPER table to the guest OS memory", I'm
> > curious who is running what code on the hardware when doing that.
> 
> I strongly suggest to keep the guest firmware's runtime involvement to
> zero. Two reasons:
> 
> (1) As you explained above (... which I conveniently snipped), when you
> inject an interrupt to the guest, the handler registered for that
> interrupt will come from the guest kernel.
> 
> The only exception to this is when the platform provides a type of
> interrupt whose handler can be registered and then locked down by the
> firmware. On x86, this is the SMI.
> 
> In practice though,
> - in OVMF (x86), we only do synchronous (software-initiated) SMIs (for
> privileged UEFI varstore access),
> - and in ArmVirtQemu (ARM / aarch64), none of the management mode stuff
> exists at all.
> 
> I understand that the Platform Init 1.5 (or 1.6?) spec abstracted away
> the MM (management mode) protocols from Intel SMM, but at this point
> there is zero code in ArmVirtQemu for that. (And I'm unsure how much of
> any eligible underlying hw emulation exists in QEMU.)
> 
> So you can't get the guest firmware to react to the injected interrupt
> without the guest OS coming between first.
> 
> (2) Achin's description matches really-really closely what is possible,
> and what should be done with QEMU, ArmVirtQemu, and the guest kernel.
> 
> In any solution for this feature, the firmware has to reserve some
> memory from the OS at boot. The current facilities we have enable this.
> As I described previously, the ACPI linker/loader actions can be mapped
> more or less 1:1 to Achin's design. From a practical perspective, you
> really want to keep the guest firmware as dumb as possible (meaning: as
> generic as possible), and keep the ACPI specifics to the QEMU and the
> guest kernel sides.
> 
> The error serialization actions -- the co-operation between guest kernel
> and QEMU on the special memory areas -- that were mentioned earlier by
> Michael and Punit look like a complication. But, IMO, they don't differ
> from any other device emulation -- DMA actions in particular -- that
> QEMU already does. Device models are what QEMU *does*. Read the command
> block that the guest driver placed in guest memory, parse it, sanity
> check it, verify it, execute it, write back the status code, inject an
> interrupt (and/or let any polling guest driver notice it "soon after" --
> use barriers as necessary).
> 
> Thus, I suggest to rely on the generic ACPI linker/loader interface
> (between QEMU and guest firmware) *only* to make the firmware lay out
> stuff (= reserve buffers, set up pointers, install QEMU's ACPI tables)
> *at boot*. Then, at runtime, let the guest kernel and QEMU (the "device
> model") talk to each other directly. Keep runtime firmware involvement
> to zero.
> 
> You *really* don't want to debug three components at runtime, when you
> can solve the thing with two. (Two components whose build systems won't
> drive you mad, I should add.)
> 
> IMO, Achin's design nailed it. We can do that.
> 
I completely agree.

My questions were intended for gengdongjiu to clarify his/her position
and clear up any misunderstandings between what Achin suggested and what
he/she wrote.

Thanks,
-Christoffer

Re: [Qemu-devel] [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-29 Thread Christoffer Dall

On Wed, Mar 29, 2017 at 10:36:51PM +0800, gengdongjiu wrote:
> Hi Achin,
>   Thanks for your mail and  answer.
> 
> 2017-03-29 18:36 GMT+08:00, Achin Gupta :
> > Hi gengdongjiu,
> >
> > On Wed, Mar 29, 2017 at 05:36:37PM +0800, gengdongjiu wrote:
> >>
> >> Hi Laszlo/Biesheuvel/Qemu developer,
> >>
> >>Now I encounter a issue and want to consult with you in ARM64 platform，
> >> as described below:
> >>
> >>when guest OS happen synchronous or asynchronous abort, kvm needs to
> >> send the error address to Qemu or UEFI through sigbus to dynamically
> >> generate APEI table. from my investigation, there are two ways:
> >>
> >>(1) Qemu get the error address, and generate the APEI table, then
> >> notify UEFI to know this generation, then inject abort error to guest OS,
> >> guest OS read the APEI table.
> >>(2) Qemu get the error address, and let UEFI to generate the APEI
> >> table, then inject abort error to guest OS, guest OS read the APEI table.
> 
> The description may be not precise, I update it
> 
>(1) Qemu get the error address, and generate the CPER table, then
> notify guest UEFI to place this CPER table to Guest OS memory, then
> Qemu let KVM inject abort error to guest OS, guest OS read the CPER
> table.
> 
>  (2) Qemu get the error address, and let guest UEFI to directly
> generate the CPER
> table and place this table to the guest OS memory, not let Qemu gerate
> it. then KVM inject abort error to guest OS, guest OS read the CPER
> table.
> 

I don't understand how you are going to notify the guest UEFI instance
of anything or run the guest UEFI instance without going through the
guest kernel.

AFAIU, the guest UEFI instance is just a blob of software running at EL1
together with the guest kernel, and you're notification mechanism from
QEMU to the VM is pretty much limited to injecting a virtual interrupt
(of some kind) which is initially handled by the guest kernel.

> >
> > Just being pedantic! I don't think we are talking about creating the APEI
> > table
> > dynamically here. The issue is: Once KVM has received an error that is
> > destined
> > for a guest it will raise a SIGBUS to Qemu. Now before Qemu can inject the
> > error
> > into the guest OS, a CPER (Common Platform Error Record) has to be
> > generated
> > corresponding to the error source (GHES corresponding to memory subsystem,
> > processor etc) to allow the guest OS to do anything meaningful with the
> > error. So who should create the CPER is the question.
> >
> > At the EL3/EL2 interface (Secure Firmware and OS/Hypervisor), an error
> > arrives
> > at EL3 and secure firmware (at EL3 or a lower secure exception level) is
> > responsible for creating the CPER. ARM is experimenting with using a
> > Standalone
> > MM EDK2 image in the secure world to do the CPER creation. This will avoid
> > adding the same code in ARM TF in EL3 (better for security). The error will
> > then
> > be injected into the OS/Hypervisor (through SEA/SEI/SDEI) through ARM
> > Trusted
> > Firmware.
> >
> > Qemu is essentially fulfilling the role of secure firmware at the EL2/EL1
> > interface (as discussed with Christoffer below). So it should generate the
> > CPER
> > before injecting the error.
> >
> > This is corresponds to (1) above apart from notifying UEFI (I am assuming
> > you
> > mean guest UEFI). At this time, the guest OS already knows where to pick up
> > the
> > CPER from through the HEST. Qemu has to create the CPER and populate its
> > address
> > at the address exported in the HEST. Guest UEFI should not be involved in
> > this
> > flow. Its job was to create the HEST at boot and that has been done by this
> > stage.
> 
>  Sorry,  As I understand it, after Qemu generate the CPER table, it
> should pass the CPER table to the guest UEFI, then Guest UEFI  place
> this CPER table to the guest OS memory. In this flow, the Guest UEFI
> should be involved, else the Guest OS can not see the CPER table.
> 

I think you need to explain the "pass the CPER table to the guest UEFI"
concept in terms of what really happens, step by step, and when you say
"then Guest UEFI place the CPER table to the guest OS memory", I'm
curious who is running what code on the hardware when doing that.


Thanks,
-Christoffer

Re: [Qemu-devel] host stalls when qemu-system-aarch64 with kvm and pflash

2017-03-28 Thread Christoffer Dall

Hi Radha,

On Tue, Mar 28, 2017 at 12:58:24PM -0700, Radha Mohan wrote:
> Hi,
> I am seeing an issue with qemu-system-aarch64 when using pflash
> (booting kernel via UEFI bios).
> 
> Host kernel: 4.11.0-rc3-next-20170323
> Qemu version: v2.9.0-rc1
> 
> Command used:
> ./aarch64-softmmu/qemu-system-aarch64 -cpu host -enable-kvm -M
> virt,gic_version=3 -nographic -smp 1 -m 2048 -drive
> if=none,id=hd0,file=/root/zesty-server-cloudimg-arm64.img,id=0 -device
> virtio-blk-device,drive=hd0 -pflash /root/flash0.img -pflash
> /root/flash1.img
> 
> 
> As soon as the guest kernel boots the host starts to stall and prints
> the below messages. And the system never recovers. I can neither
> poweroff the guest nor the host. So I have resort to external power
> reset of the host.
> 
> ==
> [  116.199077] NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s!
> [kworker/25:1:454]
> [  116.206901] Modules linked in: binfmt_misc nls_iso8859_1 aes_ce_blk
> shpchp crypto_simd gpio_keys cryptd aes_ce_cipher ghash_ce sha2_ce
> sha1_ce uio_pdrv_genirq uio autofs4 btrfs raid10 rai
> d456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
> raid6_pq libcrc32c raid1 raid0 multipath linear ast i2c_algo_bit ttm
> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_s
> ys_fops drm nicvf ahci nicpf libahci thunder_bgx thunder_xcv
> mdio_thunder mdio_cavium
> 
> [  116.206995] CPU: 25 PID: 454 Comm: kworker/25:1 Not tainted
> 4.11.0-rc3-next-20170323 #1
> [  116.206997] Hardware name: www.cavium.com crb-1s/crb-1s, BIOS 0.3 Feb 23 
> 2017
> [  116.207010] Workqueue: events netstamp_clear
> [  116.207015] task: 801f906b5400 task.stack: 801f901a4000
> [  116.207020] PC is at smp_call_function_many+0x284/0x2e8
> [  116.207023] LR is at smp_call_function_many+0x244/0x2e8
> [  116.207026] pc : [] lr : []
> pstate: 8145
> [  116.207028] sp : 801f901a7be0
> [  116.207030] x29: 801f901a7be0 x28: 09139000
> [  116.207036] x27: 09139434 x26: 0080
> [  116.207041] x25:  x24: 081565d0
> [  116.207047] x23: 0001 x22: 08e11e00
> [  116.207052] x21: 801f6d5cff00 x20: 801f6d5cff08
> [  116.207057] x19: 09138e38 x18: 0a03
> [  116.207063] x17: b77c9028 x16: 082e81d8
> [  116.207068] x15: 3d0d6dd44d08 x14: 0036312196549b4a
> [  116.207073] x13: 58dabe4c x12: 0018
> [  116.207079] x11: 366e2f04 x10: 09f0
> [  116.207084] x9 : 801f901a7d30 x8 : 0002
> [  116.207089] x7 :  x6 : 
> [  116.207095] x5 :  x4 : 0020
> [  116.207100] x3 : 0020 x2 : 
> [  116.207105] x1 : 801f6d682578 x0 : 0003
> 
> [  150.443116] INFO: rcu_sched self-detected stall on CPU
> [  150.448261]  25-...: (14997 ticks this GP)
> idle=47a/141/0 softirq=349/349 fqs=7495
> [  150.451115] INFO: rcu_sched detected stalls on CPUs/tasks:
> [  150.451123]  25-...: (14997 ticks this GP)
> idle=47a/141/0 softirq=349/349 fqs=7495
> [  150.451124]  (detected by 13, t=15002 jiffies, g=805, c=804, q=8384)
> [  150.451136] Task dump for CPU 25:
> [  150.451138] kworker/25:1R  running task0   454  2 
> 0x0002
> [  150.451155] Workqueue: events netstamp_clear
> [  150.451158] Call trace:
> [  150.451164] [] __switch_to+0x90/0xa8
> [  150.451172] [] static_key_slow_inc+0x128/0x138
> [  150.451175] [] static_key_enable+0x34/0x60
> [  150.451178] [] netstamp_clear+0x68/0x80
> [  150.451181] [] process_one_work+0x158/0x478
> [  150.451183] [] worker_thread+0x50/0x4a8
> [  150.451187] [] kthread+0x108/0x138
> [  150.451190] [] ret_from_fork+0x10/0x50
> [  150.477451]   (t=15008 jiffies g=805 c=804 q=8384)
> [  150.482242] Task dump for CPU 25:
> [  150.482245] kworker/25:1R  running task0   454  2 
> 0x0002
> [  150.482259] Workqueue: events netstamp_clear
> [  150.482264] Call trace:
> [  150.482271] [] dump_backtrace+0x0/0x2b0
> [  150.482277] [] show_stack+0x24/0x30
> [  150.482281] [] sched_show_task+0x128/0x178
> [  150.482285] [] dump_cpu_task+0x48/0x58
> [  150.482288] [] rcu_dump_cpu_stacks+0xa0/0xe8
> [  150.482297] [] rcu_check_callbacks+0x774/0x938
> [  150.482305] [] update_process_times+0x34/0x60
> [  150.482314] [] tick_sched_handle.isra.7+0x38/0x70
> [  150.482319] [] tick_sched_timer+0x4c/0x98
> [  150.482324] [] __hrtimer_run_queues+0xd8/0x2b8
> [  150.482328] [] hrtimer_interrupt+0xa8/0x228
> [  150.482334] [] arch_timer_handler_phys+0x3c/0x50
> [  150.482341] [] handle_percpu_devid_irq+0x8c/0x230
> [  150.482344] [] generic_handle_irq+0x34/0x50
> [  150.482347] [] __handle_domain_irq+0x68/0xc0
> [  150.482351] [] gic_handle_irq+0xc4/0x170
> [  150.482356] Exception stack(0x801f901a7ab0 to 0x801f901a7be0)
> [  150.482360] 7aa0:
> 0003 801f6d682578
> [  150.482364] 7ac0:

Re: [Qemu-devel] KVM/QEMU on Raspberry Pi 3

2017-02-02 Thread Christoffer Dall

On Thu, Feb 2, 2017 at 3:59 PM, Marc Zyngier  wrote:
> [+Christoffer]
>
> Hi Pekka,
>
> On 02/02/17 14:44, Pekka Enberg wrote:
>> Hi,
>>
>> Has anyone been able to successfully run QEMU/KVM under Raspberry Pi 3?
>>
>> I have installed 64-bit Fedora 24 by Gerd Hoffmann on the hardware:
>>
>>https://www.kraxel.org/blog/2016/04/fedora-on-raspberry-pi-updates/
>>
>> and built a VM image using virt-builder:
>>
>>virt-builder --root-password password:root --arch aarch64 fedora-24
>>
>> I also built the latest UEFI for QEMU from sources:
>>
>>https://wiki.linaro.org/LEG/UEFIforQEMU
>>
>> and updated to QEMU 2.8.0 from rawhide:
>>
>>[root@fedora-rpi2 ~]# qemu-system-aarch64 -version
>>QEMU emulator version 2.8.0(qemu-2.8.0-1.fc26)
>>Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers
>>
>> The VM image should be fine because I’m able to boot to it under CPU
>> emulation:
>>
>> qemu-system-aarch64 \
>>-nographic \
>>-M virt \
>>-cpu cortex-a57 \
>>-smp 1 \
>>-m 512 \
>>-bios QEMU_EFI.fd \
>>-device virtio-blk-device,drive=image -drive
>> if=none,id=image,file=fedora-24.img \
>>-netdev bridge,id=hn0,br=virbr0 -device
>> virtio-net-pci,netdev=hn0,romfile= \
>>-device virtio-rng-pci
>>
>> However, when I enable KVM, keyboard stops working (interrupt delivery
>> issue?) and Fedora boot process hangs at random places before reaching
>> login:
>
> [snip]
>
> TL;DR: as it stands now, none of the RPi{2,3} can run KVM out of the
> box, as they lack a virtualization capable interrupt controller. This
> means that timer interrupts cannot be reported to the core, and things
> die a painful death.
>
> The longer story: we have a set of patches somewhere on the list that do
> enable the timer interrupts to be reported to userspace (QEMU), which
> can then inject them into its on GIC emulation and kick the vcpu. So
> far, work on this seems to have stopped (API issues? QEMU patches?)
>
> Christoffer was about to revive the kernel patches, but we need someone
> to pick up the QEMU part, and work with us and the QEMU people so that
> we agree once and for all on the ABI to signal PPIs to userspace.

Interesting timing.  I just revived the patches today, but I reworked
the ABI slightly and rebased the work on top of support for the
physical timer in the guest and ran into two issues: First, turning of
the in-kernel irqchip no longer works, and I also lost track of how
the patches should look, so indeed I need help from a QEMU person to
look at that.  Second, there is some sort of regression on the 32-bit
side using the physical timer patches.

Once I've sorted some of this out, I can send out the patch series.
At least I want to figure out the 32-bit breakage first, and then I
may simply send out the patches with a big fat UNTESTED warning in
hope that someone will work on the qemu side with me.

Thanks,
-Christoffer

Re: [Qemu-devel] [PATCH 4/4] hw/arm/virt: Don't incorrectly claim architectural timer to be edge-triggered

2016-12-11 Thread Christoffer Dall

On Fri, Dec 09, 2016 at 04:30:20PM +, Peter Maydell wrote:
> The architectural timers in ARM CPUs all have level triggered interrupts
> (unless you're using KVM on a host kernel before 4.4, which misimplemented
> them as edge-triggered).
> 
> We were incorrectly describing them in the device tree as edge triggered.
> This can cause problems for guest kernels in 4.8 before rc6:
>  * pre-4.8 kernels ignore the values in the DT
>  * 4.8 before rc6 write the DT values to the GIC config registers
>  * newer than rc6 ignore the DT and insist that the timer interrupts
>are level triggered regardless
> 
> Fix the DT so we're describing reality. For backwards-compatibility
> purposes, only do this for the virt-2.9 machine onward.
> 
> Signed-off-by: Peter Maydell 
> ---
>  hw/arm/virt.c | 34 ++
>  1 file changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 54498ea..2ca9527 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -71,6 +71,7 @@ typedef struct {
>  bool disallow_affinity_adjustment;
>  bool no_its;
>  bool no_pmu;
> +bool claim_edge_triggered_timers;
>  } VirtMachineClass;
>  
>  typedef struct {
> @@ -309,12 +310,31 @@ static void fdt_add_psci_node(const VirtMachineState 
> *vms)
>  
>  static void fdt_add_timer_nodes(const VirtMachineState *vms, int gictype)
>  {
> -/* Note that on A15 h/w these interrupts are level-triggered,
> - * but for the GIC implementation provided by both QEMU and KVM
> - * they are edge-triggered.
> +/* On real hardware these interrupts are level-triggered.
> + * On KVM they were edge-triggered before host kernel version 4.4,
> + * and level-triggered afterwards.
> + * On emulated QEMU they are level-triggered.
> + *
> + * Getting the DTB info about them wrong is awkward for some
> + * guest kernels:
> + *  pre-4.8 ignore the DT and leave the interrupt configured
> + *   with whatever the GIC reset value (or the bootloader) left it at
> + *  4.8 before rc6 honour the incorrect data by programming it back
> + *   into the GIC, causing problems
> + *  4.8rc6 and later ignore the DT and always write "level triggered"
> + *   into the GIC
> + *
> + * For backwards-compatibility, virt-2.8 and earlier will continue
> + * to say these are edge-triggered, but later machines will report
> + * the correct information.
>   */

Is this really necessary?

I don't think the KVM GIC implementation ever listened to the guest in
terms of how to configure PPIs, but instead ignores writes to the config
registers for these interrupts (which I think the GIC architecture
allows).

So this would only be a matter of how the guest kernel between v4.8-rc1
and v4.8-rc6 expects the behavior to be.  Does the arch timer driver
really do something different in how it deals with interrupts based on
this DT value?

Of course, I suppose we could also be running other guests (UEFI?) but
again, if the KVM GIC doesn't care about how the guest tries to program
it, can it make a difference?

>  ARMCPU *armcpu;
> -uint32_t irqflags = GIC_FDT_IRQ_FLAGS_EDGE_LO_HI;
> +VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
> +uint32_t irqflags = GIC_FDT_IRQ_FLAGS_LEVEL_HI;
> +
> +if (vmc->claim_edge_triggered_timers) {
> +irqflags = GIC_FDT_IRQ_FLAGS_EDGE_LO_HI;
> +}
>  
>  if (gictype == 2) {
>  irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START,
> @@ -1556,8 +1576,14 @@ static void virt_2_8_instance_init(Object *obj)
>  
>  static void virt_machine_2_8_options(MachineClass *mc)
>  {
> +VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
> +
>  virt_machine_2_9_options(mc);
>  SET_MACHINE_COMPAT(mc, VIRT_COMPAT_2_8);
> +/* For 2.8 and earlier we falsely claimed in the DT that
> + * our timers were edge-triggered, not level-triggered.
> + */
> +vmc->claim_edge_triggered_timers = true;
>  }
>  DEFINE_VIRT_MACHINE(2, 8)
>  
I don't understand this virt machine class version stuff.  In which case
is the claim_edge_triggered_timers set to true?  (ok, appears to be when
a 2.8 machine is created, but does that happen automatically or does the
user specifically have to ask for it?)

Thanks,
-Christoffer

Re: [Qemu-devel] [PATCH v6 1/4] kernel: Add definitions for GICv3 attributes

2016-11-25 Thread Christoffer Dall

On Fri, Nov 25, 2016 at 02:12:12PM +0530, Vijay Kilari wrote:
> On Fri, Nov 25, 2016 at 1:27 PM, Auger Eric  wrote:
> > Hi Vijay,
> >
> > On 23/11/2016 13:39, vijay.kil...@gmail.com wrote:
> >> From: Vijaya Kumar K 
> >>
> >> This temporary patch adds kernel API definitions. Use proper header update
> >> procedure after these features are released.
> >
> > Did you send the complete v6 series? I only see 1/4 and 4/4 of this v6
> > (https://lists.gnu.org/archive/html/qemu-devel/2016-11/threads.html#04318)?
> > Did I miss something?
> 
> Strange!. Yes, I have sent complete series.
> 
I am seeing the whole thing.

Thanks,
-Christoffer

Re: [Qemu-devel] [RFC 2/2] ARM: KVM: Enable in-kernel timers with user space gic

2016-11-02 Thread Christoffer Dall

On Wed, Nov 02, 2016 at 04:40:35PM +0100, Alexander Graf wrote:
> On 11/01/2016 12:35 PM, Peter Maydell wrote:
> >On 29 October 2016 at 22:10, Alexander Graf  wrote:

[...]

> >
> >>+cpu->timer_irq_level = run->s.regs.timer_irq_level;
> >>+}
> >>+
> >>  return MEMTXATTRS_UNSPECIFIED;
> >>  }
> >Does this code do the right thing across a vcpu reset or
> >a full-system reset?
> 
> Good question. I'm not 100% sure - but I don't know for sure whether
> it's guaranteed without user space irqchip even.
> 
> In essence, the code above merely synchronizes kvm state to qemu
> state and is fully unaffected from any reset sequence. This is good,
> as the line status is transient. So from a QEMU pov, we really only
> copy the state of the vcpu interrupt line into the QEMU interrupt
> line. Pulling that line down would be responsibility of the
> KVM_ARM_VCPU_INIT ioctl if it also clears the timer registers I
> guess.
> 
> However, I don't see any clearing of cntv_ctrl inside KVM or from
> QEMU. How do we ensure that the irq active bit is off on reset?


In kvm_timer_vcpu_reset we cset cntv_ctl = 0, and that function gets
called from the PSCI handler or whenever userspace calls the set target
ioctl thingy.

> 
> The other part that could get in the way of working system reset is
> the interrupt controller emulation itself which resets all internal
> irq line state. So on reset we'd always end up with the irq line
> down from a gic pov, but with the vtimer line pending or not pending
> depending on previous state. I doubt it's really going to hurt
> though.

I suppose it should resample the line, but if the GIC clears everything
and the arch timer line goes down, you're in the right starting state
again.  Right?

-Christoffer

Re: [Qemu-devel] [RFC PATCH v3 0/5] vITS support

2016-06-28 Thread Christoffer Dall

On Tue, Jun 28, 2016 at 8:41 AM, Auger Eric  wrote:
> Dear all,
>
> On 24/11/2015 11:13, Pavel Fedin wrote:
>> This series introduces support for in-kernel GICv3 ITS emulation.
>> It is based on kernel API which is not released yet, therefore i post
>> it as an RFC.
>>
>> Kernel patch sets which implement this functionality are:
>> - [PATCH v3 00/16] KVM: arm64: GICv3 ITS emulation
>>   http://www.spinics.net/lists/kvm/msg121878.html
>> - [PATCH v3 0/7] KVM: arm/arm64: gsi routing support
>>   http://www.spinics.net/lists/kvm/msg119567.html
>
> For testing KVM/ARM gsi routing and KVM ARM PCIe/MSI passthrough with
> GICv3 ITS I intend to work on the respin of this series. I now have
> access to a host featuring GICv3 ITS. Does anyone currently work on this
> topic or intend to do so in short term or can I proceed?
>
Linaro is not working on this feature yet, and we would be happy to
see you pick up this work.

Thanks,
-Christoffer

Re: [Qemu-devel] [kvm-unit-tests PATCH 00/10] arm/arm64: add gic framework

2016-05-24 Thread Christoffer Dall

On Tue, May 24, 2016 at 02:23:43PM +0200, Andrew Jones wrote:
> On Tue, May 24, 2016 at 01:58:19PM +0200, Christoffer Dall wrote:
> > On Mon, May 23, 2016 at 05:24:23PM +0200, Andrew Jones wrote:
> > > On Wed, May 18, 2016 at 11:07:14AM +0200, Christoffer Dall wrote:
> > > > Hi Drew,
> > > > 
> > > > Thanks for doing this.  I'm happy to see some tests for the GIC.
> > > > 
> > > > I've been pondering with how to write unit tests for all the MMIO
> > > > implementations.  If you have some thoughts on how that could be easily
> > > > fitted into this framework, that would probably be a good place to do it
> > > > ;)
> > > 
> > > Hi Christoffer,
> > > 
> > > Sorry for my slow response, I've been on vacation. For MMIO
> > > implementations, are you referring to the emulation done for
> > > gicv2 accesses and for gicv3 legacy accesses? And, if so, is
> > > your question how we might be able to use the same test
> > > framework for both? And, if that's so, then I think this series
> > > gets us pretty close already. If I'm completely off-base, then
> > > please give me a quick high-level description of what you'd like
> > > to be able to do.
> > > 
> > What I meant was testing all the MMIO accesses to the various
> > distributor MMIO regions.
> > 
> > For example, writing full words to all registers (some value) reading
> > back the value, correcting for RAZ/WI semantics, and testing that byte
> > accesses to those registers where that's allowed also works.
> 
> OK, understood. We can build a table that describes each distributor
> offset's allowed access types and expected read-back results for the
> "default enablement" of the gic. Then, we'd run through that table
> doing a refresh of the gic enabling before each offset test. This
> series provides everything needed for that, except the offset table.
> It should be pretty easy to add.
> 
> Now, configuring the gic differently will result in some offsets
> producing different values, so we'll eventually want to extend the
> table to check the same offsets using different gic enable functions
> as well, but that would be pretty easy to do too.
> 
> > 
> > If adding that on top of this series sounds like a good idea, someone
> > should add it to the bottom of their (presumably already long) todo
> > list, myself included.
> 
> They do sound like good tests to have. I've added it to the middle
> of my long TODO. If somebody beats me to it, I won't complain :-)
> 
Awesome, thanks!

-Christoffer

Re: [Qemu-devel] [kvm-unit-tests PATCH 00/10] arm/arm64: add gic framework

2016-05-24 Thread Christoffer Dall

On Mon, May 23, 2016 at 05:24:23PM +0200, Andrew Jones wrote:
> On Wed, May 18, 2016 at 11:07:14AM +0200, Christoffer Dall wrote:
> > Hi Drew,
> > 
> > Thanks for doing this.  I'm happy to see some tests for the GIC.
> > 
> > I've been pondering with how to write unit tests for all the MMIO
> > implementations.  If you have some thoughts on how that could be easily
> > fitted into this framework, that would probably be a good place to do it
> > ;)
> 
> Hi Christoffer,
> 
> Sorry for my slow response, I've been on vacation. For MMIO
> implementations, are you referring to the emulation done for
> gicv2 accesses and for gicv3 legacy accesses? And, if so, is
> your question how we might be able to use the same test
> framework for both? And, if that's so, then I think this series
> gets us pretty close already. If I'm completely off-base, then
> please give me a quick high-level description of what you'd like
> to be able to do.
> 
What I meant was testing all the MMIO accesses to the various
distributor MMIO regions.

For example, writing full words to all registers (some value) reading
back the value, correcting for RAZ/WI semantics, and testing that byte
accesses to those registers where that's allowed also works.

If adding that on top of this series sounds like a good idea, someone
should add it to the bottom of their (presumably already long) todo
list, myself included.

Thanks,
-Christoffer

Re: [Qemu-devel] [kvm-unit-tests PATCH 00/10] arm/arm64: add gic framework

2016-05-18 Thread Christoffer Dall

Hi Drew,

Thanks for doing this.  I'm happy to see some tests for the GIC.

I've been pondering with how to write unit tests for all the MMIO
implementations.  If you have some thoughts on how that could be easily
fitted into this framework, that would probably be a good place to do it
;)

-Christoffer

On Mon, May 16, 2016 at 09:57:14AM +0200, Andrew Jones wrote:
> Import defines, and steal enough helper functions, from Linux to
> enable programming of the gic (v2 and v3). Then use the framework
> to add an initial test (an ipi test; self, target-list, broadcast).
> 
> It's my hope that this framework will be a suitable base on which
> more tests may be easily added, particularly because we have
> vgic-new and tcg gicv3 emulation getting close to merge.
> 
> To run it, along with other tests, just do
> 
>  ./configure [ --arch=[arm|arm64] --cross-prefix=$PREFIX ]
>  make
>  export QEMU=$PATH_TO_QEMU
>  ./run_tests.sh
> 
> To run it separately do, e.g.
> 
> $QEMU -machine virt,accel=tcg -cpu cortex-a57 \
>  -device virtio-serial-device \
>  -device virtconsole,chardev=ctd -chardev testdev,id=ctd \
>  -display none -serial stdio \
>  -kernel arm/gic.flat \
>  -smp 123 -machine gic-version=3 -append ipi
> 
>   ^^ note, we can go nuts with nr-cpus on TCG :-)
> 
> Or, a KVM example using a different "sender" cpu and irq (other than zero)
> 
> $QEMU -machine virt,accel=kvm -cpu host \
>  -device virtio-serial-device \
>  -device virtconsole,chardev=ctd -chardev testdev,id=ctd \
>  -display none -serial stdio \
>  -kernel arm/gic.flat \
>  -smp 48 -machine gic-version=3 -append 'ipi sender=42 irq=1'
> 
> 
> Patches:
> 01-05: fixes and functionality needed by the later gic patches
> 06-07: code theft from Linux (defines, helper functions)
> 08-10: arm/gic.flat (the base of the gic unit test), currently just IPI
> 
> Available here: https://github.com/rhdrjones/kvm-unit-tests/commits/arm/gic
> 
> 
> Andrew Jones (10):
>   lib: xstr: allow multiple args
>   arm64: fix get_"sysreg32" and make MPIDR 64bit
>   arm/arm64: smp: support more than 8 cpus
>   arm/arm64: add some delay routines
>   arm/arm64: irq enable/disable
>   arm/arm64: add initial gicv2 support
>   arm64: add initial gicv3 support
>   arm/arm64: gicv2: add an IPI test
>   arm/arm64: gicv3: add an IPI test
>   arm/arm64: gic: don't just use zero
> 
>  arm/Makefile.common|   7 +-
>  arm/gic.c  | 338 
> +
>  arm/run|  19 ++-
>  arm/selftest.c |   5 +-
>  arm/unittests.cfg  |  13 ++
>  lib/arm/asm/arch_gicv3.h   | 184 
>  lib/arm/asm/gic-v2.h   |  74 ++
>  lib/arm/asm/gic-v3.h   | 320 ++
>  lib/arm/asm/gic.h  |  21 +++
>  lib/arm/asm/processor.h|  38 -
>  lib/arm/asm/setup.h|   4 +-
>  lib/arm/gic.c  | 127 +
>  lib/arm/processor.c|  15 ++
>  lib/arm/setup.c|  12 +-
>  lib/arm64/asm/arch_gicv3.h | 169 +++
>  lib/arm64/asm/gic-v2.h |   1 +
>  lib/arm64/asm/gic-v3.h |   1 +
>  lib/arm64/asm/gic.h|   1 +
>  lib/arm64/asm/processor.h  |  53 +--
>  lib/arm64/asm/sysreg.h |  44 ++
>  lib/arm64/processor.c  |  15 ++
>  lib/libcflat.h |   4 +-
>  22 files changed, 1439 insertions(+), 26 deletions(-)
>  create mode 100644 arm/gic.c
>  create mode 100644 lib/arm/asm/arch_gicv3.h
>  create mode 100644 lib/arm/asm/gic-v2.h
>  create mode 100644 lib/arm/asm/gic-v3.h
>  create mode 100644 lib/arm/asm/gic.h
>  create mode 100644 lib/arm/gic.c
>  create mode 100644 lib/arm64/asm/arch_gicv3.h
>  create mode 100644 lib/arm64/asm/gic-v2.h
>  create mode 100644 lib/arm64/asm/gic-v3.h
>  create mode 100644 lib/arm64/asm/gic.h
>  create mode 100644 lib/arm64/asm/sysreg.h
> 
> -- 
> 2.4.11
>

[Qemu-devel] [PATCH v2] util: align memory allocations to 2M on AArch64

2016-04-22 Thread Christoffer Dall

For KVM to use Transparent Huge Pages (THP) we have to ensure that the
alignment of the userspace address of the KVM memory slot and the IPA
that the guest sees for a memory region have the same offset from the 2M
huge page size boundary.

One way to achieve this is to always align the IPA region at a 2M
boundary and ensure that the mmap alignment is also at 2M.

Unfortunately, we were only doing this for __arm__, not for __aarch64__,
so add this simply condition.

This fixes a performance regression using KVM/ARM on AArch64 platforms
that showed a performance penalty of more than 50%, introduced by the
following commit:

9fac18f (oslib: allocate PROT_NONE pages on top of RAM, 2015-09-10)

We were only lucky before the above commit, because we were allocating
large regions and naturally getting a 2M alignment on those allocations
then.

Reported-by: Shih-Wei Li <shih...@cs.columbia.edu>
Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org>
---
The first version of this patch was accidentally made against the v2.5.0
release instead of master, so this is a rebased version.

 util/oslib-posix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 20ca141..a0c5b91 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -26,7 +26,7 @@
  * THE SOFTWARE.
  */
 
-#if defined(__linux__) && (defined(__x86_64__) || defined(__arm__))
+#if defined(__linux__) && (defined(__x86_64__) || defined(__arm__) || 
defined(__aarch64__))
/* Use 2 MiB alignment so transparent hugepages can be used by KVM.
   Valgrind does not support alignments larger than 1 MiB,
   therefore we need special code which handles running on Valgrind. */
-- 
2.1.2.330.g565301e.dirty

[Qemu-devel] [PATCH] util: align memory allocations to 2M on AArch64

2016-04-22 Thread Christoffer Dall

For KVM to use Transparent Huge Pages (THP) we have to ensure that the
alignment of the userspace address of the KVM memory slot and the IPA
that the guest sees for a memory region have the same offset from the 2M
huge page size boundary.

One way to achieve this is to always align the IPA region at a 2M
boundary and ensure that the mmap alignment is also at 2M.

Unfortunately, we were only doing this for __arm__, not for __aarch64__,
so add this simply condition.

This fixes a performance regression using KVM/ARM on AArch64 platforms
that showed a performance penalty of more than 50%, introduced by the
following commit:

9fac18f (oslib: allocate PROT_NONE pages on top of RAM, 2015-09-10)

We were only lucky before the above commit, because we were allocating
large regions and naturally getting a 2M alignment on those allocations
then.

Reported-by: Shih-Wei Li <shih...@cs.columbia.edu>
Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org>
---
 util/oslib-posix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index d25f671..03b055e 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -35,7 +35,7 @@
 extern int daemon(int, int);
 #endif
 
-#if defined(__linux__) && (defined(__x86_64__) || defined(__arm__))
+#if defined(__linux__) && (defined(__x86_64__) || defined(__arm__) || 
defined(__aarch64__))
/* Use 2 MiB alignment so transparent hugepages can be used by KVM.
   Valgrind does not support alignments larger than 1 MiB,
   therefore we need special code which handles running on Valgrind. */
-- 
2.1.2.330.g565301e.dirty

Re: [Qemu-devel] Performance regression using KVM/ARM

2016-04-22 Thread Christoffer Dall

On Fri, Apr 22, 2016 at 11:17:47AM +0100, Peter Maydell wrote:
> On 22 April 2016 at 11:15, Christoffer Dall <christoffer.d...@linaro.org> 
> wrote:
> > Peter just pointed me to a change I remember doing for ARM, so perhaps
> > this fix is the right one?
> >
> >
> > diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> > index d25f671..a36e734 100644
> > --- a/util/oslib-posix.c
> > +++ b/util/oslib-posix.c
> > @@ -35,7 +35,7 @@
> >  extern int daemon(int, int);
> >  #endif
> >
> > -#if defined(__linux__) && (defined(__x86_64__) || defined(__arm__))
> > +#if defined(__linux__) && (defined(__x86_64__) || defined(__arm__)) || 
> > defined(__aarch64__)
> > /* Use 2 MiB alignment so transparent hugepages can be used by KVM.
> >Valgrind does not support alignments larger than 1 MiB,
> >therefore we need special code which handles running on Valgrind. */
> 
> I hadn't realised AArch64 didn't define __arm__. Your extra clause
> wants to be inside the parens for the ||, not outside.
> 
> So was the problem just that we weren't passing 2MB as the align
> parameter to qemu_ram_mmap(), and if we do pass 2MB then it does the
> right thing ?
> 
Yes, that was essentially the problem.  I'll send a proper patch.

However, Marc pointed out, that we should probably think about improving
on this in the future, because if you're running on a 64K page system
and want huge pages, you have to align at a higher boundary, but I'm on
the other hand not sure such a boundary is always practical on a 64K
system.  Which would suggest that we either need to:

1) Probe the kernel for the page size and always align to something that
allows huge pages, or

2) Let the user specify an option that says it wants to be able to use
THP and only then align to the huge page boundary.

Not sure...

-Christoffer

Re: [Qemu-devel] Performance regression using KVM/ARM

2016-04-22 Thread Christoffer Dall

On Fri, Apr 22, 2016 at 12:06:52PM +0200, Alexander Graf wrote:
> On 04/22/2016 12:01 PM, Christoffer Dall wrote:
> >On Thu, Apr 21, 2016 at 09:50:05PM +0200, Alexander Graf wrote:
> >>
> >>On 21.04.16 18:23, Christoffer Dall wrote:
> >>>Hi,
> >>>
> >>>Commit 9fac18f (oslib: allocate PROT_NONE pages on top of RAM,
> >>>2015-09-10) had the unfortunate side effect that memory slots registered
> >>>with KVM no longer contain a userspace address that is aligned to a 2M
> >>>boundary, causing the use of THP to fail in the kernel.
> >>>
> >>>I fail to see where in the QEMU code we should be asking for a 2M
> >>>alignment of our memory region.  Can someone help pointing me to the
> >>>right place to fix this or suggest a patch?
> >>>
> >>>This causes a performance regssion of hackbench on KVM/ARM of about 62%
> >>>compared to the workload running with THP.
> >>>
> >>>We have verified that this is indeed the cause of the failure by adding
> >>>various prints to QEMU and the kernel, but unfortunatley my QEMU
> >>>knowledge is not sufficient for me to fix it myself.
> >>>
> >>>Any help would be much appreciated!
> >>The code changed quite heavily since I last looked at it, but could you
> >>please try whether the (untested) patch below makes a difference?
> >>
> >>
> >Unfortunately this doesn't make any difference.  It feels to me like
> >we're missing specifying a 2M alignemnt in QEMU somewhere, but I can't
> >properly understand the links between the actual allocation, registering
> >mem slots with the KVM part of QEMU, and actually setting up KVM user
> >memory regions.
> >
> >What has to happen is that the resulting struct
> >kvm_userspace_memory_region() has the same alignment offset from 2M (the
> >huge page size) of the ->guest_phys_addr and ->userspace-addr fields.
> 
> Well, I would expect that the guest address space is always very big
> aligned - and definitely at least 2MB - so we're safe there.
> 
> That means we only need to align the qemu virtual address. There
> used to be a memalign() call for that, but it got replaced with
> direct mmap() and then a lot of code changed on top. Looking at the
> logs, I'm sure Paolo knows the answer though :)
> 
Peter just pointed me to a change I remember doing for ARM, so perhaps
this fix is the right one?


diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index d25f671..a36e734 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -35,7 +35,7 @@
 extern int daemon(int, int);
 #endif
 
-#if defined(__linux__) && (defined(__x86_64__) || defined(__arm__))
+#if defined(__linux__) && (defined(__x86_64__) || defined(__arm__)) || 
defined(__aarch64__)
/* Use 2 MiB alignment so transparent hugepages can be used by KVM.
   Valgrind does not support alignments larger than 1 MiB,
   therefore we need special code which handles running on Valgrind. */


Thanks,
-Christoffer

Re: [Qemu-devel] Performance regression using KVM/ARM

2016-04-22 Thread Christoffer Dall

Hi Laszlo,

On Thu, Apr 21, 2016 at 11:58:07PM +0200, Laszlo Ersek wrote:
> On 04/21/16 18:23, Christoffer Dall wrote:
> > Hi,
> > 
> > Commit 9fac18f (oslib: allocate PROT_NONE pages on top of RAM,
> > 2015-09-10) had the unfortunate side effect that memory slots registered
> > with KVM no longer contain a userspace address that is aligned to a 2M
> > boundary, causing the use of THP to fail in the kernel.
> > 
> > I fail to see where in the QEMU code we should be asking for a 2M
> > alignment of our memory region.  Can someone help pointing me to the
> > right place to fix this or suggest a patch?
> > 
> > This causes a performance regssion of hackbench on KVM/ARM of about 62%
> > compared to the workload running with THP.
> > 
> > We have verified that this is indeed the cause of the failure by adding
> > various prints to QEMU and the kernel, but unfortunatley my QEMU
> > knowledge is not sufficient for me to fix it myself.
> > 
> > Any help would be much appreciated!
> 
> Can you please test the attached series?
> 
> (Note that I'm only interested in solving this problem as a productive
> distraction, so if the patches don't work, or require a lot of massaging
> for merging, I'll just drop them (or, preferably, give them to someone
> else).)
> 

I like your procrastination methods!

Unfortunately this fix wasn't the right one either.

-Christoffer

Re: [Qemu-devel] Performance regression using KVM/ARM

2016-04-22 Thread Christoffer Dall

On Thu, Apr 21, 2016 at 09:50:05PM +0200, Alexander Graf wrote:
> 
> 
> On 21.04.16 18:23, Christoffer Dall wrote:
> > Hi,
> > 
> > Commit 9fac18f (oslib: allocate PROT_NONE pages on top of RAM,
> > 2015-09-10) had the unfortunate side effect that memory slots registered
> > with KVM no longer contain a userspace address that is aligned to a 2M
> > boundary, causing the use of THP to fail in the kernel.
> > 
> > I fail to see where in the QEMU code we should be asking for a 2M
> > alignment of our memory region.  Can someone help pointing me to the
> > right place to fix this or suggest a patch?
> > 
> > This causes a performance regssion of hackbench on KVM/ARM of about 62%
> > compared to the workload running with THP.
> > 
> > We have verified that this is indeed the cause of the failure by adding
> > various prints to QEMU and the kernel, but unfortunatley my QEMU
> > knowledge is not sufficient for me to fix it myself.
> > 
> > Any help would be much appreciated!
> 
> The code changed quite heavily since I last looked at it, but could you
> please try whether the (untested) patch below makes a difference?
> 
> 
Unfortunately this doesn't make any difference.  It feels to me like
we're missing specifying a 2M alignemnt in QEMU somewhere, but I can't
properly understand the links between the actual allocation, registering
mem slots with the KVM part of QEMU, and actually setting up KVM user
memory regions.

What has to happen is that the resulting struct
kvm_userspace_memory_region() has the same alignment offset from 2M (the
huge page size) of the ->guest_phys_addr and ->userspace-addr fields.

Thanks,
-Christoffer

[Qemu-devel] Performance regression using KVM/ARM

2016-04-21 Thread Christoffer Dall

Hi,

Commit 9fac18f (oslib: allocate PROT_NONE pages on top of RAM,
2015-09-10) had the unfortunate side effect that memory slots registered
with KVM no longer contain a userspace address that is aligned to a 2M
boundary, causing the use of THP to fail in the kernel.

I fail to see where in the QEMU code we should be asking for a 2M
alignment of our memory region.  Can someone help pointing me to the
right place to fix this or suggest a patch?

This causes a performance regssion of hackbench on KVM/ARM of about 62%
compared to the workload running with THP.

We have verified that this is indeed the cause of the failure by adding
various prints to QEMU and the kernel, but unfortunatley my QEMU
knowledge is not sufficient for me to fix it myself.

Any help would be much appreciated!

Thanks,
-Christoffer

Re: [Qemu-devel] [PATCH v6 0/4] ARM: add query-gic-capabilities SMP command

2016-03-23 Thread Christoffer Dall

Hi Peter,

Stupid question: The subject says "SMP command".  Did you mean "QMP
command" ?

Thanks,
-Christoffer

On Wed, Mar 23, 2016 at 01:32:29PM +0800, Peter Xu wrote:
> v6 changes:
> - patch 1 (squashed into patch 2)
>   - explain more about the following in commit message: why we need
> this command, and what does the entries mean [Markus]
>   - explain what GIC is [Eric]
>   - squash this patch into patch 2 [Eric]
> - patch 2 (new patch 1)
>   - fix "does not implement" english error [Eric]
>   - remove useless headers in target-arm/monitor.c [Markus]
>   - remove this command from whitelist [Markus, Eric]
>   - return dict rather than array, with single key points to the
> original array results [Markus, Eric]
> - patch 5 (new patch 4)
>   - tiny change in implementation to suite the new interface.
> 
> v5 changes:
> - patch 2: moved to target-arm/monitor.c (from target-arm/machine.c)
>[Peter]
> - patch 3: splitted into three patches: [all from Peter's comments]
>   - patch 3 (new): leverage kvm_arm_create_scratch_host_vcpu(), tiny
> enhancement of old one to suite our need
>   - patch 4: introduce kvm_support_device() in kvm-all.c
>   - patch 5: do the implementation.
> 
> v4 changes:
> - all: rename query-gic-capability to query-gic-capabilities [Andrea]
> - patch 3: rename helper function to kvm_support_device, make it
>   inline and lighter. [Drew]
> 
> v3 changes:
> - patch 2: remove func declaration, add qmp header [Drew]
> - patch 3: being able to detect KVM GIC capabilities even without
>   kvm enabled [Andrea]: this is a little bit hacky, need some more
>   review on this.
> 
> v2 changes:
> - result layout change: use array and dict for the capability bits
>   rather than a single array of strings [Andrea/Markus]
> - spelling out what GIC is in doc [Eric]
> 
> This patch is to add ARM-specific command "query-gic-capability".
> 
> The new command can report which kind of GIC device the host/QEMU
> support. The returned result is in the form of array.
> 
> Sample command and output:
> 
> {"execute": "query-gic-capability"}
> {"return": [{"emulated": false, "version": 3, "kernel": false},
> {"emulated": true, "version": 2, "kernel": true}]}
> 
> Testing:
> 
> Smoke tests on both x86 (emulated) and another moonshot ARM server.
> 
> Peter Xu (4):
>   arm: qmp: add query-gic-capabilities interface
>   arm: enhance kvm_arm_create_scratch_host_vcpu
>   kvm: add kvm_support_device() helper function
>   arm: implement query-gic-capabilities
> 
>  include/sysemu/kvm.h |  9 +
>  kvm-all.c| 15 +
>  monitor.c|  8 +
>  qapi-schema.json | 50 
>  qmp-commands.hx  | 28 
>  target-arm/Makefile.objs |  2 +-
>  target-arm/kvm.c | 10 +-
>  target-arm/kvm_arm.h |  6 ++--
>  target-arm/monitor.c | 86 
> 
>  9 files changed, 210 insertions(+), 4 deletions(-)
>  create mode 100644 target-arm/monitor.c
> 
> -- 
> 2.4.3
> 
>

Re: [Qemu-devel] [PATCH] virt: Lift the maximum RAM limit from 30GB to 255GB

2016-02-26 Thread Christoffer Dall

On Thu, Feb 25, 2016 at 04:51:51PM +, Peter Maydell wrote:
> [Typoed the kvmarm list address; sorry... -- PMM]
> 
> On 25 February 2016 at 12:09, Peter Maydell <peter.mayd...@linaro.org> wrote:
> > The virt board restricts guests to only 30GB of RAM. This is a
> > hangover from the vexpress-a15 board, and there's inherent reason

did you mean "there's *no* inherent reason" ?

> > for it. 30GB is smaller than you might reasonably want to provision
> > a VM for on a beefy server machine. Raise the limit to 255GB.
> >
> > We choose 255GB because the available space we currently have
> > below the 1TB boundary is up to the 512GB mark, but we don't
> > want to paint ourselves into a corner by assigning it all to
> > RAM. So we make half of it available for RAM, with the 256GB..512GB
> > range available for future non-RAM expansion purposes.
> >
> > If we need to provide more RAM to VMs in the future then we need to:
> >  * allocate a second bank of RAM starting at 2TB and working up
> >  * fix the DT and ACPI table generation code in QEMU to correctly
> >report two split lumps of RAM to the guest
> >  * fix KVM in the host kernel to allow guests with >40 bit address spaces
> >
> > The last of these is obviously the trickiest, but it seems
> > reasonable to assume that anybody configuring a VM with a quarter
> > of a terabyte of RAM will be doing it on a host with more than a
> > terabyte of physical address space.
> >
> > Signed-off-by: Peter Maydell <peter.mayd...@linaro.org>
> > ---
> > CC'ing kvm-arm as a heads-up that my proposal here is to make
> > the kernel devs do the heavy lifting for supporting >255GB.
> > Discussion welcome on whether I have the tradeoffs here right.

I think so, this looks good to me.

> > ---
> >  hw/arm/virt.c | 21 +++--
> >  1 file changed, 19 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 44bbbea..7a56b46 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -95,6 +95,23 @@ typedef struct {
> >  #define VIRT_MACHINE_CLASS(klass) \
> >  OBJECT_CLASS_CHECK(VirtMachineClass, klass, TYPE_VIRT_MACHINE)
> >
> > +/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this means
> > + * RAM can go up to the 256GB mark, leaving 256GB of the physical
> > + * address space unallocated and free for future use between 256G and 512G.
> > + * If we need to provide more RAM to VMs in the future then we need to:
> > + *  * allocate a second bank of RAM starting at 2TB and working up
> > + *  * fix the DT and ACPI table generation code in QEMU to correctly
> > + *report two split lumps of RAM to the guest
> > + *  * fix KVM in the host kernel to allow guests with >40 bit address 
> > spaces
> > + * (We don't want to fill all the way up to 512GB with RAM because
> > + * we might want it for non-RAM purposes later. Conversely it seems
> > + * reasonable to assume that anybody configuring a VM with a quarter
> > + * of a terabyte of RAM will be doing it on a host with more than a
> > + * terabyte of physical address space.)
> > + */
> > +#define RAMLIMIT_GB 255
> > +#define RAMLIMIT_BYTES (RAMLIMIT_GB * 1024ULL * 1024 * 1024)
> > +
> >  /* Addresses and sizes of our components.
> >   * 0..128MB is space for a flash device so we can run bootrom code such as 
> > UEFI.
> >   * 128MB..256MB is used for miscellaneous device I/O.
> > @@ -130,7 +147,7 @@ static const MemMapEntry a15memmap[] = {
> >  [VIRT_PCIE_MMIO] =  { 0x1000, 0x2eff },
> >  [VIRT_PCIE_PIO] =   { 0x3eff, 0x0001 },
> >  [VIRT_PCIE_ECAM] =  { 0x3f00, 0x0100 },
> > -[VIRT_MEM] ={ 0x4000, 30ULL * 1024 * 1024 * 1024 },
> > +[VIRT_MEM] ={ 0x4000, RAMLIMIT_BYTES },
> >  /* Second PCIe window, 512GB wide at the 512GB boundary */
> >  [VIRT_PCIE_MMIO_HIGH] =   { 0x800000ULL, 0x80ULL },
> >  };
> > @@ -1066,7 +1083,7 @@ static void machvirt_init(MachineState *machine)
> >  vbi->smp_cpus = smp_cpus;
> >
> >  if (machine->ram_size > vbi->memmap[VIRT_MEM].size) {
> > -error_report("mach-virt: cannot model more than 30GB RAM");
> > +error_report("mach-virt: cannot model more than %dGB RAM", 
> > RAMLIMIT_GB);
> >  exit(1);
> >  }
> >
> > --
> > 1.9.1

Reviewed-by: Christoffer Dall <christoffer.d...@linaro.org>

Re: [Qemu-devel] [libvirt] ARM KVM GICv3 Support

2016-02-02 Thread Christoffer Dall

On Tue, Feb 2, 2016 at 4:52 PM, Eric Blake <ebl...@redhat.com> wrote:
> On 02/02/2016 07:05 AM, Christoffer Dall wrote:
>
>>>
>>> I'm not familiar enough with libvirt, nor the use of QMP, to really argue
>>> one way or another, but I find it a bit strange that we'd prefer libvirt
>>> to query two entities over one. And, why should the libvirt installed on
>>> a particular host prefer gicv3 as the default, just because KVM supports
>>> it, even when QEMU does not?
>>
>> I think the assumption here is that if you install a recent libvirt you
>> also install a recent QEMU.  You always have the risk of things not
>> working if you have too old a QEMU, right?
>
> Libvirt exists for providing back-compat glue.  The following
> combinations are supported:
>
> old libvirt, old qemu
> new libvirt, old qemu
> new libvirt, new qemu
>
> and it is only this combination that might require a libvirt upgrade to
> work correctly:
>
> old libvirt, new qemu
>

ok, so that would mean we need to implement a QMP command to tell us
which gic versions are supported for a given machine.  Current
possible responses are "2", "3" and "2,3"

and we also need to add code to libvirt to try that QMP command, and
if it doesn't exist, fall back to not specifying gic-version, using
the old-qemu compatible default of providing a gicv2 to guests, and if
the QMP command exists, use the newest gic-version.

users can then always override this behavior by directly specifying a
gic version "host", "2", or "3" in their xml file.

any objections?

-Christoffer

Re: [Qemu-devel] [libvirt] ARM KVM GICv3 Support

2016-02-02 Thread Christoffer Dall

On Fri, Jan 22, 2016 at 02:44:32PM +, Daniel P. Berrange wrote:
> On Wed, Jan 06, 2016 at 01:30:16PM +, Peter Maydell wrote:
> > On 6 January 2016 at 12:49, Andrea Bolognani  wrote:
> > > That's correct, having a QMP command that lists the values gic-version
> > > can have on the current host would be just great.
> > >
> > > If we had that, we could validate the GIC version chosen for a guest,
> > > and expose it in the capabilities XML so that higher-level tools can
> > > provide a list of choices to the user.
> > >
> > > Please note that this QMP command would have to work regardless of the
> > > machine type selected on QEMU's command line, because libvirt always
> > > runs a QEMU binary with '-M none' when probing its capabilities.
> > 
> > On the other hand, if you don't tell us the machine type you care
> > about then we can't tell you:
> >  (a) "this machine type doesn't support setting this property at all"
> >  (which applies to machines like vexpress-a15 which you can use with
> >  KVM on 32-bit hosts, and of course also to all the non-KVM models)
> 
> We have just recently merged support for registering properties against
> classes instead of object instances. There is also a proposed command
> to allow querying the list of properties against a class
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04348.html
> 
> So if we now update the machine types to register their properties against
> the class instead of object, then we can query what properties are present
> against each machine type, while still using '-M none'.
> 
> >  (b) "this machine type only supports GIC versions X and Y even if the
> >   host supports more" (this is currently only hypothetical, though,
> >   since we only have the property on 'virt'. it would only happen
> >   if in the future we needed something other than '2' or '3' or
> >   'host' I think.)
> 
> Our introspection support in QOM only allows us to say that a property
> is a particular type (int / enum / str / whatever). We don't have any
> way to expose info about what subset of possible values for a type are
> permitted. So I don't see any near term way to inform apps that the
> gic property accepts values x, y and but not z.
> 
> IMHO, we shouldn't try to overthink this. Libvirt can query the host
> to find out what GIC versions are supported and default to using the
> most recent version, on the basis that people are likely to have a
> matching QEMU. We can just rely on QEMU to report error if we pass
> it a version it doesn't support and not try to pre-emptively check
> before launch. The key is just getting the default right IMHO.
> 
This sounds fine to me.

However, not being familiar with the internals of libvirt I really can't
just what the right implementation approach here is.

As I understand we need to either:

  1) Add a QMP command that lets you ask for -M virt, if GIC version X
 is supported

  2) Just implement something in libvirt that checks what the kernel
 supports directly via the well-defined KVM interface and chooses
 the highest supported version per default.

To me it sounds like we should just go ahead with (2) and document
somehwere that if you get an error, you need to either manually change
the gic version setting in your VM definition file or upgrade QEMU.

Can someone with libvirt authority please advice whether (1) or (2) is
what we need to do?

Thanks,
-Christoffer

Re: [Qemu-devel] [libvirt] ARM KVM GICv3 Support

2016-02-02 Thread Christoffer Dall

On Tue, Feb 02, 2016 at 01:59:26PM +0100, Andrew Jones wrote:
> On Tue, Feb 02, 2016 at 12:10:10PM +, Daniel P. Berrange wrote:
> > On Tue, Feb 02, 2016 at 12:49:33PM +0100, Christoffer Dall wrote:
> > > On Fri, Jan 22, 2016 at 02:44:32PM +, Daniel P. Berrange wrote:
> > > > On Wed, Jan 06, 2016 at 01:30:16PM +, Peter Maydell wrote:
> > > > > On 6 January 2016 at 12:49, Andrea Bolognani <abolo...@redhat.com> 
> > > > > wrote:
> > > > > > That's correct, having a QMP command that lists the values 
> > > > > > gic-version
> > > > > > can have on the current host would be just great.
> > > > > >
> > > > > > If we had that, we could validate the GIC version chosen for a 
> > > > > > guest,
> > > > > > and expose it in the capabilities XML so that higher-level tools can
> > > > > > provide a list of choices to the user.
> > > > > >
> > > > > > Please note that this QMP command would have to work regardless of 
> > > > > > the
> > > > > > machine type selected on QEMU's command line, because libvirt always
> > > > > > runs a QEMU binary with '-M none' when probing its capabilities.
> > > > > 
> > > > > On the other hand, if you don't tell us the machine type you care
> > > > > about then we can't tell you:
> > > > >  (a) "this machine type doesn't support setting this property at all"
> > > > >  (which applies to machines like vexpress-a15 which you can use 
> > > > > with
> > > > >  KVM on 32-bit hosts, and of course also to all the non-KVM 
> > > > > models)
> > > > 
> > > > We have just recently merged support for registering properties against
> > > > classes instead of object instances. There is also a proposed command
> > > > to allow querying the list of properties against a class
> > > > 
> > > >   https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04348.html
> > > > 
> > > > So if we now update the machine types to register their properties 
> > > > against
> > > > the class instead of object, then we can query what properties are 
> > > > present
> > > > against each machine type, while still using '-M none'.
> > > > 
> > > > >  (b) "this machine type only supports GIC versions X and Y even if the
> > > > >   host supports more" (this is currently only hypothetical, 
> > > > > though,
> > > > >   since we only have the property on 'virt'. it would only happen
> > > > >   if in the future we needed something other than '2' or '3' or
> > > > >   'host' I think.)
> > > > 
> > > > Our introspection support in QOM only allows us to say that a property
> > > > is a particular type (int / enum / str / whatever). We don't have any
> > > > way to expose info about what subset of possible values for a type are
> > > > permitted. So I don't see any near term way to inform apps that the
> > > > gic property accepts values x, y and but not z.
> 
> This actually doesn't matter for the v2 vs. v3 case. The gic-version property
> doesn't exist at all for v2-only QEMU. Although maybe the gic-version property
> should be reworked. Instead of gic-version=, we could 
> create
> one boolean property per version supported, i.e. gicv2, gicv3, gicv4...
> 
> > > > 
> > > > IMHO, we shouldn't try to overthink this. Libvirt can query the host
> > > > to find out what GIC versions are supported and default to using the
> > > > most recent version, on the basis that people are likely to have a
> > > > matching QEMU. We can just rely on QEMU to report error if we pass
> > > > it a version it doesn't support and not try to pre-emptively check
> > > > before launch. The key is just getting the default right IMHO.
> > > > 
> > > This sounds fine to me.
> > > 
> > > However, not being familiar with the internals of libvirt I really can't
> > > just what the right implementation approach here is.
> > > 
> > > As I understand we need to either:
> > > 
> > >   1) Add a QMP command that lets you ask for -M virt, if GIC version X
> > >  is supported
> > > 
> > >   2) Just implement something in libvirt that checks what the kernel
> >

[Qemu-devel] [PATCH] hw/arm/virt: Add always-on property to the virt board timer

2016-01-19 Thread Christoffer Dall

The virt board has an arch timer, which is always on.  Emit the
"always-on" property to indicate to Linux that it can switch off the
periodic timer and reduces the amount of interrupts injected into a
guest.

Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org>
---
 hw/arm/virt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 05f9087..265fe9a 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -291,6 +291,7 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi, 
int gictype)
 qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible",
 "arm,armv7-timer");
 }
+qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0);
 qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts",
GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, irqflags,
GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, irqflags,
-- 
2.1.2.330.g565301e.dirty

Re: [Qemu-devel] [PATCH] hw/arm/virt: Add always-on property to the virt board timer

2016-01-19 Thread Christoffer Dall

On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote:
> On Tue, Jan 19, 2016 at 12:49:18PM +0100, Christoffer Dall wrote:
> > The virt board has an arch timer, which is always on.  Emit the
> > "always-on" property to indicate to Linux that it can switch off the
> > periodic timer and reduces the amount of interrupts injected into a
> > guest.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org>
> > ---
> >  hw/arm/virt.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 05f9087..265fe9a 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -291,6 +291,7 @@ static void fdt_add_timer_nodes(const VirtBoardInfo 
> > *vbi, int gictype)
> >  qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible",
> >  "arm,armv7-timer");
> >  }
> > +qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0);
> >  qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts",
> > GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, 
> > irqflags,
> > GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, 
> > irqflags,
> > -- 
> > 2.1.2.330.g565301e.dirty
> > 
> >
> 
> Hi Christoffer,
> 
> We should also patch the ACPI generation at the same time. I think
> something like
> 
>  - gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE;
>  + gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE | ACPI_GTDT_ALWAYS_ON;

I'm really not familiar enough with ACPI to be comfortable writing code
for this or testing this.

But if someone can pick this up and add the ACPI bits or can post a
follow-up patch, then I'm all for it :)

> 
> should do it.
> 
> Also, having the guest reduce the number of interrupts sounds good. Can
> you point me to something to read about how/why a guest may choose to do
> that, and what the trade-offs are?
> 
Not really, but you can ask Marc.

-Christoffer

Re: [Qemu-devel] [RESEND RFC 0/6] AMD XGBE KVM platform passthrough

2015-11-25 Thread Christoffer Dall

On Thu, Nov 19, 2015 at 11:44 PM, Alex Williamson
 wrote:
> On Thu, 2015-11-19 at 15:22 +, Eric Auger wrote:
>> I am resending this RFC from Oct 12, after kernel 4.4-rc1 and
>> QEMU 2.5-rc1, hoping things have calmed down a little bit.
>>
>> This RFC allows to set up AMD XGBE passthrough. This was tested on AMD
>> Seattle.
>>
>> The first upstreamed device supporting KVM platform passthrough was the
>> Calxeda Midway XGMAC. Compared to this latter, the XGBE XGMAC exposes a
>> much more complex device tree node. Generating the device tree node for
>> the guest is the challenging and controversary part of this series.
>>
>> - First There are 2 device tree node formats:
>> one where XGBE and PHY are described in separate nodes and another one
>> that combines both description in a single node (only supported by 4.2
>> onwards kernels). Only the combined description is supported for passthrough,
>> meaning the host must be >= 4.2 and must feature a device tree with a 
>> combined
>> description. The guest will also be exposed with a combined description,
>> meaning only >= 4.2 guest are supported. It is not planned to support
>> separate node representation since assignment of the PHY is less
>> straigtforward.
>>
>> - the XGMAC/PHY node depends on 2 clock nodes (DMA and PTP).
>> The code checks those clocks are fixed to make sure they cannot be
>> switched off at some point after the native driver gets unbound.
>>
>> - there are many property values to populate on guest side. Most of them
>> cannot be hardcoded. That series proposes a way to parse the host device
>> tree blob and retrieve host values to feed guest representation. Current
>> approach relies on dtc binary availability plus libfdt usage.
>> Other alternatives were discussed in:
>> http://www.spinics.net/lists/kvm-arm/msg16648.html.
>>
>> - Currently host booted with ACPI is not supported.
>
> I won't pretend to know all the politics in the ARM space, but doesn't
> this last bullet sort of imply that this is dead-on-arrival code?  Maybe
> not in the embedded space, but certainly in the server space, I thought
> ACPI was declared the winner.  Thanks,
>
Hi Alex,

We currently have no solution to accomplish platform device
passthrough on ACPI, because we don't know how to extract and
encapsulate the necessary information from the host about the
passthrough device and its dependency (clocks and phys for example).

There are people interested in this work, in particular in the
networking space, who couldn't care less about ACPI, and for those,
supporting platform device passthrough with DT remains highly
relevant.

Therefore, I think it's useful to consider these patches for review
and for their basic approach.

Thanks,
-Christoffer

Re: [Qemu-devel] PING: [RFC PATCH 0/4] GICv3 live migration support

2015-10-07 Thread Christoffer Dall

On Wed, Oct 07, 2015 at 09:02:47AM +0100, Peter Maydell wrote:
> On 7 October 2015 at 08:57, Pavel Fedin  wrote:
> >  Knock-knock!
> >
> >  PM: I remember we had a talk that we should settle down on migration data 
> > format. Isn't it right
> > time?
> 
> I think Christoffer has a patchset which specifies the
> userspace API for KVM for this, which is probably the
> right place to start. Christoffer, did you manage to send
> that out yet?
> 
Very soon now.

-Christoffer

Re: [Qemu-devel] add multiple times opening support to a virtserialport

2015-08-27 Thread Christoffer Dall

On Thu, Aug 27, 2015 at 10:23:38AM -0400, Christopher Covington wrote:
 On 07/24/2015 08:00 AM, Matt Ma wrote:
  Hi all,
  
  Linaro has developed the foundation for the new Android Emulator code
  base based on a fairly recent upstream QEMU code base, when we
  re-based the code, we updated the device model to be more virtio based
  (for example the drives are now virtio block devices). The aim of this
  is to minimise the delta between upstream and the Android specific
  changes to QEMU. One Android emulator specific feature is the
  AndroidPipe.
  
  AndroidPipe is a communication channel between the guest system and
  the emulator itself. Guest side device node can be opened by multi
  processes at the same time with different service name. It has a
  de-multiplexer on the QEMU side to figure out which service the guest
  actually wanted, so the first write after opening device node is the
  service name guest wanted, after QEMU backend receive this service
  name, create a corresponding communication channel, initialize related
  component, such as file descriptor which connect to the host socket
  serve. So each opening in guest will create a separated communication
  channel.
  
  We can create a separate device for each service type, however some
  services, such as the OpenGL emulation, need to have multiple open
  channels at a time. This is currently not possible using the
  virtserialport which can only be opened once.
  
  Current virtserialport can not  be opened by multiple processes at the
  same time. I know virtserialport has provided buffers beforehand to
  cache data from host to guest, so even there is no guest read, data
  can still be transported from host to guest kernel, when there is
  guest read request, just copy cached data to user space.
  
  We are not sure clearly whether virtio can support
  multi-open-per-device semantics or not, followings are just our
  initial ideas about adding multi-open-per-device feature to a port:
  
  * when there is a open request on a port, kernel will allocate a
  portclient with new id and __wait_queue_head to track this request
  * save this portclient in file-private_data
  * guest kernel pass this portclient info to QEMU and notify that the
  port has been opened
  * QEMU backend will create a clientinfo struct to track this
  communication channel, initialize related component
  * we may change the kernel side strategy of allocating receiving
  buffers in advance to a new strategy, that is when there is a read
  request:
  - allocate a port_buffer, put user space buffer address to
  port_buffer.buf, share memory to avoid memcpy
  - put both portclient id(or portclient addrss) and port_buffer.buf
  to virtqueue, that is the length of buffers chain is 2
  - kick to notify QEMU backend to consume read buffer
  - QEMU backend read portclient info firstly to find the correct
  clientinfo, then read host data directly into virtqueue buffer to
  avoid memcpy
  - guest kernel will wait(similarly in block mode, because the user
  space address has been put into virtqueue) until QEMU backend has
  consumed buffer(all data/part data/nothing have been sent to host
  side)
  - if nothing has been read from host and file descriptor is in
  block mode, read request will wait through __wait_queue_head until
  host side is readable
  
  * above read logic may change the current behavior of transferring
  data to guest kernel even without guest user read
  
  * when there is a write request:
  - allocate a port_buffer, put user space buffer address to
  port_buffer.buf, share memory to avoid memcpy
  - put both portclient id(or portclient addrss) and port_buffer.buf
  to virtqueue, the length of buffers chain is 2
  - kick to notify QEMU backend to consume write buffer
  - QEMU backend read portclient info firstly to find the correct
  clientinfo, then write the virtqueue buffer content to host side as
  current logic
  - guest kernel will wait(similarly in block mode, because the user
  space address has been put into virtqueue) until QEMU backend has
  consumed buffer(all data/part data/nothing have been receive from host
  side)
  - if nothing has been sent out and file descriptor is in block
  mode, write request will wait through __wait_queue_head until host
  side is writable
  
  We obviously don't want to regress existing virtio behaviour and
  performance and welcome the communities expertise to point out
  anything we may have missed out before we get to far down implementing
  our initial proof-of-concept.

Hi Chris,

 
 Would virtio-vsock be interesting for your purposes?
 
 http://events.linuxfoundation.org/sites/events/files/slides/stefanha-kvm-forum-2015.pdf
 
 (Video doesn't seem to be up yet, but should probably be available eventually
 at the following link)
 
 https://www.youtube.com/playlist?list=PLW3ep1uCIRfyLNSu708gWG7uvqlolk0ep
 
Thanks for looking at this lengthy mail.  Yes, we are

Re: [Qemu-devel] Created virtio-vsock wiki page

2015-08-26 Thread Christoffer Dall

On Tue, Aug 25, 2015 at 04:43:13PM +0100, Stefan Hajnoczi wrote:
 I have created a wiki page for virtio-vsock.
 
 It links to my git repos and the draft virtio specification:
 http://qemu-project.org/Features/VirtioVsock
 
 I'll expand and update it over the coming days and weeks.
 
 Please let me know if you'd like to see specific information on there
 (e.g. step-by-step build  QEMU invocation guide).
 
Do you have a test suite or a simple set of guest/host test programs?

-Christoffer

Re: [Qemu-devel] [PATCH v8 3/5] Introduce irqchip type specification for KVM

2015-08-12 Thread Christoffer Dall

On Wed, Aug 12, 2015 at 4:14 PM, Eric Auger eric.au...@linaro.org wrote:
 Hi,
 On 08/12/2015 03:23 PM, Christoffer Dall wrote:
 On Wed, Aug 12, 2015 at 2:59 PM, Peter Maydell peter.mayd...@linaro.org 
 wrote:
 On 12 August 2015 at 13:27, Pavel Fedin p.fe...@samsung.com wrote:
  Hello!

 I still think this is the wrong approach -- see my remarks
 in the previous round of patch review.

  You know... I thought a little bit...
  So far, test = true in KVM_CREATE_DEVICE means that we just want to know 
 whether this type is supported. No actual actions is done by the kernel. 
 Is it correct?
 yes
 If yes, we can just leave this test as it is, because if it says that
 GICv2 is supported by KVM_CREATE_DEVICE, then:
 1. We use new API. No KVM_IRQCHIP_CREATE.
 2. GICv3 may be supported.

  Therefore, if we do this check, and it succeeds, then we just
 proceed, and later actually try to create GICv3. If it fails for some
 reason, we will see error message anyway. So would it be OK just not
 to touch kvm_arch_irqchip_create() at all?

 No, because then if the kernel only supports GICv3 the
 code in kvm_arch_irqchip_create() (as it is currently written)
 will erroneously fall back to using the old API.

 Christoffer: the question was, why does kvm_arch_irqchip_create()
 not only check the KVM_CAP_DEVICE_CTRL but also try to see
 if it can KVM_CREATE_DEVICE the GICv2 in order to avoid
 falling back to the old pre-KVM_CREATE_DEVICE API ? Are there
 kernels which have the capability bit set but which can't
 actually use KVM_CREATE_DEVICE to create the irqchip?

 My thinking probably was that technically the KVM_CAP_DEVICE_CTRL is
 an orthogonal concept from how to create the vgic, and you could
 advertise this capability without also supporting the GICv2 device
 type.

 However, I don't believe such kernels exist
 the capability was advertised for arm/arm64 with GICv2
 7330672  KVM: arm-vgic: Support KVM_CREATE_DEVICE for VGIC (1 year, 8
 months ago) Christoffer Dall
 so effectively I don't think we have any arm kernel advertising the CAP
 without GICv3.

  and they cannot in the
 future as that would be because we would remove an actively supported
 API.


 My preference here would be for kvm_arch_irqchip_create()
 to just use 'is the KVM_CAP_DEVICE_CTRL capability set'
 for its can we use the new API test; that will then
 work whether we have a GICv2 or GICv3 in the host. (The
 actual GIC device creation later on might fail, of course,
 but that can be handled at that point. The only thing
 we need to do as early as kvm_arch_irqchip_create is
 determine whether we must use the old API.)
 another way was proposed in the past was consisting in calling first
 ret = kvm_create_device(s, KVM_DEV_TYPE_ARM_VGIC_V3, true);
 if this succeeds this means we have the new API (the only one used vy
 v3) and hence we have it as well for VGIC_V2, we can return ...
 if this fails we just can say VGICv3 KVM device hasn't registered, try
 KVM_DEV_TYPE_ARM_VGIC_V2
 if we have it return else fall back to older API

 I think it worked as well?

 Besides I think Peter's suggestion is simpler.

 For info we also use KVM VFIO device now.

Note that there's a difference between I called the create-device
ioctl, and the creation of the device failed vs. I called the ioctl
and I got an error because the ioctl is not supported, only in the
latter case should you fall back to the older API.

Not sure if the end result is the same with the suggested approach,
based on the right error values etc.

-Christoffer

Re: [Qemu-devel] [PATCH v8 3/5] Introduce irqchip type specification for KVM

2015-08-12 Thread Christoffer Dall

On Wed, Aug 12, 2015 at 1:44 PM, Pavel Fedin p.fe...@samsung.com wrote:
  Hello!

 I still think this is the wrong approach -- see my remarks
 in the previous round of patch review.

  Christoffer did not reply anything to your question back then. So - what to 
 do? Probe for all possible GICs? Remove the probe at all?

I may have missed something here, what was the question?

-Christoffer

Re: [Qemu-devel] [PATCH v8 3/5] Introduce irqchip type specification for KVM

2015-08-12 Thread Christoffer Dall

On Wed, Aug 12, 2015 at 2:59 PM, Peter Maydell peter.mayd...@linaro.org wrote:
 On 12 August 2015 at 13:27, Pavel Fedin p.fe...@samsung.com wrote:
  Hello!

 I still think this is the wrong approach -- see my remarks
 in the previous round of patch review.

  You know... I thought a little bit...
  So far, test = true in KVM_CREATE_DEVICE means that we just want to know 
 whether this type is supported. No actual actions is done by the kernel. Is 
 it correct? If yes, we can just leave this test as it is, because if it says 
 that GICv2 is supported by KVM_CREATE_DEVICE, then:
 1. We use new API. No KVM_IRQCHIP_CREATE.
 2. GICv3 may be supported.

  Therefore, if we do this check, and it succeeds, then we just
 proceed, and later actually try to create GICv3. If it fails for some
 reason, we will see error message anyway. So would it be OK just not
 to touch kvm_arch_irqchip_create() at all?

 No, because then if the kernel only supports GICv3 the
 code in kvm_arch_irqchip_create() (as it is currently written)
 will erroneously fall back to using the old API.

 Christoffer: the question was, why does kvm_arch_irqchip_create()
 not only check the KVM_CAP_DEVICE_CTRL but also try to see
 if it can KVM_CREATE_DEVICE the GICv2 in order to avoid
 falling back to the old pre-KVM_CREATE_DEVICE API ? Are there
 kernels which have the capability bit set but which can't
 actually use KVM_CREATE_DEVICE to create the irqchip?

My thinking probably was that technically the KVM_CAP_DEVICE_CTRL is
an orthogonal concept from how to create the vgic, and you could
advertise this capability without also supporting the GICv2 device
type.

However, I don't believe such kernels exist and they cannot in the
future as that would be because we would remove an actively supported
API.


 My preference here would be for kvm_arch_irqchip_create()
 to just use 'is the KVM_CAP_DEVICE_CTRL capability set'
 for its can we use the new API test; that will then
 work whether we have a GICv2 or GICv3 in the host. (The
 actual GIC device creation later on might fail, of course,
 but that can be handled at that point. The only thing
 we need to do as early as kvm_arch_irqchip_create is
 determine whether we must use the old API.)

I'm fine with this.

-Christoffer

Re: [Qemu-devel] [PATCH v2] target-arm: kvm: Differentiate registers based on write-back levels

2015-07-17 Thread Christoffer Dall

On Fri, Jul 17, 2015 at 03:29:56PM +0100, Peter Maydell wrote:
 On 16 July 2015 at 12:34, Christoffer Dall christoffer.d...@linaro.org 
 wrote:
  Some registers like the CNTVCT register should only be written to the
  kernel as part of machine initialization or on vmload operations, but
  never during runtime, as this can potentially make time go backwards or
  create inconsistent time observations between VCPUs.
 
  Introduce a list of registers that should not be written back at runtime
  and check this list on syncing the register state to the KVM state.
 
 Thanks. I think this should go into QEMU 2.4, given that it fixes
 a bug with time misbehaving in guests. Are you happy that it's
 received enough testing? (I have given 32-bit KVM a spin but have
 no convenient 64-bit box to test with, and besides, I didn't notice the
 bug in the first place :-))

I tested this on both Juno and Mustang, with a simple loop kernel
booting and doing hackbench test, but if we want to be on the extra
careful side, perhaps Alex can run it through his migration test setup?
I don't think that's necessary though.

 
  Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
  ---
  Changes since RFC:
   - Move cpreg_level to kvm_arm_cpreg_level and into kvm32.c and kvm64.c
   - Changed struct name and declare as static const
 
 I have a couple of minor comments on the comments below, and you
 forgot to update the stub version of write_list_to_kvmstate().
 I can just fix these up as I put it into target-arm.next, though.
 Fixed up version at:
 
  https://git.linaro.org/people/peter.maydell/qemu-arm.git target-arm.next
 
 If interested parties could test that by end-of-Monday that
 would be nice (since rc2 is scheduled for Tuesday).
 
   dtc  |  2 +-
   target-arm/kvm.c |  6 +-
   target-arm/kvm32.c   | 30 +-
   target-arm/kvm64.c   | 30 +-
   target-arm/kvm_arm.h | 12 +++-
   target-arm/machine.c |  2 +-
   6 files changed, 76 insertions(+), 6 deletions(-)
 
  diff --git a/dtc b/dtc
  index 65cc4d2..bc895d6 16
  --- a/dtc
  +++ b/dtc
  @@ -1 +1 @@
  -Subproject commit 65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf
  +Subproject commit bc895d6d09695d05ceb8b52486ffe861d6cfbdde
 
 Stray submodule change :-)
 

Damn, keeps happening to me.  ok, I'll stop using git commit -a for qemu
changes.

  diff --git a/target-arm/kvm32.c b/target-arm/kvm32.c
  index d7e7d68..6769815 100644
  --- a/target-arm/kvm32.c
  +++ b/target-arm/kvm32.c
  @@ -153,6 +153,34 @@ bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx)
   }
   }
 
  +typedef struct CPRegStateLevel {
  +uint64_t regidx;
  +int level;
  +} CPRegStateLevel;
  +
  +/* All coprocessor registers not listed in the following table are assumed 
  to
  + * be of the level KVM_PUT_RUNTIME_STATE, a register should be written less
 
 . If a register.
 
  + * often, you must add it to this table with a state of either
  + * KVM_PUT_RESET_STATE or KVM_PUT_FULL_STATE.
  + */
  +static const CPRegStateLevel non_runtime_cpregs[] = {
  +{ KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE },
  +};
 
  diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
  index ac34f51..d59f41c 100644
  --- a/target-arm/kvm64.c
  +++ b/target-arm/kvm64.c
  @@ -139,6 +139,34 @@ bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx)
   }
   }
 
  +typedef struct CPRegStateLevel {
  +uint64_t regidx;
  +int level;
  +} CPRegStateLevel;
  +
  +/* All system not listed in the following table are assumed to be of the 
  level
 
 system registers
 
  + * KVM_PUT_RUNTIME_STATE, a register should be written less often, you must
 
 . If a register
 
  + * add it to this table with a state of either KVM_PUT_RESET_STATE or
  + * KVM_PUT_FULL_STATE.
  + */
  +static const CPRegStateLevel non_runtime_cpregs[] = {
  +{ KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE },
  +};
 
  --- a/target-arm/kvm_arm.h
  +++ b/target-arm/kvm_arm.h
 
  @@ -83,7 +93,7 @@ bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx);
   * Note that we do not stop early on failure -- we will attempt
* writing all registers in the list.
   */
  -bool write_list_to_kvmstate(ARMCPU *cpu);
  +bool write_list_to_kvmstate(ARMCPU *cpu, int level);
 
 You forgot to update the stub function in target-arm/kvm-stub.c,
 so this breaks compilation on non-ARM hosts.
 
whoops, who cares about non-ARM hosts anyway.

Thanks for fixing these up, the fixed up version looks good.
-Christoffer

[Qemu-devel] [PATCH v2] target-arm: kvm: Differentiate registers based on write-back levels

2015-07-16 Thread Christoffer Dall

Some registers like the CNTVCT register should only be written to the
kernel as part of machine initialization or on vmload operations, but
never during runtime, as this can potentially make time go backwards or
create inconsistent time observations between VCPUs.

Introduce a list of registers that should not be written back at runtime
and check this list on syncing the register state to the KVM state.

Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
---
Changes since RFC:
 - Move cpreg_level to kvm_arm_cpreg_level and into kvm32.c and kvm64.c
 - Changed struct name and declare as static const

 dtc  |  2 +-
 target-arm/kvm.c |  6 +-
 target-arm/kvm32.c   | 30 +-
 target-arm/kvm64.c   | 30 +-
 target-arm/kvm_arm.h | 12 +++-
 target-arm/machine.c |  2 +-
 6 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/dtc b/dtc
index 65cc4d2..bc895d6 16
--- a/dtc
+++ b/dtc
@@ -1 +1 @@
-Subproject commit 65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf
+Subproject commit bc895d6d09695d05ceb8b52486ffe861d6cfbdde
diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 548bfd7..b278542 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -409,7 +409,7 @@ bool write_kvmstate_to_list(ARMCPU *cpu)
 return ok;
 }
 
-bool write_list_to_kvmstate(ARMCPU *cpu)
+bool write_list_to_kvmstate(ARMCPU *cpu, int level)
 {
 CPUState *cs = CPU(cpu);
 int i;
@@ -421,6 +421,10 @@ bool write_list_to_kvmstate(ARMCPU *cpu)
 uint32_t v32;
 int ret;
 
+if (kvm_arm_cpreg_level(regidx)  level) {
+continue;
+}
+
 r.id = regidx;
 switch (regidx  KVM_REG_SIZE_MASK) {
 case KVM_REG_SIZE_U32:
diff --git a/target-arm/kvm32.c b/target-arm/kvm32.c
index d7e7d68..6769815 100644
--- a/target-arm/kvm32.c
+++ b/target-arm/kvm32.c
@@ -153,6 +153,34 @@ bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx)
 }
 }
 
+typedef struct CPRegStateLevel {
+uint64_t regidx;
+int level;
+} CPRegStateLevel;
+
+/* All coprocessor registers not listed in the following table are assumed to
+ * be of the level KVM_PUT_RUNTIME_STATE, a register should be written less
+ * often, you must add it to this table with a state of either
+ * KVM_PUT_RESET_STATE or KVM_PUT_FULL_STATE.
+ */
+static const CPRegStateLevel non_runtime_cpregs[] = {
+{ KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE },
+};
+
+int kvm_arm_cpreg_level(uint64_t regidx)
+{
+int i;
+
+for (i = 0; i  ARRAY_SIZE(non_runtime_cpregs); i++) {
+const CPRegStateLevel *l = non_runtime_cpregs[i];
+if (l-regidx == regidx) {
+return l-level;
+}
+}
+
+return KVM_PUT_RUNTIME_STATE;
+}
+
 #define ARM_MPIDR_HWID_BITMASK 0xFF
 #define ARM_CPU_ID_MPIDR   0, 0, 0, 5
 
@@ -367,7 +395,7 @@ int kvm_arch_put_registers(CPUState *cs, int level)
  * managed to update the CPUARMState with, and only allowing those
  * to be written back up into the kernel).
  */
-if (!write_list_to_kvmstate(cpu)) {
+if (!write_list_to_kvmstate(cpu, level)) {
 return EINVAL;
 }
 
diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
index ac34f51..d59f41c 100644
--- a/target-arm/kvm64.c
+++ b/target-arm/kvm64.c
@@ -139,6 +139,34 @@ bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx)
 }
 }
 
+typedef struct CPRegStateLevel {
+uint64_t regidx;
+int level;
+} CPRegStateLevel;
+
+/* All system not listed in the following table are assumed to be of the level
+ * KVM_PUT_RUNTIME_STATE, a register should be written less often, you must
+ * add it to this table with a state of either KVM_PUT_RESET_STATE or
+ * KVM_PUT_FULL_STATE.
+ */
+static const CPRegStateLevel non_runtime_cpregs[] = {
+{ KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE },
+};
+
+int kvm_arm_cpreg_level(uint64_t regidx)
+{
+int i;
+
+for (i = 0; i  ARRAY_SIZE(non_runtime_cpregs); i++) {
+const CPRegStateLevel *l = non_runtime_cpregs[i];
+if (l-regidx == regidx) {
+return l-level;
+}
+}
+
+return KVM_PUT_RUNTIME_STATE;
+}
+
 #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
  KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
 
@@ -280,7 +308,7 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 return ret;
 }
 
-if (!write_list_to_kvmstate(cpu)) {
+if (!write_list_to_kvmstate(cpu, level)) {
 return EINVAL;
 }
 
diff --git a/target-arm/kvm_arm.h b/target-arm/kvm_arm.h
index 5abd591..7912d74 100644
--- a/target-arm/kvm_arm.h
+++ b/target-arm/kvm_arm.h
@@ -69,8 +69,18 @@ int kvm_arm_init_cpreg_list(ARMCPU *cpu);
 bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx);
 
 /**
+ * kvm_arm_cpreg_level
+ * regidx: KVM register index
+ *
+ * Return the level of this coprocessor/system register.  Return value is
+ * either KVM_PUT_RUNTIME_STATE, KVM_PUT_RESET_STATE, or KVM_PUT_FULL_STATE

Re: [Qemu-devel] [RFC PATCH] target-arm: kvm: Differentiate registers based on write-back levels

2015-07-11 Thread Christoffer Dall

On Fri, Jul 10, 2015 at 12:22:31PM +0100, Peter Maydell wrote:
 On 10 July 2015 at 12:00, Christoffer Dall christoffer.d...@linaro.org 
 wrote:
  Some registers like the CNTVCT register should only be written to the
  kernel as part of machine initialization or on vmload operations, but
  never during runtime, as this can potentially make time go backwards or
  create inconsistent time observations between VCPUs.
 
  Introduce a list of registers that should not be written back at runtime
  and check this list on syncing the register state to the KVM state.
 
 Thanks for picking this one up...
 
  Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
  ---
   target-arm/kvm.c | 34 +-
   target-arm/kvm32.c   |  2 +-
   target-arm/kvm64.c   |  2 +-
   target-arm/kvm_arm.h |  3 ++-
   target-arm/machine.c |  2 +-
   5 files changed, 38 insertions(+), 5 deletions(-)
 
  diff --git a/target-arm/kvm.c b/target-arm/kvm.c
  index 548bfd7..2e92699 100644
  --- a/target-arm/kvm.c
  +++ b/target-arm/kvm.c
  @@ -409,7 +409,35 @@ bool write_kvmstate_to_list(ARMCPU *cpu)
   return ok;
   }
 
  -bool write_list_to_kvmstate(ARMCPU *cpu)
  +typedef struct cpreg_state_level {
  +uint64_t kvm_idx;
  +int level;
  +} cpreg_state_level;
 
 (QEMU's coding style prefers CPRegStateLevel for struct types.)
 

ok

  +
  +/* All system registers not listed in the following table are assumed to be
  + * of the level KVM_PUT_RUNTIME_STATE, a register should be written less
  + * often, you must add it to this table with a state of either
  + * KVM_PUT_RESET_STATE or KVM_PUT_FULL_STATE.
  + */
  +cpreg_state_level non_runtime_cpregs[] = {
  +{ KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE },
 
 This should be KVM_PUT_RESET_STATE, right?
 
should it?  If you reset a real machine, you will not necessarily see a
counter value of zero will you?

I guess this depends on whether QEMU reset means power the system
completely off and then on again, or some softer reset?

  +};
 
 The other option here would be to keep the level information
 in the cpreg structs (which is where we put everything else
 we know about cpregs); we'd probably need to then initialise
 some other data structure if we wanted to avoid the hash
 table lookup for every register in write_list_to_kvmstate.
 
 I guess if we expect this list to remain a fairly small
 set of exceptional cases then this is OK (and vaguely
 comparable to the existing kvm_arm_reg_syncs_via-cpreg_list
 handling).

I thought about this too, and sent this as an RFC for exactly this
reason.  I did it this way initially for two reasons: (1) I don't
understand the hash-table register initialization flow for aarch64 and
(2) I could really only identify this single register for now that needs
to be marked as a non-runtime register, and then this is less invasive.

 
 Don't we need separate 32-bit and 64-bit versions of
 this list?
 
Do we?  I thought this file would compile separately for the 32-bit and
64-bit versions and the register index define is the same name for both
architectures, did I get this wrong?

Of course, for other registers with unique-to-32-bit-or-64-bit reg index
defines, yes, we would need a separate table.  Should they then be
defined in the kvm32.c and kvm64.c and passed in as a pointer to
write_kvmstate_to_list() ?

Thanks,
-Christoffer

[Qemu-devel] [RFC PATCH] target-arm: kvm: Differentiate registers based on write-back levels

2015-07-10 Thread Christoffer Dall

Some registers like the CNTVCT register should only be written to the
kernel as part of machine initialization or on vmload operations, but
never during runtime, as this can potentially make time go backwards or
create inconsistent time observations between VCPUs.

Introduce a list of registers that should not be written back at runtime
and check this list on syncing the register state to the KVM state.

Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
---
 target-arm/kvm.c | 34 +-
 target-arm/kvm32.c   |  2 +-
 target-arm/kvm64.c   |  2 +-
 target-arm/kvm_arm.h |  3 ++-
 target-arm/machine.c |  2 +-
 5 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 548bfd7..2e92699 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -409,7 +409,35 @@ bool write_kvmstate_to_list(ARMCPU *cpu)
 return ok;
 }
 
-bool write_list_to_kvmstate(ARMCPU *cpu)
+typedef struct cpreg_state_level {
+uint64_t kvm_idx;
+int level;
+} cpreg_state_level;
+
+/* All system registers not listed in the following table are assumed to be
+ * of the level KVM_PUT_RUNTIME_STATE, a register should be written less
+ * often, you must add it to this table with a state of either
+ * KVM_PUT_RESET_STATE or KVM_PUT_FULL_STATE.
+ */
+cpreg_state_level non_runtime_cpregs[] = {
+{ KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE },
+};
+
+static int cpreg_level(uint64_t kvm_idx)
+{
+int i;
+
+for (i = 0; i  ARRAY_SIZE(non_runtime_cpregs); i++) {
+cpreg_state_level *l = non_runtime_cpregs[i];
+if (l-kvm_idx == kvm_idx) {
+return l-level;
+}
+}
+
+return KVM_PUT_RUNTIME_STATE;
+}
+
+bool write_list_to_kvmstate(ARMCPU *cpu, int level)
 {
 CPUState *cs = CPU(cpu);
 int i;
@@ -421,6 +449,10 @@ bool write_list_to_kvmstate(ARMCPU *cpu)
 uint32_t v32;
 int ret;
 
+if (cpreg_level(regidx)  level) {
+continue;
+}
+
 r.id = regidx;
 switch (regidx  KVM_REG_SIZE_MASK) {
 case KVM_REG_SIZE_U32:
diff --git a/target-arm/kvm32.c b/target-arm/kvm32.c
index d7e7d68..9fbd5fd 100644
--- a/target-arm/kvm32.c
+++ b/target-arm/kvm32.c
@@ -367,7 +367,7 @@ int kvm_arch_put_registers(CPUState *cs, int level)
  * managed to update the CPUARMState with, and only allowing those
  * to be written back up into the kernel).
  */
-if (!write_list_to_kvmstate(cpu)) {
+if (!write_list_to_kvmstate(cpu, level)) {
 return EINVAL;
 }
 
diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
index ac34f51..2911679 100644
--- a/target-arm/kvm64.c
+++ b/target-arm/kvm64.c
@@ -280,7 +280,7 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 return ret;
 }
 
-if (!write_list_to_kvmstate(cpu)) {
+if (!write_list_to_kvmstate(cpu, level)) {
 return EINVAL;
 }
 
diff --git a/target-arm/kvm_arm.h b/target-arm/kvm_arm.h
index 5abd591..ce03e97 100644
--- a/target-arm/kvm_arm.h
+++ b/target-arm/kvm_arm.h
@@ -71,6 +71,7 @@ bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx);
 /**
  * write_list_to_kvmstate:
  * @cpu: ARMCPU
+ * @level: the state level to sync
  *
  * For each register listed in the ARMCPU cpreg_indexes list, write
  * its value from the cpreg_values list into the kernel (via ioctl).
@@ -83,7 +84,7 @@ bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx);
  * Note that we do not stop early on failure -- we will attempt
  * writing all registers in the list.
  */
-bool write_list_to_kvmstate(ARMCPU *cpu);
+bool write_list_to_kvmstate(ARMCPU *cpu, int level);
 
 /**
  * write_kvmstate_to_list:
diff --git a/target-arm/machine.c b/target-arm/machine.c
index 9eb51df..32adfe7 100644
--- a/target-arm/machine.c
+++ b/target-arm/machine.c
@@ -251,7 +251,7 @@ static int cpu_post_load(void *opaque, int version_id)
 }
 
 if (kvm_enabled()) {
-if (!write_list_to_kvmstate(cpu)) {
+if (!write_list_to_kvmstate(cpu, KVM_PUT_FULL_STATE)) {
 return -1;
 }
 /* Note that it's OK for the TCG side not to know about
-- 
2.1.2.330.g565301e.dirty

Re: [Qemu-devel] [PATCH v4 3/4] target-arm: Extend the gic node properties

2015-07-09 Thread Christoffer Dall

On Thu, Jul 9, 2015 at 6:28 PM, Peter Maydell peter.mayd...@linaro.org wrote:
 On 9 July 2015 at 17:03, Christoffer Dall christoffer.d...@linaro.org wrote:
 [whoops, re-adding qemu-devel]

 On Thu, Jul 9, 2015 at 5:07 PM, Alexander Graf ag...@suse.de wrote:
 On 07/09/15 16:44, Christoffer Dall wrote:
 I'll be honest and say that I don't fully understand the details of the
 interrupt-map specification, and I cannot seem to find it online either
 (the working link I had before gives me a 404 these days).

 Try http://www.firmware.org/1275/practice/imap/imap0_9d.pdf


that's the one, thanks for the working url.

 It explains it conceptually, yes, but it's hardly a spec.  At least I
 can't understand from that page why the entries in the map have to be
 changed based on size-cells and address-cells in the interrupt
 controller...

 The interrupt map entries are:
  * child unit interrupt specifier [4 cells for PCI, determined
by #address-cells + #interrupt-cells for the PCI controller node]
  * interrupt parent phandle
  * parent unit interrupt specifier [number of cells determined
by #address-cells + #interrupt-cells for the interrupt controller]

 So the extra two zeroes aren't part of the phandle, they're
 the result of the parent unit-interrupt-specifier now being
 5 cells because of #address-cells being defined in the GIC node.

Thanks for clearing this up!

I owe you a cup of your favorite poison then:)

-Christoffer

Re: [Qemu-devel] [PATCH v4 3/4] target-arm: Extend the gic node properties

2015-07-09 Thread Christoffer Dall

[whoops, re-adding qemu-devel]

On Thu, Jul 9, 2015 at 5:07 PM, Alexander Graf ag...@suse.de wrote:
 On 07/09/15 16:44, Christoffer Dall wrote:

 On Thu, Jul 09, 2015 at 04:02:14PM +0200, Claudio Fontana wrote:

 Hello Christoffer,

 just one question:

 In preparation for adding the GICv2m which requires address specifiers
 and is a subnode of the gic, we extend the gic DT definition to specify
 the #address-cells and #size-cells properties and add an empty ranges
 property properties of the DT node, since this is required to add the
 v2m node as a child of the gic node.

 Note that we must also expand the irq-map to reference the gic with the
 right address-cells as a consequence of this change.

 Reviewed-by: Eric Auger address@hidden
 Suggested-by: Shanker Donthineni address@hidden
 Signed-off-by: Christoffer Dall address@hidden
 ---
 Changes since v3:
   - Rewrote patch and changed authorship and tags accordingly
   - Fixed spelling in commit message
 Changes since v2:
   - New separate patch factoring out changes to existing code for eased
 bisectability in case we broke something
   - The above fixes the issue with non-MSI compatible guests.

   hw/arm/virt.c | 11 +++
   1 file changed, 7 insertions(+), 4 deletions(-)

 diff --git a/hw/arm/virt.c b/hw/arm/virt.c
 index e5235ef..387dac8 100644
 --- a/hw/arm/virt.c
 +++ b/hw/arm/virt.c
 @@ -317,6 +317,9 @@ static void fdt_add_gic_node(VirtBoardInfo *vbi)
2,
 vbi-memmap[VIRT_GIC_DIST].size,
2,
 vbi-memmap[VIRT_GIC_CPU].base,
2,
 vbi-memmap[VIRT_GIC_CPU].size);
 +qemu_fdt_setprop_cell(vbi-fdt, /intc, #address-cells, 0x2);
 +qemu_fdt_setprop_cell(vbi-fdt, /intc, #size-cells, 0x2);
 +qemu_fdt_setprop(vbi-fdt, /intc, ranges, NULL, 0);
   qemu_fdt_setprop_cell(vbi-fdt, /intc, phandle,
 vbi-gic_phandle);
   }
   @@ -585,7 +588,7 @@ static void create_pcie_irq_map(const
 VirtBoardInfo *vbi,
 uint32_t gic_phandle,
   int first_irq, const char *nodename)
   {
   int devfn, pin;
 -uint32_t full_irq_map[4 * 4 * 8] = { 0 };
 +uint32_t full_irq_map[4 * 4 * 10] = { 0 };
   uint32_t *irq_map = full_irq_map;
 for (devfn = 0; devfn = 0x18; devfn += 0x8) {
 @@ -598,13 +601,13 @@ static void create_pcie_irq_map(const
 VirtBoardInfo *vbi,
 uint32_t gic_phandle,
   uint32_t map[] = {
   devfn  8, 0, 0,   /* devfn
 */
   pin + 1,/* PCI pin
 */
 -gic_phandle, irq_type, irq_nr, irq_level }; /* GIC irq
 */
 +gic_phandle, 0, 0, irq_type, irq_nr, irq_level }; /*
 GIC irq */

 can you help me understand how to decode this?
 We are adding two 32bit 0 values in the map.
 So now gic_phandle field occupies a total of 3 x 32bit values.
 Is this really what was intended?

 If the #address-cell and #size-cells of the intc are 2 and 2
 respectively,
 shouldn't this be

 gic_phandle, 0, irq_type, irq_nr, irq_level ?

 Too much time since I read this code I guess..

 I'll be honest and say that I don't fully understand the details of the
 interrupt-map specification, and I cannot seem to find it online either
 (the working link I had before gives me a 404 these days).

 So I followed the suggestion from Shanker and it worked with adding two
 zeroes, not with only adding one, so I figured it was the correct thing
 to do.

 Perhaps Alex Graf who wrote the original code to generate the interrupt
 map can help?


 The page works quite well for me:

   http://devicetree.org/Device_Tree_Usage#Advanced_Interrupt_Mapping

It explains it conceptually, yes, but it's hardly a spec.  At least I
can't understand from that page why the entries in the map have to be
changed based on size-cells and address-cells in the interrupt
controller...

I didn't think the bits we needed to add were related to the phandle;
I always thought a phandle was just an internal to the DT 32-bit
number to refer to a different node.


 When it comes to dt bits where I'm uncertain, I usually end up asking Rob
 though :).


Rob, can you help us out here?  Beers are on me at SFO15 ;)

-Christoffer

Re: [Qemu-devel] [PATCH v4 4/9] Add virt-v3 machine that uses GIC-500

2015-07-03 Thread Christoffer Dall

On Fri, Jul 3, 2015 at 8:52 AM, Pavel Fedin p.fe...@samsung.com wrote:
  Hi!

 I think you should leave out the software emulation part for this
 series and let's concentrate on getting this in first.

  Ok.

 Also, you should make sure you have an agreement with Shlomo about
 taking over his patches before doing so (maybe you are already?).

  Shlomo has posted his work here, doesn't this automatically mean that he is 
 sharing it with the community? Additionally, i do not drop his authorship.

Well, it can become a little messy for everyone if Shlomo is working
on refactoring and respinning his patches, and meanwhile new versions
of his patches go out embedded as part of another patch set.  It's not
exactly harmful, but usually doesn't happen without prior agreement
amongst the involved parties, I would say.

-Christoffer

Re: [Qemu-devel] [PATCH v4 4/9] Add virt-v3 machine that uses GIC-500

2015-07-03 Thread Christoffer Dall

On Fri, Jul 3, 2015 at 10:47 AM, Pavel Fedin p.fe...@samsung.com wrote:
  Hi!

 Well, it can become a little messy for everyone if Shlomo is working
 on refactoring and respinning his patches, and meanwhile new versions
 of his patches go out embedded as part of another patch set

  Exactly for this reason i refactor his code as little as possible, so that 
 he can merge things easily, if he is around. But looks like he become upset 
 and gone, we haven't seen him around for several weeks.

I don't think you should put words in other people's mouth.  He may be
on vacation or awaiting approval for submitting this stuff or
something like that.

That said, Shlomo, what is your status?

  Meanwhile i'll try to incorporate SW emulation too, because i also may need 
 it. Just needs more time, and yes, ITS is a separate thing (by the way i even 
 have the code, just without LPIs). So yes, i'll split it up a bit more.


I *really* think we should try to *agree* on who does what here, so
that we don't have more copies of the same work etc. on the list.

-Christoffer

Re: [Qemu-devel] [PATCH v4 4/9] Add virt-v3 machine that uses GIC-500

2015-07-02 Thread Christoffer Dall

On Thu, Jul 2, 2015 at 4:47 PM, Pavel Fedin p.fe...@samsung.com wrote:
  Hello!

  I already explained this earlier: 
 http://lists.nongnu.org/archive/html/qemu-devel/2015-05/msg04842.html, and i 
 tried to explain this in commit message. Current qemu architecture does not 
 allow doing this in a clean way.
  I can simply change mc-max_cpus for 'virt' machine to 64 (always), and then 
 check user-supplied value against GIC limitation by myself. And produce 
 error. This will be code duplication. Do you think it is better?
  Anybody else (Peter, Cristoffer ?), please vote.

We definitely need a single virt machine, not multiple machines.  I
don't know enough about the QEMU internals as to how to accomplish
this in detail.

-Christoffer

Re: [Qemu-devel] [PATCH v4 4/9] Add virt-v3 machine that uses GIC-500

2015-07-02 Thread Christoffer Dall

On Thu, Jul 2, 2015 at 5:30 PM, Pavel Fedin p.fe...@samsung.com wrote:
 We definitely need a single virt machine, not multiple machines.  I
 don't know enough about the QEMU internals as to how to accomplish
 this in detail.

  Two beat one... Okay, i surrender. :)
  I will respin this, hopefully tomorrow. I will change to machine option, and 
 i will likely leave out software GIC emulation. Everyone okay with this? 
 Peter?

I think you should leave out the software emulation part for this
series and let's concentrate on getting this in first.

Also, you should make sure you have an agreement with Shlomo about
taking over his patches before doing so (maybe you are already?).

Thanks,
-Christoffer

Re: [Qemu-devel] [PATCH RFC 0/4] vGICv3 support

2015-07-01 Thread Christoffer Dall

On Fri, May 22, 2015 at 01:58:40PM +0300, Pavel Fedin wrote:
  This is my alternative to Ashok's vGICv3 patch
 (https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg03021.html), which
 i am currently working on. It addresses vGIC capability verification issue
 (kvm_irqchip_create() / kvm_arch_irqchip_create()), as well as offers better
 code structure (v3 code separated from v2).
  This patchset applies on top of this:
 https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg00943.html. Note that
 GIC type selection still relies on machine name (virt-v3 vs virt), and not on
 machine option. Since libvirt has recently introduced support for extra 
 options,
 i have absolutely nothing against Ashok's approach. I just did not change this
 yet because it would affect my testing environment. The aim of this RFC is to
 focus on vGICv3 implementation and related changes. And yes, i agree that v2 
 and
 v3 now have some copypasted code, and this is TBD.

This cover letter is not really helpful as it only describes the history
and circumstances of how this patch came to be.

It would be helpful if the beginning of this cover letter focuses on
what the patch series does and which design decisions have been taken to
shape the patches the way they are.

I don't understand the whole background thing about libvirt and I don't
at all understand the thing about copy-pasted code...??

A generally good approach to writing a cover letter is to follow this
skeleton:

---
This series implements...what

We accomplish this by...design decisions

[Optional] Patches 1-3 do something preliminary, patches 4-8 do something 
else...

[Optional] Note something special, possibly historical

Changes since vX:
---

I think it would be good if you could re-spin this series based on
Eric's comments on the code, my comments on the patch style, and Peter's
advise on using machine properties for GICv3.

Do you have cycles to continue working on this?

-Christoffer

 
 Pavel Fedin (4):
   Add virt-v3 machine that uses GIC-500
   Set kernel_irqchip_type for other ARM boards which use GIC
   First bits of vGICv3 support:
   Initial implementation of vGICv3.
 
  hw/arm/exynos4_boards.c |   1 +
  hw/arm/realview.c   |   1 +
  hw/arm/vexpress.c   |   1 +
  hw/arm/virt.c   | 148 -
  hw/intc/Makefile.objs   |   1 +
  hw/intc/arm_gicv3_kvm.c | 283 
 
  include/hw/boards.h |   1 +
  include/sysemu/kvm.h|   3 +-
  kvm-all.c   |   2 +-
  stubs/kvm.c |   2 +-
  target-arm/kvm.c|   8 +-
  11 files changed, 419 insertions(+), 32 deletions(-)
  create mode 100644 hw/intc/arm_gicv3_kvm.c
 
 -- 
 1.9.5.msysgit.0

Re: [Qemu-devel] [PATCH RFC 0/4] vGICv3 support

2015-07-01 Thread Christoffer Dall

On Wed, Jul 01, 2015 at 02:14:55PM +0300, Pavel Fedin wrote:
  Hello!
 
  I think it would be good if you could re-spin this series based on
  Eric's comments on the code, my comments on the patch style, and Peter's
  advise on using machine properties for GICv3.
 
 There was a discussion about this, and i tried to add a machine property 
 instead. This gave bad
 results: https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg04842.html
 
  Do you have cycles to continue working on this?
 
  Yes, i do. And i wish to work on this, since upstreaming this is a goal of 
 my project.
  I know that descriptions were bad, and actually they were parts of ongoing 
 discussions. Actually i
 am waiting for integration of Shlomo Pongratz' series:
 http://lists.nongnu.org/archive/html/qemu-devel/2015-06/msg01512.html. Posted 
 as standalone, my
 series collide with Shlomo's series (it actually relies on some parts from 
 there), so i prefer to
 respin on top of something clean.

I thought the general sense here was that since emulating the full
device is much more complicated than driving the KVM part, the
integration with the virt board should go in via this series, and the
emulation should build on top of that?

Of course, if the full emulation series is almost ready to be merged
that's a different story.

It just felt like both patch series have stalled somehow, and I would
like to see what we can do to get this stuff moving again.

  The first patch from the mentioned series is already in master; Peter told 
 me that he doesn't want
 to add the complete GICv3 before qemu 2.4 is released:
 https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg04171.html
  So waiting...
 
Fair enough.

-Christoffer

Re: [Qemu-devel] [PATCH RFC 1/4] Add virt-v3 machine that uses GIC-500

2015-07-01 Thread Christoffer Dall

On Fri, May 22, 2015 at 01:58:41PM +0300, Pavel Fedin wrote:
 This patch introduces kernel_irqchip_type member in Machine class. Currently 
 it it used only by virt machine for its internal purposes, however in future 
 it is to be passed to KVM in kvm_irqchip_create(). The variable is defined as 
 int in order to be architecture agnostic, for potential future users.

Can you use a decent editor/mail agent and break your lines at 72 chars
for commit messages please?

This commit message is very code-specific and doesn't explain the
overall change/purpose; for example I don't understand from reading this
commit if the patch expects a new machine virt-v3 or using machine
properties as discussed before.

I think it's the former and I think you should re-spin these patches
using a property, since Peter already clearly expressed how this should
be done and it's no use reviewing something which we already know is not
the right approach:
https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg02204.html

Thanks,
-Christoffer



 
 Signed-off-by: Pavel Fedin p.fe...@samsung.com
 
 ---
  hw/arm/virt.c   | 148 
 +++-
  include/hw/boards.h |   1 +
  2 files changed, 123 insertions(+), 26 deletions(-)
 
 diff --git a/hw/arm/virt.c b/hw/arm/virt.c
 index a1186c5..15724b2 100644
 --- a/hw/arm/virt.c
 +++ b/hw/arm/virt.c
 @@ -66,6 +66,10 @@ enum {
  VIRT_CPUPERIPHS,
  VIRT_GIC_DIST,
  VIRT_GIC_CPU,
 +VIRT_GIC_DIST_SPI = VIRT_GIC_CPU,
 +VIRT_ITS_CONTROL,
 +VIRT_ITS_TRANSLATION,
 +VIRT_LPI,
  VIRT_UART,
  VIRT_MMIO,
  VIRT_RTC,
 @@ -107,6 +111,8 @@ typedef struct {
  #define VIRT_MACHINE_CLASS(klass) \
  OBJECT_CLASS_CHECK(VirtMachineClass, klass, TYPE_VIRT_MACHINE)
  
 +#define TYPE_VIRTV3_MACHINE   virt-v3
 +
  /* Addresses and sizes of our components.
   * 0..128MB is space for a flash device so we can run bootrom code such as 
 UEFI.
   * 128MB..256MB is used for miscellaneous device I/O.
 @@ -121,25 +127,29 @@ typedef struct {
   */
  static const MemMapEntry a15memmap[] = {
  /* Space up to 0x800 is reserved for a boot ROM */
 -[VIRT_FLASH] =  {  0, 0x0800 },
 -[VIRT_CPUPERIPHS] = { 0x0800, 0x0002 },
 +[VIRT_FLASH] =   {  0, 0x0800 },
 +[VIRT_CPUPERIPHS] =  { 0x0800, 0x0002 },
  /* GIC distributor and CPU interfaces sit inside the CPU peripheral 
 space */
 -[VIRT_GIC_DIST] =   { 0x0800, 0x0001 },
 -[VIRT_GIC_CPU] ={ 0x0801, 0x0001 },
 -[VIRT_UART] =   { 0x0900, 0x1000 },
 -[VIRT_RTC] ={ 0x0901, 0x1000 },
 -[VIRT_FW_CFG] = { 0x0902, 0x000a },
 -[VIRT_MMIO] =   { 0x0a00, 0x0200 },
 +[VIRT_GIC_DIST] ={ 0x0800, 0x0001 },
 +[VIRT_GIC_CPU] = { 0x0801, 0x0001 },
 +/* On v3 VIRT_GIC_DIST_SPI takes place of VIRT_GIC_CPU */
 +[VIRT_ITS_CONTROL] = { 0x0802, 0x0001 },
 +[VIRT_ITS_TRANSLATION] = { 0x0803, 0x0001 },
 +[VIRT_LPI] = { 0x0804, 0x0080 },
 +[VIRT_UART] ={ 0x0900, 0x1000 },
 +[VIRT_RTC] = { 0x0901, 0x1000 },
 +[VIRT_FW_CFG] =  { 0x0902, 0x000a },
 +[VIRT_MMIO] ={ 0x0a00, 0x0200 },
  /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size 
 */
  /*
   * PCIE verbose map:
   *
 - * MMIO window  { 0x1000, 0x2eff },
 - * PIO window   { 0x3eff, 0x0001 },
 - * ECAM { 0x3f00, 0x0100 },
 + * MMIO window   { 0x1000, 0x2eff },
 + * PIO window{ 0x3eff, 0x0001 },
 + * ECAM  { 0x3f00, 0x0100 },
   */
 -[VIRT_PCIE] =   { 0x1000, 0x3000 },
 -[VIRT_MEM] ={ 0x4000, 30ULL * 1024 * 1024 * 1024 },
 +[VIRT_PCIE] ={ 0x1000, 0x3000 },
 +[VIRT_MEM] = { 0x4000, 30ULL * 1024 * 1024 * 1024 },
  };
  
  static const int a15irqmap[] = {
 @@ -273,9 +283,11 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi)
   */
  ARMCPU *armcpu;
  uint32_t irqflags = GIC_FDT_IRQ_FLAGS_EDGE_LO_HI;
 +/* Argument is 32 bit but 8 bits are reserved for flags */
 +uint32_t max = (vbi-smp_cpus = 24) ? 24 : vbi-smp_cpus;
  
  irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START,
 - GIC_FDT_IRQ_PPI_CPU_WIDTH, (1  vbi-smp_cpus) - 
 1);
 + GIC_FDT_IRQ_PPI_CPU_WIDTH, (1  max) - 1);
  
  qemu_fdt_add_subnode(vbi-fdt, /timer);
  
 @@ -299,6 +311,18 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi)
  {
  int cpu;
  
 +/*
 + * From Documentation/devicetree/bindings/arm/cpus.txt
 + *  On ARM v8 64-bit systems value should be set to 2,
 + *  that corresponds to the

Re: [Qemu-devel] [PATCH RFC 2/4] Set kernel_irqchip_type for other ARM boards which use GIC

2015-07-01 Thread Christoffer Dall

Missing commit text completely?

On Fri, May 22, 2015 at 01:58:42PM +0300, Pavel Fedin wrote:
 Signed-off-by: Pavel Fedin p.fe...@samsung.com
 ---
  hw/arm/exynos4_boards.c | 1 +
  hw/arm/realview.c   | 1 +
  hw/arm/vexpress.c   | 1 +
  3 files changed, 3 insertions(+)
 
 diff --git a/hw/arm/exynos4_boards.c b/hw/arm/exynos4_boards.c
 index d644db1..d4136bc 100644
 --- a/hw/arm/exynos4_boards.c
 +++ b/hw/arm/exynos4_boards.c
 @@ -104,6 +104,7 @@ static Exynos4210State 
 *exynos4_boards_init_common(MachineState *machine,
  exynos4_machines[board_type].max_cpus);
  }
  
 +machine-kernel_irqchip_type = KVM_DEV_TYPE_ARM_VGIC_V2;
  exynos4_board_binfo.ram_size = exynos4_board_ram_size[board_type];
  exynos4_board_binfo.board_id = exynos4_board_id[board_type];
  exynos4_board_binfo.smp_bootreg_addr =
 diff --git a/hw/arm/realview.c b/hw/arm/realview.c
 index ef2788d..f670d9f 100644
 --- a/hw/arm/realview.c
 +++ b/hw/arm/realview.c
 @@ -74,6 +74,7 @@ static void realview_init(MachineState *machine,
  ram_addr_t ram_size = machine-ram_size;
  hwaddr periphbase = 0;
  
 +machine-kernel_irqchip_type = KVM_DEV_TYPE_ARM_VGIC_V2;
  switch (board_type) {
  case BOARD_EB:
  break;
 diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
 index 8f1a5ea..b0a29f1 100644
 --- a/hw/arm/vexpress.c
 +++ b/hw/arm/vexpress.c
 @@ -556,6 +556,7 @@ static void vexpress_common_init(MachineState *machine)
  const hwaddr *map = daughterboard-motherboard_map;
  int i;
  
 +machine-kernel_irqchip_type = KVM_DEV_TYPE_ARM_VGIC_V2;
  daughterboard-init(vms, machine-ram_size, machine-cpu_model, pic);
  
  /*
 -- 
 1.9.5.msysgit.0

Re: [Qemu-devel] [PATCH RFC 3/4] First bits of vGICv3 support:

2015-07-01 Thread Christoffer Dall

Hi Pavel,

Please get rid of the trailing colon in the subject message. 

On Fri, May 22, 2015 at 01:58:43PM +0300, Pavel Fedin wrote:
 - Make use of kernel_irqchip_type in kvm_arch_irqchip_create()
 - Instantiate kvm-arm-gicv3 class (not implemented yet) for GICv3 with KVM 
 acceleration

I feel like these bullets should be written as paragraphs with a proper
explanation of what we are doing here from a high-level point of view.

Thanks,
-Christoffer

 
 Signed-off-by: Pavel Fedin p.fe...@samsung.com
 ---
  hw/arm/virt.c| 6 ++
  include/sysemu/kvm.h | 3 ++-
  kvm-all.c| 2 +-
  stubs/kvm.c  | 2 +-
  target-arm/kvm.c | 8 ++--
  5 files changed, 12 insertions(+), 9 deletions(-)
 
 diff --git a/hw/arm/virt.c b/hw/arm/virt.c
 index 15724b2..1e42e59 100644
 --- a/hw/arm/virt.c
 +++ b/hw/arm/virt.c
 @@ -400,11 +400,9 @@ static uint32_t create_gic(const VirtBoardInfo *vbi, 
 qemu_irq *pic, int type)
  int i;
  
  if (type == KVM_DEV_TYPE_ARM_VGIC_V3) {
 -gictype = arm_gicv3;
 -} else if (kvm_irqchip_in_kernel()) {
 -gictype = kvm-arm-gic;
 +gictype = kvm_irqchip_in_kernel() ? kvm-arm-gicv3 : arm_gicv3;
  } else {
 -gictype = arm_gic;
 +gictype = kvm_irqchip_in_kernel() ? kvm-arm-gic : arm_gic;
  }
  
  gicdev = qdev_create(NULL, gictype);
 diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
 index 4878959..5d90257 100644
 --- a/include/sysemu/kvm.h
 +++ b/include/sysemu/kvm.h
 @@ -424,6 +424,7 @@ void kvm_init_irq_routing(KVMState *s);
  /**
   * kvm_arch_irqchip_create:
   * @KVMState: The KVMState pointer
 + * @type: irqchip type, architecture-specific
   *
   * Allow architectures to create an in-kernel irq chip themselves.
   *
 @@ -431,7 +432,7 @@ void kvm_init_irq_routing(KVMState *s);
   *0: irq chip was not created
   *   0: irq chip was created
   */
 -int kvm_arch_irqchip_create(KVMState *s);
 +int kvm_arch_irqchip_create(KVMState *s, int type);
  
  /**
   * kvm_set_one_reg - set a register value in KVM via KVM_SET_ONE_REG ioctl
 diff --git a/kvm-all.c b/kvm-all.c
 index 17a3771..22e2621 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -1393,7 +1393,7 @@ static int kvm_irqchip_create(MachineState *machine, 
 KVMState *s)
  
  /* First probe and see if there's a arch-specific hook to create the
   * in-kernel irqchip for us */
 -ret = kvm_arch_irqchip_create(s);
 +ret = kvm_arch_irqchip_create(s, machine-kernel_irqchip_type);
  if (ret  0) {
  return ret;
  } else if (ret == 0) {
 diff --git a/stubs/kvm.c b/stubs/kvm.c
 index e7c60b6..a8505ff 100644
 --- a/stubs/kvm.c
 +++ b/stubs/kvm.c
 @@ -1,7 +1,7 @@
  #include qemu-common.h
  #include sysemu/kvm.h
  
 -int kvm_arch_irqchip_create(KVMState *s)
 +int kvm_arch_irqchip_create(KVMState *s, int type)
  {
  return 0;
  }
 diff --git a/target-arm/kvm.c b/target-arm/kvm.c
 index 16abbf1..65794cf 100644
 --- a/target-arm/kvm.c
 +++ b/target-arm/kvm.c
 @@ -579,7 +579,7 @@ void kvm_arch_init_irq_routing(KVMState *s)
  {
  }
  
 -int kvm_arch_irqchip_create(KVMState *s)
 +int kvm_arch_irqchip_create(KVMState *s, int type)
  {
  int ret;
  
 @@ -587,11 +587,15 @@ int kvm_arch_irqchip_create(KVMState *s)
   * let the device do this when it initializes itself, otherwise we
   * fall back to the old API */
  
 -ret = kvm_create_device(s, KVM_DEV_TYPE_ARM_VGIC_V2, true);
 +ret = kvm_create_device(s, type, true);
  if (ret == 0) {
  return 1;
  }
  
 +/* Fallback will create VGIC v2 */
 +if (type != KVM_DEV_TYPE_ARM_VGIC_V2) {
 +return ret;
 +}
  return 0;
  }
  
 -- 
 1.9.5.msysgit.0

Re: [Qemu-devel] [PATCH RFC 4/4] Initial implementation of vGICv3.

2015-07-01 Thread Christoffer Dall

Hi Pavel,

On Fri, May 22, 2015 at 07:57:13PM +0300, Pavel Fedin wrote:
  Hi!
 
  Looks GICv3 common class currently miss this security_extn field +
  parent_fiq so it does not compile without changes. Or did I miss something?
 
  Just throw this if(...) away. It's my fault. Actually i have rebased 
 Shlomo's patches on
 yesterday's master, and during this i added parent_fiq[] to GICv3, because 
 without filling
 these in the thing stopped working due to recent GICv2 code changes. I needed 
 to fix the
 problem quickly, so i copied initializing parent_fiq together with 
 security_extn field
 from v2 code. But, this makes no sense because security_extn is never 
 initialized, it is
 not even assigned property name. So just remove this small fragment, it's not 
 needed now.
  I have fixed this in my working tree 2 hours ago. parent_fiq is declared as:
 --- cut ---
 qemu_irq parent_fiq[GICV3_NCPU];
 --- cut ---
 

I think based on this comment that I can safely suggest a bit of advise
regarding sending patches to the QEMU and Linux lists:

Be as careful as you can, properly test your code, review your own
commit messages, run spell check on them, and try to look at what you
send out through the eyes of someone who has never seen this code
before.

Review resources are really scarse in these projects and almost all
maintainers are overloaded, so we all have to work together and make
sure we don't flood the lists unwarranted.

It is good to re-spin quickly so people don't loose context, but if you
are fixing things up in a matter of hours and sending out new revisions,
you are moving too fast.

Turn-around time is days/weeks in the best cases, so better to take a
few extra days to thoroughly test a patch series.  The exception is to
illustrate a general idea or design, in which case clearly marking the
series as an RFC and noting what people should look at and what people
should ignore is key.

Hope this makes sense,
-Christoffer

Re: [Qemu-devel] should KVM or userspace be the one which decides what MIPIDR/affinity values to assign to vcpus?

2015-06-25 Thread Christoffer Dall

On Thu, Jun 25, 2015 at 10:06:20AM +0100, Peter Maydell wrote:
 On 25 June 2015 at 09:00, Christoffer Dall christoffer.d...@linaro.org 
 wrote:
  Of course, KVM can deny an unsupported configuration, but I am wondering
  if we really think anybody will care about the 'model such specific
  hardware' aspect with KVM, or if we should only consider the 'I want a
  VM with x VCPUs' scenario, in which case the second option below seems
  simpler to me.
 
 I agree it's not very likely anybody cares about the specific cluster
 topology. However if we don't want to support arbitrary topologies
 then QEMU is going to end up in the business of editing the user
 supplied device tree blob to make its cpu definitions match up with
 whatever the kernel provides, which could be pretty tedious.
 
I see, then you can't easily contruct a machine and a DT in one go
before talking to KVM.

Oh well, I don't feel strongly one way or the other.

-Christoffer

Re: [Qemu-devel] should KVM or userspace be the one which decides what MIPIDR/affinity values to assign to vcpus?

2015-06-25 Thread Christoffer Dall

Hi,

[sorry for reviving this thread late]

On Tue, Jun 09, 2015 at 12:24:13PM +0100, Peter Maydell wrote:
 On 9 June 2015 at 11:52, Marc Zyngier marc.zyng...@arm.com wrote:
  On 08/06/15 11:52, Peter Maydell wrote:
  On 8 June 2015 at 11:32, Igor Mammedov imamm...@redhat.com wrote:
  On Thu, 4 Jun 2015 18:17:39 +0100
  Peter Maydell peter.mayd...@linaro.org wrote:
  On 4 June 2015 at 17:40, Shlomo Pongratz shlomopongr...@gmail.com 
  wrote:
   In order for it to work correctly we must use MPIDR values in
   the device tree which match the MPIDR values the kernel has picked
   for the vcpus, so ask KVM what those values are.
 
  Could we set QEMU's generated mpidr in kernel instead of pulling it from 
  kernel,
  like we do with APIC ID in x86 and fix kernel not to reset it its own 
  value
  (i.e. untie mpidr from vcpuid)?
 
  Then later we could move setting mpidr from kvm_arch_init_vcpu() into
  board code which should be setting it, since it knows/defines what 
  topology it has.
 
  This is a question better asked on the kvmarm list (which I have cc'd),
  because that is where the kernel folks hang out...
 
  Care to provide some context? Why is this required?
 
 The device tree spec says we have to tell the guest kernel the
 affinity information for each CPU, in the reg field of the CPU node.
 This device tree blob is created by userspace; the MPIDR the guest
 actually reads is currently created by the kernel. The kernel's
 vgic v3 code may also make assumptions about vcpu_id = affinity
 mappings, I'm not sure.
 
 In current code:
  * the kernel has an opinion about vcpu_id = MPIDR mappings
  * QEMU does too, but it's not the same!
(luckily for 8 or fewer CPUs they agree, but this will not
do for GICv3 support)
 
 So either:
  * QEMU needs to tell the kernel the MPIDR for each vCPU

Does this option work for all configurations?  What if you're running on
a hardware with two clusters [8, 8] and QEMU asks KVM for a [12, 2]
configuration, for example.  Will it not be impossible for KVM to
maintain correct semantics for coherency guarantees across what seems to
the guest to be the same cluster, but really is two different clusters?

Of course, KVM can deny an unsupported configuration, but I am wondering
if we really think anybody will care about the 'model such specific
hardware' aspect with KVM, or if we should only consider the 'I want a
VM with x VCPUs' scenario, in which case the second option below seems
simpler to me.

Are there any concerns to think about wrt. migration here as well?

Thanks,
-Christoffer

  * QEMU needs to ask the kernel the MPIDR for each vCPU
 
 Which is better? The latter is simpler and will work with
 existing kernels. The former would let us (for instance)
 use KVM when we're modelling (real world) boards which have
 particular cluster configurations (which might not match the
 kernel's current simplistic always 16 CPUs at Aff0 setup).
 
 thanks
 -- PMM
 ___
 kvmarm mailing list
 kvm...@lists.cs.columbia.edu
 https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

[Qemu-devel] [PATCH v4 0/4] Add support for for GICv2m and MSIs to arm-virt

2015-05-29 Thread Christoffer Dall

Now when we have a host generic PCIe controller in the virt board, it
would be nice to be able to use MSIs so that we can eventually enable
VHOST with KVM.

With these patches you can use MSIs with TCG and with KVM, but you still
need some fixes for the mapping of the IRQ index to the GSI number for
IRQFD to work.  A separate series that enables IRQFD and vhost
is available: ARM adaptations for vhost irqfd setup
https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg01054.html)

Tested with KVM on XGene and with TCG by configuring a virtio-pci
network adapter for the guest and verifying MSIs going through as
expected.

Rebased on target-arm.next, see the individual patches for detailed
changelogs.

Christoffer Dall (4):
  target-arm: Add GIC phandle to VirtBoardInfo
  arm_gicv2m: Add GICv2m widget to support MSIs
  target-arm: Extend the gic node properties
  target-arm: Add the GICv2m to the virt board

 hw/arm/virt.c |  73 ++-
 hw/intc/Makefile.objs |   1 +
 hw/intc/arm_gicv2m.c  | 190 ++
 include/hw/arm/virt.h |   2 +
 4 files changed, 248 insertions(+), 18 deletions(-)
 create mode 100644 hw/intc/arm_gicv2m.c

-- 
2.1.2.330.g565301e.dirty

[Qemu-devel] [PATCH v4 3/4] target-arm: Extend the gic node properties

2015-05-29 Thread Christoffer Dall

In preparation for adding the GICv2m which requires address specifiers
and is a subnode of the gic, we extend the gic DT definition to specify
the #address-cells and #size-cells properties and add an empty ranges
property properties of the DT node, since this is required to add the
v2m node as a child of the gic node.

Note that we must also expand the irq-map to reference the gic with the
right address-cells as a consequence of this change.

Reviewed-by: Eric Auger eric.au...@linaro.org
Suggested-by: Shanker Donthineni shank...@codeaurora.org
Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
---
Changes since v3:
 - Rewrote patch and changed authorship and tags accordingly
 - Fixed spelling in commit message
Changes since v2:
 - New separate patch factoring out changes to existing code for eased
   bisectability in case we broke something
 - The above fixes the issue with non-MSI compatible guests.

 hw/arm/virt.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index e5235ef..387dac8 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -317,6 +317,9 @@ static void fdt_add_gic_node(VirtBoardInfo *vbi)
  2, vbi-memmap[VIRT_GIC_DIST].size,
  2, vbi-memmap[VIRT_GIC_CPU].base,
  2, vbi-memmap[VIRT_GIC_CPU].size);
+qemu_fdt_setprop_cell(vbi-fdt, /intc, #address-cells, 0x2);
+qemu_fdt_setprop_cell(vbi-fdt, /intc, #size-cells, 0x2);
+qemu_fdt_setprop(vbi-fdt, /intc, ranges, NULL, 0);
 qemu_fdt_setprop_cell(vbi-fdt, /intc, phandle, vbi-gic_phandle);
 }
 
@@ -585,7 +588,7 @@ static void create_pcie_irq_map(const VirtBoardInfo *vbi, 
uint32_t gic_phandle,
 int first_irq, const char *nodename)
 {
 int devfn, pin;
-uint32_t full_irq_map[4 * 4 * 8] = { 0 };
+uint32_t full_irq_map[4 * 4 * 10] = { 0 };
 uint32_t *irq_map = full_irq_map;
 
 for (devfn = 0; devfn = 0x18; devfn += 0x8) {
@@ -598,13 +601,13 @@ static void create_pcie_irq_map(const VirtBoardInfo *vbi, 
uint32_t gic_phandle,
 uint32_t map[] = {
 devfn  8, 0, 0,   /* devfn */
 pin + 1,/* PCI pin */
-gic_phandle, irq_type, irq_nr, irq_level }; /* GIC irq */
+gic_phandle, 0, 0, irq_type, irq_nr, irq_level }; /* GIC irq */
 
 /* Convert map to big endian */
-for (i = 0; i  8; i++) {
+for (i = 0; i  10; i++) {
 irq_map[i] = cpu_to_be32(map[i]);
 }
-irq_map += 8;
+irq_map += 10;
 }
 }
 
-- 
2.1.2.330.g565301e.dirty

[Qemu-devel] [PATCH v4 1/4] target-arm: Add GIC phandle to VirtBoardInfo

2015-05-29 Thread Christoffer Dall

Instead of passing the GIC phandle around between functions, add it to
the VirtBoardInfo just like we do for the clock_phandle.  We are about
to add the v2m phandle as well, and it's easier not having to pass
around a bunch of phandles, return multiple values from functions, etc.

Reviewed-by: Eric Auger eric.au...@linaro.org
Reviewed-by: Peter Maydell peter.mayd...@linaro.org
Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
---
Changes since v3:
 - Added reviewed-by tag
Changes since v2:
 - None
Changes since v1:
 - Added reviewed-by tag

 hw/arm/virt.c | 26 +++---
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 05db8cb..e5235ef 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -69,6 +69,7 @@ typedef struct VirtBoardInfo {
 void *fdt;
 int fdt_size;
 uint32_t clock_phandle;
+uint32_t gic_phandle;
 } VirtBoardInfo;
 
 typedef struct {
@@ -299,12 +300,11 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi)
 }
 }
 
-static uint32_t fdt_add_gic_node(const VirtBoardInfo *vbi)
+static void fdt_add_gic_node(VirtBoardInfo *vbi)
 {
-uint32_t gic_phandle;
 
-gic_phandle = qemu_fdt_alloc_phandle(vbi-fdt);
-qemu_fdt_setprop_cell(vbi-fdt, /, interrupt-parent, gic_phandle);
+vbi-gic_phandle = qemu_fdt_alloc_phandle(vbi-fdt);
+qemu_fdt_setprop_cell(vbi-fdt, /, interrupt-parent, vbi-gic_phandle);
 
 qemu_fdt_add_subnode(vbi-fdt, /intc);
 /* 'cortex-a15-gic' means 'GIC v2' */
@@ -317,12 +317,10 @@ static uint32_t fdt_add_gic_node(const VirtBoardInfo *vbi)
  2, vbi-memmap[VIRT_GIC_DIST].size,
  2, vbi-memmap[VIRT_GIC_CPU].base,
  2, vbi-memmap[VIRT_GIC_CPU].size);
-qemu_fdt_setprop_cell(vbi-fdt, /intc, phandle, gic_phandle);
-
-return gic_phandle;
+qemu_fdt_setprop_cell(vbi-fdt, /intc, phandle, vbi-gic_phandle);
 }
 
-static uint32_t create_gic(const VirtBoardInfo *vbi, qemu_irq *pic)
+static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic)
 {
 /* We create a standalone GIC v2 */
 DeviceState *gicdev;
@@ -371,7 +369,7 @@ static uint32_t create_gic(const VirtBoardInfo *vbi, 
qemu_irq *pic)
 pic[i] = qdev_get_gpio_in(gicdev, i);
 }
 
-return fdt_add_gic_node(vbi);
+fdt_add_gic_node(vbi);
 }
 
 static void create_uart(const VirtBoardInfo *vbi, qemu_irq *pic)
@@ -618,8 +616,7 @@ static void create_pcie_irq_map(const VirtBoardInfo *vbi, 
uint32_t gic_phandle,
0x7   /* PCI irq */);
 }
 
-static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
-uint32_t gic_phandle)
+static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic)
 {
 hwaddr base_mmio = vbi-memmap[VIRT_PCIE_MMIO].base;
 hwaddr size_mmio = vbi-memmap[VIRT_PCIE_MMIO].size;
@@ -685,7 +682,7 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq 
*pic,
  2, base_mmio, 2, size_mmio);
 
 qemu_fdt_setprop_cell(vbi-fdt, nodename, #interrupt-cells, 1);
-create_pcie_irq_map(vbi, gic_phandle, irq, nodename);
+create_pcie_irq_map(vbi, vbi-gic_phandle, irq, nodename);
 
 g_free(nodename);
 }
@@ -717,7 +714,6 @@ static void machvirt_init(MachineState *machine)
 VirtBoardInfo *vbi;
 VirtGuestInfoState *guest_info_state = g_malloc0(sizeof *guest_info_state);
 VirtGuestInfo *guest_info = guest_info_state-info;
-uint32_t gic_phandle;
 char **cpustr;
 
 if (!cpu_model) {
@@ -794,13 +790,13 @@ static void machvirt_init(MachineState *machine)
 
 create_flash(vbi);
 
-gic_phandle = create_gic(vbi, pic);
+create_gic(vbi, pic);
 
 create_uart(vbi, pic);
 
 create_rtc(vbi, pic);
 
-create_pcie(vbi, pic, gic_phandle);
+create_pcie(vbi, pic);
 
 /* Create mmio transports, so the user can create virtio backends
  * (which will be automatically plugged in to the transports). If
-- 
2.1.2.330.g565301e.dirty

[Qemu-devel] [PATCH v4 4/4] target-arm: Add the GICv2m to the virt board

2015-05-29 Thread Christoffer Dall

Add a GICv2m device to the virt board to enable MSIs on the generic PCI
host controller.  We allocate 64 SPIs in the IRQ space for now (this can
be increased/decreased later) and map the GICv2m right after the GIC in
the memory map.

Reviewed-by: Eric Auger eric.au...@linaro.org
Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
---
Changes since v3:
 - Rebased on target-arm.next, so moved some definitions to virt.h
 - Added reviewed-by tag
Changes since v2:
 - Factored out changes to GIC DT node to previous patch.
 - Renamed QOM type name to arm-gicv2m
Changes since v1:
 - Remove stray merge conflict line
 - Reworded commmit message.

 hw/arm/virt.c | 40 +++-
 include/hw/arm/virt.h |  2 ++
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 387dac8..4bb7175 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -70,6 +70,7 @@ typedef struct VirtBoardInfo {
 int fdt_size;
 uint32_t clock_phandle;
 uint32_t gic_phandle;
+uint32_t v2m_phandle;
 } VirtBoardInfo;
 
 typedef struct {
@@ -109,6 +110,7 @@ static const MemMapEntry a15memmap[] = {
 /* GIC distributor and CPU interfaces sit inside the CPU peripheral space 
*/
 [VIRT_GIC_DIST] =   { 0x0800, 0x0001 },
 [VIRT_GIC_CPU] ={ 0x0801, 0x0001 },
+[VIRT_GIC_V2M] ={ 0x0802, 0x1000 },
 [VIRT_UART] =   { 0x0900, 0x1000 },
 [VIRT_RTC] ={ 0x0901, 0x1000 },
 [VIRT_FW_CFG] = { 0x0902, 0x000a },
@@ -125,6 +127,7 @@ static const int a15irqmap[] = {
 [VIRT_RTC] = 2,
 [VIRT_PCIE] = 3, /* ... to 6 */
 [VIRT_MMIO] = 16, /* ...to 16 + NUM_VIRTIO_TRANSPORTS - 1 */
+[VIRT_GIC_V2M] = 48, /* ...to 48 + NUM_GICV2M_SPIS - 1 */
 };
 
 static VirtBoardInfo machines[] = {
@@ -300,9 +303,21 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi)
 }
 }
 
-static void fdt_add_gic_node(VirtBoardInfo *vbi)
+static void fdt_add_v2m_gic_node(VirtBoardInfo *vbi)
 {
+vbi-v2m_phandle = qemu_fdt_alloc_phandle(vbi-fdt);
+qemu_fdt_add_subnode(vbi-fdt, /intc/v2m);
+qemu_fdt_setprop_string(vbi-fdt, /intc/v2m, compatible,
+arm,gic-v2m-frame);
+qemu_fdt_setprop(vbi-fdt, /intc/v2m, msi-controller, NULL, 0);
+qemu_fdt_setprop_sized_cells(vbi-fdt, /intc/v2m, reg,
+ 2, vbi-memmap[VIRT_GIC_V2M].base,
+ 2, vbi-memmap[VIRT_GIC_V2M].size);
+qemu_fdt_setprop_cell(vbi-fdt, /intc/v2m, phandle, vbi-v2m_phandle);
+}
 
+static void fdt_add_gic_node(VirtBoardInfo *vbi)
+{
 vbi-gic_phandle = qemu_fdt_alloc_phandle(vbi-fdt);
 qemu_fdt_setprop_cell(vbi-fdt, /, interrupt-parent, vbi-gic_phandle);
 
@@ -323,6 +338,25 @@ static void fdt_add_gic_node(VirtBoardInfo *vbi)
 qemu_fdt_setprop_cell(vbi-fdt, /intc, phandle, vbi-gic_phandle);
 }
 
+static void create_v2m(VirtBoardInfo *vbi, qemu_irq *pic)
+{
+int i;
+int irq = vbi-irqmap[VIRT_GIC_V2M];
+DeviceState *dev;
+
+dev = qdev_create(NULL, arm-gicv2m);
+sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, vbi-memmap[VIRT_GIC_V2M].base);
+qdev_prop_set_uint32(dev, base-spi, irq);
+qdev_prop_set_uint32(dev, num-spi, NUM_GICV2M_SPIS);
+qdev_init_nofail(dev);
+
+for (i = 0; i  NUM_GICV2M_SPIS; i++) {
+sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, pic[irq + i]);
+}
+
+fdt_add_v2m_gic_node(vbi);
+}
+
 static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic)
 {
 /* We create a standalone GIC v2 */
@@ -373,6 +407,8 @@ static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic)
 }
 
 fdt_add_gic_node(vbi);
+
+create_v2m(vbi, pic);
 }
 
 static void create_uart(const VirtBoardInfo *vbi, qemu_irq *pic)
@@ -676,6 +712,8 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq 
*pic)
 qemu_fdt_setprop_cells(vbi-fdt, nodename, bus-range, 0,
nr_pcie_buses - 1);
 
+qemu_fdt_setprop_cells(vbi-fdt, nodename, msi-parent, vbi-v2m_phandle);
+
 qemu_fdt_setprop_sized_cells(vbi-fdt, nodename, reg,
  2, base_ecam, 2, size_ecam);
 qemu_fdt_setprop_sized_cells(vbi-fdt, nodename, ranges,
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index ceec8b3..003ef29 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -32,6 +32,7 @@
 
 #include qemu-common.h
 
+#define NUM_GICV2M_SPIS   64
 #define NUM_VIRTIO_TRANSPORTS 32
 
 #define ARCH_TIMER_VIRT_IRQ   11
@@ -53,6 +54,7 @@ enum {
 VIRT_PCIE_MMIO,
 VIRT_PCIE_PIO,
 VIRT_PCIE_ECAM,
+VIRT_GIC_V2M,
 };
 
 typedef struct MemMapEntry {
-- 
2.1.2.330.g565301e.dirty

[Qemu-devel] [PATCH v4 2/4] arm_gicv2m: Add GICv2m widget to support MSIs

2015-05-29 Thread Christoffer Dall

The ARM GICv2m widget is a little device that handles MSI interrupt
writes to a trigger register and ties them to a range of interrupt lines
wires to the GIC.  It has a few status/id registers and the interrupt wires,
and that's about it.

A board instantiates the device by setting the base SPI number and
number SPIs for the frame.  The base-spi parameter is indexed in the SPI
number space only, so base-spi == 0, means IRQ number 32.  When a device
(the PCI host controller) writes to the trigger register, the payload is
the GIC IRQ number, so we have to subtract 32 from that and then index
into our frame of SPIs.

When instantiating a GICv2m device, tell PCI that we have instantiated
something that can deal with MSIs.  We rely on the board actually wiring
up the GICv2m to the PCI host controller.

Reviewed-by: Eric Auger eric.au...@linaro.org
Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
---
Changes since v3:
 - Added reviewed-by tag
Changes since v2:
 - Renamed QOM type to arm-gicv2m
Changes since v1:
 - Check that writes to MSI_SETSPI are within the lower boundary as well
 - Move gicv2m to common-obj in Makefile
 - Separate switch case and comment for impdef regs
 - Clearly document what is emulated
 - Allow 16 bit lower accesses to MSI_SETSPI regs
 - Fix commit grammar error
 - Remove stray pixman commit

 hw/intc/Makefile.objs |   1 +
 hw/intc/arm_gicv2m.c  | 190 ++
 2 files changed, 191 insertions(+)
 create mode 100644 hw/intc/arm_gicv2m.c

diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 843864a..092d8a8 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -11,6 +11,7 @@ common-obj-$(CONFIG_SLAVIO) += slavio_intctl.o
 common-obj-$(CONFIG_IOAPIC) += ioapic_common.o
 common-obj-$(CONFIG_ARM_GIC) += arm_gic_common.o
 common-obj-$(CONFIG_ARM_GIC) += arm_gic.o
+common-obj-$(CONFIG_ARM_GIC) += arm_gicv2m.o
 common-obj-$(CONFIG_OPENPIC) += openpic.o
 
 obj-$(CONFIG_APIC) += apic.o apic_common.o
diff --git a/hw/intc/arm_gicv2m.c b/hw/intc/arm_gicv2m.c
new file mode 100644
index 000..9f84f72
--- /dev/null
+++ b/hw/intc/arm_gicv2m.c
@@ -0,0 +1,190 @@
+/*
+ *  GICv2m extension for MSI/MSI-x support with a GICv2-based system
+ *
+ * Copyright (C) 2015 Linaro, All rights reserved.
+ *
+ * Author: Christoffer Dall christoffer.d...@linaro.org
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see http://www.gnu.org/licenses/.
+ */
+
+/* This file implements an emulated GICv2m widget as described in the ARM
+ * Server Base System Architecture (SBSA) specification Version 2.2
+ * (ARM-DEN-0029 v2.2) pages 35-39 without any optional implementation defined
+ * identification registers and with a single non-secure MSI register frame.
+ */
+
+#include hw/sysbus.h
+#include hw/pci/msi.h
+
+#define TYPE_ARM_GICV2M arm-gicv2m
+#define ARM_GICV2M(obj) OBJECT_CHECK(ARMGICv2mState, (obj), TYPE_ARM_GICV2M)
+
+#define GICV2M_NUM_SPI_MAX 128
+
+#define V2M_MSI_TYPER   0x008
+#define V2M_MSI_SETSPI_NS   0x040
+#define V2M_MSI_IIDR0xFCC
+#define V2M_IIDR0   0xFD0
+#define V2M_IIDR11  0xFFC
+
+#define PRODUCT_ID_QEMU 0x51 /* ASCII code Q */
+
+typedef struct ARMGICv2mState {
+SysBusDevice parent_obj;
+
+MemoryRegion iomem;
+qemu_irq spi[GICV2M_NUM_SPI_MAX];
+
+uint32_t base_spi;
+uint32_t num_spi;
+} ARMGICv2mState;
+
+static void gicv2m_set_irq(void *opaque, int irq)
+{
+ARMGICv2mState *s = (ARMGICv2mState *)opaque;
+
+qemu_irq_pulse(s-spi[irq]);
+}
+
+static uint64_t gicv2m_read(void *opaque, hwaddr offset,
+unsigned size)
+{
+ARMGICv2mState *s = (ARMGICv2mState *)opaque;
+uint32_t val;
+
+if (size != 4) {
+qemu_log_mask(LOG_GUEST_ERROR, gicv2m_read: bad size %u\n, size);
+return 0;
+}
+
+switch (offset) {
+case V2M_MSI_TYPER:
+val = (s-base_spi + 32)  16;
+val |= s-num_spi;
+return val;
+case V2M_MSI_IIDR:
+/* We don't have any valid implementor so we leave that field as zero
+ * and we return 0 in the arch revision as per the spec.
+ */
+return (PRODUCT_ID_QEMU  20);
+case V2M_IIDR0 ... V2M_IIDR11:
+/* We do not implement any optional identification registers and the
+ * mandatory MSI_PIDR2 register reads as 0x0, so we

Re: [Qemu-devel] [PATCH v3 4/4] target-arm: Add the GICv2m to the virt board

2015-05-25 Thread Christoffer Dall

Hi Pavel,

On Mon, May 25, 2015 at 04:09:58PM +0300, Pavel Fedin wrote:
  Hello!
 
   typedef struct MemMapEntry {
  @@ -88,6 +90,7 @@ typedef struct VirtBoardInfo {
   int fdt_size;
   uint32_t clock_phandle;
   uint32_t gic_phandle;
  +uint32_t v2m_phandle;
   } VirtBoardInfo;
 
  Could you rename v2m_phandle to something more neutral like msi_phandle ? It 
 will also be
 used by GICv3 ITS implementation.
 
That's sort of how to speculate about.  Why can't those patches just
rename the variable then?  Right now, as the code stands, msi_phandle
would be less clear IMHO.

-Christoffer

[Qemu-devel] [PATCH v3 1/4] target-arm: Add GIC phandle to VirtBoardInfo

2015-05-24 Thread Christoffer Dall

Instead of passing the GIC phandle around between functions, add it to
the VirtBoardInfo just like we do for the clock_phandle.  We are about
to add the v2m phandle as well, and it's easier not having to pass
around a bunch of phandles, return multiple values from functions, etc.

Reviewed-by: Peter Maydell peter.mayd...@linaro.org
Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
---
Changes since v2:
 - None
Changes since v1:
 - Added reviewed-by tag

 hw/arm/virt.c | 26 +++---
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index a7f9a10..f9f7482 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -87,6 +87,7 @@ typedef struct VirtBoardInfo {
 void *fdt;
 int fdt_size;
 uint32_t clock_phandle;
+uint32_t gic_phandle;
 } VirtBoardInfo;
 
 typedef struct {
@@ -322,12 +323,11 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi)
 }
 }
 
-static uint32_t fdt_add_gic_node(const VirtBoardInfo *vbi)
+static void fdt_add_gic_node(VirtBoardInfo *vbi)
 {
-uint32_t gic_phandle;
 
-gic_phandle = qemu_fdt_alloc_phandle(vbi-fdt);
-qemu_fdt_setprop_cell(vbi-fdt, /, interrupt-parent, gic_phandle);
+vbi-gic_phandle = qemu_fdt_alloc_phandle(vbi-fdt);
+qemu_fdt_setprop_cell(vbi-fdt, /, interrupt-parent, vbi-gic_phandle);
 
 qemu_fdt_add_subnode(vbi-fdt, /intc);
 /* 'cortex-a15-gic' means 'GIC v2' */
@@ -340,12 +340,10 @@ static uint32_t fdt_add_gic_node(const VirtBoardInfo *vbi)
  2, vbi-memmap[VIRT_GIC_DIST].size,
  2, vbi-memmap[VIRT_GIC_CPU].base,
  2, vbi-memmap[VIRT_GIC_CPU].size);
-qemu_fdt_setprop_cell(vbi-fdt, /intc, phandle, gic_phandle);
-
-return gic_phandle;
+qemu_fdt_setprop_cell(vbi-fdt, /intc, phandle, vbi-gic_phandle);
 }
 
-static uint32_t create_gic(const VirtBoardInfo *vbi, qemu_irq *pic)
+static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic)
 {
 /* We create a standalone GIC v2 */
 DeviceState *gicdev;
@@ -394,7 +392,7 @@ static uint32_t create_gic(const VirtBoardInfo *vbi, 
qemu_irq *pic)
 pic[i] = qdev_get_gpio_in(gicdev, i);
 }
 
-return fdt_add_gic_node(vbi);
+fdt_add_gic_node(vbi);
 }
 
 static void create_uart(const VirtBoardInfo *vbi, qemu_irq *pic)
@@ -641,8 +639,7 @@ static void create_pcie_irq_map(const VirtBoardInfo *vbi, 
uint32_t gic_phandle,
0x7   /* PCI irq */);
 }
 
-static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
-uint32_t gic_phandle)
+static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic)
 {
 hwaddr base = vbi-memmap[VIRT_PCIE].base;
 hwaddr size = vbi-memmap[VIRT_PCIE].size;
@@ -714,7 +711,7 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq 
*pic,
  2, base_mmio, 2, size_mmio);
 
 qemu_fdt_setprop_cell(vbi-fdt, nodename, #interrupt-cells, 1);
-create_pcie_irq_map(vbi, gic_phandle, irq, nodename);
+create_pcie_irq_map(vbi, vbi-gic_phandle, irq, nodename);
 
 g_free(nodename);
 }
@@ -736,7 +733,6 @@ static void machvirt_init(MachineState *machine)
 MemoryRegion *ram = g_new(MemoryRegion, 1);
 const char *cpu_model = machine-cpu_model;
 VirtBoardInfo *vbi;
-uint32_t gic_phandle;
 char **cpustr;
 
 if (!cpu_model) {
@@ -813,13 +809,13 @@ static void machvirt_init(MachineState *machine)
 
 create_flash(vbi);
 
-gic_phandle = create_gic(vbi, pic);
+create_gic(vbi, pic);
 
 create_uart(vbi, pic);
 
 create_rtc(vbi, pic);
 
-create_pcie(vbi, pic, gic_phandle);
+create_pcie(vbi, pic);
 
 /* Create mmio transports, so the user can create virtio backends
  * (which will be automatically plugged in to the transports). If
-- 
2.1.2.330.g565301e.dirty

[Qemu-devel] [PATCH v3 0/4] Add support for for GICv2m and MSIs to arm-virt

2015-05-24 Thread Christoffer Dall

Now when we have a host generic PCIe controller in the virt board, it
would be nice to be able to use MSIs so that we can eventually enable
VHOST with KVM.

With these patches you can use MSIs with TCG and with KVM, but you still
need some fixes for the mapping of the IRQ index to the GSI number for
IRQFD to work.  A separate series that enables IRQFD and vhost
is available: ARM adaptations for vhost irqfd setup
https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg01054.html)

Tested with KVM on XGene and with TCG by configuring a virtio-pci
network adapter for the guest and verifying MSIs going through as
expected.

See the individual patches for changelogs.

Christoffer Dall (3):
  target-arm: Add GIC phandle to VirtBoardInfo
  arm_gicv2m: Add GICv2m widget to support MSIs
  target-arm: Add the GICv2m to the virt board

Shanker Donthineni (1):
  target-arm: Extend the gic node properties

 hw/arm/virt.c |  81 +++--
 hw/intc/Makefile.objs |   1 +
 hw/intc/arm_gicv2m.c  | 190 ++
 3 files changed, 252 insertions(+), 20 deletions(-)
 create mode 100644 hw/intc/arm_gicv2m.c

-- 
2.1.2.330.g565301e.dirty

[Qemu-devel] [PATCH v3 3/4] target-arm: Extend the gic node properties

2015-05-24 Thread Christoffer Dall

From: Shanker Donthineni shank...@codeaurora.org

In preparation for adding the GICv2m which requires address specifiers
and is a subnode of the gic, we extend the gic DT definition to specify
the #address-cells and #size-cells properties and add an empty ranges
property properties of the DT node, since this is required to add the
v2m node as a child of the gic node.

Note that we must also expand the irq-map to reference the gic with the
right address-cells as a consequnce of this change.

Signed-off-by: Shanker Donthineni shank...@codeaurora.org
Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
---
Changes since v2:
 - New separate patch factoring out changes to existing code for eased
   bisectability in case we broke something
 - The above fixes the issue with non-MSI compatible guests.

 hw/arm/virt.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index f9f7482..6797c6f 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -340,7 +340,11 @@ static void fdt_add_gic_node(VirtBoardInfo *vbi)
  2, vbi-memmap[VIRT_GIC_DIST].size,
  2, vbi-memmap[VIRT_GIC_CPU].base,
  2, vbi-memmap[VIRT_GIC_CPU].size);
+qemu_fdt_setprop_cell(vbi-fdt, /intc, #address-cells, 0x2);
+qemu_fdt_setprop_cell(vbi-fdt, /intc, #size-cells, 0x2);
+qemu_fdt_setprop(vbi-fdt, /intc, ranges, NULL, 0);
 qemu_fdt_setprop_cell(vbi-fdt, /intc, phandle, vbi-gic_phandle);
+
 }
 
 static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic)
@@ -604,11 +608,12 @@ static void create_fw_cfg(const VirtBoardInfo *vbi)
 g_free(nodename);
 }
 
+#define PCIE_IRQMAP_LEN 10
 static void create_pcie_irq_map(const VirtBoardInfo *vbi, uint32_t gic_phandle,
 int first_irq, const char *nodename)
 {
 int devfn, pin;
-uint32_t full_irq_map[4 * 4 * 8] = { 0 };
+uint32_t full_irq_map[4 * 4 * PCIE_IRQMAP_LEN] = { 0 };
 uint32_t *irq_map = full_irq_map;
 
 for (devfn = 0; devfn = 0x18; devfn += 0x8) {
@@ -619,15 +624,15 @@ static void create_pcie_irq_map(const VirtBoardInfo *vbi, 
uint32_t gic_phandle,
 int i;
 
 uint32_t map[] = {
-devfn  8, 0, 0,   /* devfn */
-pin + 1,/* PCI pin */
-gic_phandle, irq_type, irq_nr, irq_level }; /* GIC irq */
+devfn  8, 0, 0, /* devfn */
+pin + 1,  /* PCI pin */
+gic_phandle, 0, 0, irq_type, irq_nr, irq_level }; /* GIC irq */
 
 /* Convert map to big endian */
-for (i = 0; i  8; i++) {
+for (i = 0; i  PCIE_IRQMAP_LEN; i++) {
 irq_map[i] = cpu_to_be32(map[i]);
 }
-irq_map += 8;
+irq_map += PCIE_IRQMAP_LEN;
 }
 }
 
-- 
2.1.2.330.g565301e.dirty

[Qemu-devel] [PATCH v3 2/4] arm_gicv2m: Add GICv2m widget to support MSIs

2015-05-24 Thread Christoffer Dall

The ARM GICv2m widget is a little device that handles MSI interrupt
writes to a trigger register and ties them to a range of interrupt lines
wires to the GIC.  It has a few status/id registers and the interrupt wires,
and that's about it.

A board instantiates the device by setting the base SPI number and
number SPIs for the frame.  The base-spi parameter is indexed in the SPI
number space only, so base-spi == 0, means IRQ number 32.  When a device
(the PCI host controller) writes to the trigger register, the payload is
the GIC IRQ number, so we have to subtract 32 from that and then index
into our frame of SPIs.

When instantiating a GICv2m device, tell PCI that we have instantiated
something that can deal with MSIs.  We rely on the board actually wiring
up the GICv2m to the PCI host controller.

Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
---
Changes since v2:
 - Renamed QOM type to arm-gicv2m
Changes since v1:
 - Check that writes to MSI_SETSPI are within the lower boundary as well
 - Move gicv2m to common-obj in Makefile
 - Separate switch case and comment for impdef regs
 - Clearly document what is emulated
 - Allow 16 bit lower accesses to MSI_SETSPI regs
 - Fix commit grammar error
 - Remove stray pixman commit

 hw/intc/Makefile.objs |   1 +
 hw/intc/arm_gicv2m.c  | 190 ++
 2 files changed, 191 insertions(+)
 create mode 100644 hw/intc/arm_gicv2m.c

diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 843864a..092d8a8 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -11,6 +11,7 @@ common-obj-$(CONFIG_SLAVIO) += slavio_intctl.o
 common-obj-$(CONFIG_IOAPIC) += ioapic_common.o
 common-obj-$(CONFIG_ARM_GIC) += arm_gic_common.o
 common-obj-$(CONFIG_ARM_GIC) += arm_gic.o
+common-obj-$(CONFIG_ARM_GIC) += arm_gicv2m.o
 common-obj-$(CONFIG_OPENPIC) += openpic.o
 
 obj-$(CONFIG_APIC) += apic.o apic_common.o
diff --git a/hw/intc/arm_gicv2m.c b/hw/intc/arm_gicv2m.c
new file mode 100644
index 000..9f84f72
--- /dev/null
+++ b/hw/intc/arm_gicv2m.c
@@ -0,0 +1,190 @@
+/*
+ *  GICv2m extension for MSI/MSI-x support with a GICv2-based system
+ *
+ * Copyright (C) 2015 Linaro, All rights reserved.
+ *
+ * Author: Christoffer Dall christoffer.d...@linaro.org
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see http://www.gnu.org/licenses/.
+ */
+
+/* This file implements an emulated GICv2m widget as described in the ARM
+ * Server Base System Architecture (SBSA) specification Version 2.2
+ * (ARM-DEN-0029 v2.2) pages 35-39 without any optional implementation defined
+ * identification registers and with a single non-secure MSI register frame.
+ */
+
+#include hw/sysbus.h
+#include hw/pci/msi.h
+
+#define TYPE_ARM_GICV2M arm-gicv2m
+#define ARM_GICV2M(obj) OBJECT_CHECK(ARMGICv2mState, (obj), TYPE_ARM_GICV2M)
+
+#define GICV2M_NUM_SPI_MAX 128
+
+#define V2M_MSI_TYPER   0x008
+#define V2M_MSI_SETSPI_NS   0x040
+#define V2M_MSI_IIDR0xFCC
+#define V2M_IIDR0   0xFD0
+#define V2M_IIDR11  0xFFC
+
+#define PRODUCT_ID_QEMU 0x51 /* ASCII code Q */
+
+typedef struct ARMGICv2mState {
+SysBusDevice parent_obj;
+
+MemoryRegion iomem;
+qemu_irq spi[GICV2M_NUM_SPI_MAX];
+
+uint32_t base_spi;
+uint32_t num_spi;
+} ARMGICv2mState;
+
+static void gicv2m_set_irq(void *opaque, int irq)
+{
+ARMGICv2mState *s = (ARMGICv2mState *)opaque;
+
+qemu_irq_pulse(s-spi[irq]);
+}
+
+static uint64_t gicv2m_read(void *opaque, hwaddr offset,
+unsigned size)
+{
+ARMGICv2mState *s = (ARMGICv2mState *)opaque;
+uint32_t val;
+
+if (size != 4) {
+qemu_log_mask(LOG_GUEST_ERROR, gicv2m_read: bad size %u\n, size);
+return 0;
+}
+
+switch (offset) {
+case V2M_MSI_TYPER:
+val = (s-base_spi + 32)  16;
+val |= s-num_spi;
+return val;
+case V2M_MSI_IIDR:
+/* We don't have any valid implementor so we leave that field as zero
+ * and we return 0 in the arch revision as per the spec.
+ */
+return (PRODUCT_ID_QEMU  20);
+case V2M_IIDR0 ... V2M_IIDR11:
+/* We do not implement any optional identification registers and the
+ * mandatory MSI_PIDR2 register reads as 0x0, so we capture all
+ * implementation defined registers here.
+ */
+return

[Qemu-devel] [PATCH v3 4/4] target-arm: Add the GICv2m to the virt board

2015-05-24 Thread Christoffer Dall

Add a GICv2m device to the virt board to enable MSIs on the generic PCI
host controller.  We allocate 64 SPIs in the IRQ space for now (this can
be increased/decreased later) and map the GICv2m right after the GIC in
the memory map.

Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
---
Changes since v2:
 - Factored out changes to GIC DT node to previous patch.
 - Renamed QOM type name to arm-gicv2m
Changes since v1:
 - Remove stray merge conflict line
 - Reworded commmit message.

 hw/arm/virt.c | 42 +-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 6797c6f..2972bb3 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -45,6 +45,7 @@
 #include hw/pci-host/gpex.h
 
 #define NUM_VIRTIO_TRANSPORTS 32
+#define NUM_GICV2M_SPIS 64
 
 /* Number of external interrupt lines to configure the GIC with */
 #define NUM_IRQS 128
@@ -71,6 +72,7 @@ enum {
 VIRT_RTC,
 VIRT_FW_CFG,
 VIRT_PCIE,
+VIRT_GIC_V2M,
 };
 
 typedef struct MemMapEntry {
@@ -88,6 +90,7 @@ typedef struct VirtBoardInfo {
 int fdt_size;
 uint32_t clock_phandle;
 uint32_t gic_phandle;
+uint32_t v2m_phandle;
 } VirtBoardInfo;
 
 typedef struct {
@@ -127,6 +130,7 @@ static const MemMapEntry a15memmap[] = {
 /* GIC distributor and CPU interfaces sit inside the CPU peripheral space 
*/
 [VIRT_GIC_DIST] =   { 0x0800, 0x0001 },
 [VIRT_GIC_CPU] ={ 0x0801, 0x0001 },
+[VIRT_GIC_V2M] ={ 0x0802, 0x1000 },
 [VIRT_UART] =   { 0x0900, 0x1000 },
 [VIRT_RTC] ={ 0x0901, 0x1000 },
 [VIRT_FW_CFG] = { 0x0902, 0x000a },
@@ -148,6 +152,7 @@ static const int a15irqmap[] = {
 [VIRT_RTC] = 2,
 [VIRT_PCIE] = 3, /* ... to 6 */
 [VIRT_MMIO] = 16, /* ...to 16 + NUM_VIRTIO_TRANSPORTS - 1 */
+[VIRT_GIC_V2M] = 48, /* ...to 48 + NUM_GICV2M_SPIS - 1 */
 };
 
 static VirtBoardInfo machines[] = {
@@ -323,9 +328,21 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi)
 }
 }
 
-static void fdt_add_gic_node(VirtBoardInfo *vbi)
+static void fdt_add_v2m_gic_node(VirtBoardInfo *vbi)
 {
+vbi-v2m_phandle = qemu_fdt_alloc_phandle(vbi-fdt);
+qemu_fdt_add_subnode(vbi-fdt, /intc/v2m);
+qemu_fdt_setprop_string(vbi-fdt, /intc/v2m, compatible,
+arm,gic-v2m-frame);
+qemu_fdt_setprop(vbi-fdt, /intc/v2m, msi-controller, NULL, 0);
+qemu_fdt_setprop_sized_cells(vbi-fdt, /intc/v2m, reg,
+ 2, vbi-memmap[VIRT_GIC_V2M].base,
+ 2, vbi-memmap[VIRT_GIC_V2M].size);
+qemu_fdt_setprop_cell(vbi-fdt, /intc/v2m, phandle, vbi-v2m_phandle);
+}
 
+static void fdt_add_gic_node(VirtBoardInfo *vbi)
+{
 vbi-gic_phandle = qemu_fdt_alloc_phandle(vbi-fdt);
 qemu_fdt_setprop_cell(vbi-fdt, /, interrupt-parent, vbi-gic_phandle);
 
@@ -347,6 +364,25 @@ static void fdt_add_gic_node(VirtBoardInfo *vbi)
 
 }
 
+static void create_v2m(VirtBoardInfo *vbi, qemu_irq *pic)
+{
+int i;
+int irq = vbi-irqmap[VIRT_GIC_V2M];
+DeviceState *dev;
+
+dev = qdev_create(NULL, arm-gicv2m);
+sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, vbi-memmap[VIRT_GIC_V2M].base);
+qdev_prop_set_uint32(dev, base-spi, irq);
+qdev_prop_set_uint32(dev, num-spi, NUM_GICV2M_SPIS);
+qdev_init_nofail(dev);
+
+for (i = 0; i  NUM_GICV2M_SPIS; i++) {
+sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, pic[irq + i]);
+}
+
+fdt_add_v2m_gic_node(vbi);
+}
+
 static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic)
 {
 /* We create a standalone GIC v2 */
@@ -397,6 +433,8 @@ static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic)
 }
 
 fdt_add_gic_node(vbi);
+
+create_v2m(vbi, pic);
 }
 
 static void create_uart(const VirtBoardInfo *vbi, qemu_irq *pic)
@@ -707,6 +745,8 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq 
*pic)
 qemu_fdt_setprop_cells(vbi-fdt, nodename, bus-range, 0,
nr_pcie_buses - 1);
 
+qemu_fdt_setprop_cells(vbi-fdt, nodename, msi-parent, vbi-v2m_phandle);
+
 qemu_fdt_setprop_sized_cells(vbi-fdt, nodename, reg,
  2, base_ecam, 2, size_ecam);
 qemu_fdt_setprop_sized_cells(vbi-fdt, nodename, ranges,
-- 
2.1.2.330.g565301e.dirty

Re: [Qemu-devel] [PATCH v2 0/3] Add support for for GICv2m and MSIs to arm-virt

2015-05-24 Thread Christoffer Dall

On Wed, May 06, 2015 at 05:39:28PM +0100, Peter Maydell wrote:
 On 6 May 2015 at 17:33, Peter Maydell peter.mayd...@linaro.org wrote:
  On 27 April 2015 at 18:31, Christoffer Dall christoffer.d...@linaro.org 
  wrote:
  Now when we have a host generic PCIe controller in the virt board, it
  would be nice to be able to use MSIs so that we can eventually enable
  VHOST with KVM.
 
  With these patches you can use MSIs with TCG and with KVM, but you still
  need some fixes for the mapping of the IRQ index to the GSI number for
  IRQFD to work.  A separate series that enables IRQFD and vhost
  is available: ARM adaptations for vhost irqfd setup
  https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg01054.html)
 
  Tested with KVM on XGene and with TCG by configuring a virtio-pci
  network adapter for the guest and verifying MSIs going through as
  expected.
 
  You forgot to change the QOM device name to 'arm-gicv2m', but
  I'll fix that up as I apply this to target-arm.next.
 
 ...except this series breaks booting of a linux guest using PCI
 on the virt board with aarch32:
 
 PCI host bridge /pcie@1000 ranges:
IO 0x3eff..0x3eff - 0x
   MEM 0x1000..0x3efe - 0x1000
 pci-host-generic 3f00.pcie: PCI host bridge to bus :00
 pci_bus :00: root bus resource [bus 00-0f]
 pci_bus :00: root bus resource [io  0x-0x]
 pci_bus :00: root bus resource [mem 0x1000-0x3efe]
 PCI: bus0: Fast back to back transfers disabled
 pci :00:01.0: of_irq_parse_pci() failed with rc=-22
 pci :00:02.0: of_irq_parse_pci() failed with rc=-22
 pci :00:02.0: BAR 6: assigned [mem 0x1000-0x1003 pref]
 pci :00:01.0: BAR 1: assigned [mem 0x1004-0x10040fff]
 pci :00:02.0: BAR 1: assigned [mem 0x10041000-0x10041fff]
 pci :00:01.0: BAR 0: assigned [io  0x1000-0x103f]
 pci :00:02.0: BAR 0: assigned [io  0x1040-0x105f]
 virtio-pci :00:01.0: enabling device (0100 - 0103)
 virtio-pci :00:02.0: enabling device (0100 - 0103)
 virtio_blk: probe of virtio32 failed with error -22
 virtio_net: probe of virtio33 failed with error -22
 
 (and without virtio-blk we don't mount our rootfs).
 
Shanker figured out that this was due to me changing address-cells and
size-cells in the gic node and breaking the irq-map in the DT and
provided a fix.

I will send a new series.

Thanks,
-Christoffer

Re: [Qemu-devel] [RFC/RFT PATCH v2 1/3] arm/arm64: pageattr: add set_memory_nc

2015-05-20 Thread Christoffer Dall

On Tue, May 19, 2015 at 12:18:54PM +0100, Catalin Marinas wrote:
 On Tue, May 19, 2015 at 11:03:22AM +0100, Andrew Jones wrote:
  On Mon, May 18, 2015 at 04:53:03PM +0100, Catalin Marinas wrote:
   Another way would be to split the vma containing the non-cacheable
   memory so that you get a single vma with the vm_page_prot as
   Non-cacheable.
  
  This sounds interesting. Actually, it even crossed my mind once when I
  first saw that the vma would overwrite the attributes, but then, sigh,
  I let my brain take a stupidity bath.
  
   
   Yet another approach could be for KVM to mmap the necessary memory for
   Qemu via a file_operations.mmap call (but that's only for ranges outside
   the guest RAM).
  
  I guess I prefer the vma splitting, rather than this (the vma creating
  with mmap), as it keeps the KVM interface from changing (as you point out
  below). Well, unless there are other advantages to this that are worth
  considering?
 
 The advantage is that you don't need to deal with the mm internals in
 the KVM code.
 
 But you can probably add such code directly to mm/ and reuse some of the
 existing code in there already as part of change_protection(),
 mprotect_fixup(), sys_mprotect(). Actually, once you split the vma and
 set the new protection (something similar to mprotect_fixup), it looks
 to me like you can just call change_protection(vma-vm_page_prot).
 
   I didn't have time to follow these threads in details, but just to
   recap my understanding, we have two main use-cases:
   
   1. Qemu handling guest I/O to device (e.g. PCIe BARs)
   2. Qemu emulating device DMA
   
   For (1), I guess Qemu uses an anonymous mmap() and then tells KVM about
   this memory slot. The memory attributes in this case could be Device
   because that's how the guest would normally map it. The
   file_operations.mmap trick would work in this case but this means
   expanding the KVM ABI beyond just an ioctl().
   
   For (2), since Qemu is writing to the guest RAM (e.g. video
   framebuffer allocated by the guest), I still think the simplest is to
   tell the guest (via DT) that such device is cache coherent rather than
   trying to remap the Qemu mapping as non-cacheable.
  
  If we need a solution for (1), then I'd prefer that it work and be
  applied to (2) as well. Anyway, I'm still not 100% sure we can count on
  all guest types (booloaders, different OSes) to listen to us. They may
  assume non-cacheable is typical and safe, and thus just do that always.
  We can certainly change some of those bootloaders and OSes, but probably
  not all of them.
 
 That's fine by me. Once you get the vma splitting and attributes
 changing done, I think you get the second one for free.
 
 Do we want to differentiate between Device and Normal Non-cacheable
 memory? Something like KVM_MEMSLOT_DEVICE?
 
 Nitpick: I'm not sure whether uncached is clear enough. In Linux,
 pgprot_noncached() returns Strongly Ordered memory. For Normal
 Non-cachable we used pgprot_writecombine (e.g. a video framebuffer).
 
 Maybe something like KVM_MEMSLOT_COHERENT meaning a request to KVM to
 ensure that guest and host access it coherently (which would mean
 writecombine for ARM). That's similar naming to functions like
 dma_alloc_coherent() that return cacheable or non-cacheable memory based
 on what the device supports. Anyway, I'm not to bothered with the
 naming.
 
One thing to keep in mind for (2) is that QEMU is likely to do things
like calling regular memcpy() on the memory region, so mapping it as
device memory which would fault on unaligned accesses may be a problem,
so ideally there is a memory type for the user space mapping which
allows such behavior where we at the same time can guarantee the that
the mapping is coherent with the guest mapping through the S2
attributes.

-Christoffer

Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-15 Thread Christoffer Dall

On Thu, May 14, 2015 at 03:36:37PM +0200, Andrew Jones wrote:
 On Thu, May 14, 2015 at 02:11:59PM +0100, Peter Maydell wrote:
  On 14 May 2015 at 14:03, Andrew Jones drjo...@redhat.com wrote:
   On Thu, May 14, 2015 at 11:37:46AM +0100, Peter Maydell wrote:
   On 14 May 2015 at 11:31, Andrew Jones drjo...@redhat.com wrote:
Forgot to (4): switch from setting userspace's mapping to
device memory to normal, non-cacheable. Using device memory
caused a problem that Alex Graf found, and Peter Maydell suggested
using normal, non-cacheable instead.
  
   Did you check that non-cacheable is definitely the correct
   kind of Normal memory attribute we want? (ie not write-through).
  
   I was concerned that write-through wouldn't be sufficient. If the
   guest writes to its non-cached memory, and QEMU needs to see what
   it wrote, then won't write-through fail to work? Unless we some
   how invalidate the cache first?
  
  Well, I meant more that the correct mapping for userspace is
  the same as the guest, whatever that is, and so somebody needs
  to look at what the guest actually does rather than merely
  hoping NormalNC is OK. (For instance, do we need to provide
  support for QEMU to map both NC and writethrough?)
 
 
 Ah, we assume the guest is mapping it as device memory, and in
 this version of the series, I ensure that it is at least NC with
 the S2 attributes. I don't think we can look at what some guests
 do with some devices to come up with anything beyond (poor?)
 heuristics. I prefer that we force both the guest and QEMU to NC
 (or guest chooses Device and QEMU is forced to NC) to make sure
 we get it right.
 
But picking up on Peter's feedback I think it would be good if the
series clearly states something like:

1) We assume that the guest may use device type memory for the accesses
2) we cannot use device memory for the userspace mapping because
userspace may be doing unaligned accesses to it 3) normal non-cacheable
bridges these worlds becauase of x, y, and z.

I assume x, y, and z would include a fairly involved discussion of the
interesting aspects of how you can configure memory accesses on ARM ...
:)

-Christoffer

Re: [Qemu-devel] [RFC] ARM/ARM64: KVM: Implement KVM_FLUSH_DCACHE_GPA ioctl

2015-05-15 Thread Christoffer Dall

On Fri, May 15, 2015 at 01:43:57PM +0200, Laszlo Ersek wrote:
 On 05/07/15 19:01, Paolo Bonzini wrote:
  
  
  On 07/05/2015 18:56, Jérémy Fanguède wrote:
  USB devices fail with a timeout error, as if the communication between
  the kernel and the devices fail at a certain point:
  usb 1-1: device not accepting address 5, error -110
  usb usb1-port1: unable to enumerate USB device
 
 This is consistent with what I saw in my earlier testing.
 
  e1000 fails when the userspace tries to use it, with these type of
  kernel messages:
  e1000 :00:02.0 eth0: Detected Tx Unit Hang
Tx Queue 0
TDH  d
TDT  d
next_to_use  d
next_to_clean9
  buffer_info[next_to_clean]
time_stamp   9311
next_to_watcha
jiffies  956a
next_to_watch.status 0
  
  Can you find out what memory attributes the guest is using for the
  memory---and if it's uncached, why?
 
 For USB, see drivers/usb/core/hcd-pci.c, function usb_hcd_pci_probe():
 it uses ioremap_nocache().
 
 On the why, that ioremap_nocache() call can be tracked to
 
 http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=a914dd8b
 
 (Feb 2002), which predates the kernel's move to git. I guess
 ioremap_nocache() is used simply because USB host controllers are
 supposed to programmed like that.
 
 And, from arch/arm64/include/asm/io.h:
 
 #define ioremap_nocache(addr, size) __ioremap((addr), (size),
 __pgprot(PROT_DEVICE_nGnRE))
 
So this just means that these devices should be mapped as device memory
(like the VGA case before) right?  And therefore should work with Drew's
patches (assuming they are actually correct and you add the right QEMU
annotations to set the memory regions and non-cacheable), correct?

Thanks,
-Christoffer

Re: [Qemu-devel] [RFC/RFT PATCH v2 3/3] arm/arm64: KVM: implement 'uncached' mem coherency

2015-05-15 Thread Christoffer Dall

On Thu, May 14, 2015 at 03:32:13PM +0200, Andrew Jones wrote:
 On Thu, May 14, 2015 at 12:55:49PM +0200, Christoffer Dall wrote:
  On Wed, May 13, 2015 at 01:31:54PM +0200, Andrew Jones wrote:
   When S1 and S2 memory attributes combine wrt to caching policy,
   non-cacheable types take precedence. If a guest maps a region as
   device memory, which KVM userspace is using to emulate the device
   using normal, cacheable memory, then we lose coherency. With
   KVM_MEM_UNCACHED, KVM userspace can now hint to KVM which memory
   regions are likely to be problematic. With this patch, as pages
   of these types of regions are faulted into the guest, not only do
   we flush the page's dcache, but we also change userspace's
   mapping to NC in order to maintain coherency.
   
   What if the guest doesn't do what we expect? While we can't
   force a guest to use cacheable memory, we can take advantage of
   the non-cacheable precedence, and force it to use non-cacheable.
   So, this patch also introduces PAGE_S2_NORMAL_NC, and uses it on
   KVM_MEM_UNCACHED regions to force them to NC.
   
   We now have both guest and userspace on the same page (pun intended)
  
  I'd like to revisit the overall approach here.  Is doing non-cached
  accesses in both the guest and host really the right thing to do here?
 
 I think so, but all ideas/approaches are still on the table. This is
 still an RFC.
 
  
  The semantics of the device becomes that it is cache coherent (because
  QEMU is), and I think Marc argued that Linux/UEFI should simply be
  adapted to handle whatever emulated devices we have as coherent.  I also
  remember someone arguing that would be wrong (Peter?).
 
 I'm not really for quirking all devices in all guest types (AAVMF, Linux,
 other bootloaders, other OSes). Windows is unlikely to apply any quirks.
 

Well my point was that if we're emulating a platform with coherent IO
memory for PCI devices that is something that the guest should work with
as such, but as Paolo explained it should always be safe for a guest to
assume non-coherent, so that doesn't work.

  
  Finally, does this address all cache coherency issues with emulated
  devices?  Some VOS guys had seen things still not working with this
  approach, unsure why...  I'd like to avoid us merging this only to merge
  a more complete solution in a few weeks which reverts this solution...
 
 I'm not sure (this is still an RFT too :-) We definitely would need to
 scatter some more memory_region_set_uncached() calls around QEMU first.
 

It would be good if you could sync with the VOS guys and make sure your
patch set addresses their issues with the appropriate
memory_region_set_uncached() added to QEMU, and if it does not, some
vague idea why that falls outside of the scope of this patch set.  After
all, adding a USB controller to a VM is not that an esoteric use case,
is it?

  
  More comments/questions below:
  
   
   Signed-off-by: Andrew Jones drjo...@redhat.com
   ---
arch/arm/include/asm/kvm_mmu.h|  5 -
arch/arm/include/asm/pgtable-3level.h |  1 +
arch/arm/include/asm/pgtable.h|  1 +
arch/arm/kvm/mmu.c| 37 
   +++
arch/arm64/include/asm/kvm_mmu.h  |  5 -
arch/arm64/include/asm/memory.h   |  1 +
arch/arm64/include/asm/pgtable.h  |  1 +
7 files changed, 36 insertions(+), 15 deletions(-)
   
   diff --git a/arch/arm/include/asm/kvm_mmu.h 
   b/arch/arm/include/asm/kvm_mmu.h
   index 405aa18833073..e8034a80b12e5 100644
   --- a/arch/arm/include/asm/kvm_mmu.h
   +++ b/arch/arm/include/asm/kvm_mmu.h
   @@ -214,8 +214,11 @@ static inline void 
   __coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t pfn,
 while (size) {
 void *va = kmap_atomic_pfn(pfn);

   - if (need_flush)
   + if (need_flush) {
 kvm_flush_dcache_to_poc(va, PAGE_SIZE);
   + if (ipa_uncached)
   + set_memory_nc((unsigned long)va, 1);
  
  nit: consider moving this outside the need_flush
  
   + }

 if (icache_is_pipt())
 __cpuc_coherent_user_range((unsigned long)va,
   diff --git a/arch/arm/include/asm/pgtable-3level.h 
   b/arch/arm/include/asm/pgtable-3level.h
   index a745a2a53853c..39b3f7a40e663 100644
   --- a/arch/arm/include/asm/pgtable-3level.h
   +++ b/arch/arm/include/asm/pgtable-3level.h
   @@ -121,6 +121,7 @@
 * 2nd stage PTE definitions for LPAE.
 */
#define L_PTE_S2_MT_UNCACHED (_AT(pteval_t, 0x0)  2) /* 
   strongly ordered */
   +#define L_PTE_S2_MT_NORMAL_NC(_AT(pteval_t, 0x5)  2) /* 
   normal non-cacheable */
#define L_PTE_S2_MT_WRITETHROUGH (_AT(pteval_t, 0xa)  2) /* normal 
   inner write-through */
#define L_PTE_S2_MT_WRITEBACK(_AT(pteval_t, 0xf)  2) /* 
   normal inner write-back */
#define L_PTE_S2_MT_DEV_SHARED   (_AT(pteval_t, 0x1

Re: [Qemu-devel] [RFC/RFT PATCH v2 1/3] arm/arm64: pageattr: add set_memory_nc

2015-05-15 Thread Christoffer Dall

On Thu, May 14, 2015 at 03:46:44PM +0200, Andrew Jones wrote:
 On Thu, May 14, 2015 at 01:05:09PM +0200, Christoffer Dall wrote:
  On Wed, May 13, 2015 at 01:31:52PM +0200, Andrew Jones wrote:
   Provide a method to change normal, cacheable memory to non-cacheable.
   KVM will make use of this to keep emulated device memory regions
   coherent with the guest.
   
   Signed-off-by: Andrew Jones drjo...@redhat.com
  
  Reviewed-by: Christoffer Dall christoffer.d...@linaro.org
  
  But you obviously need Russell and Will/Catalin to ack/merge this.
 
 I guess this patch is going to go away in the next round. You've
 pointed out that I screwed stuff up royally with my over eagerness
 to reuse code. I need to reimplement change_memory_common, but a
 version that takes an mm, which is more or less what I did in the
 last version of this series, back when I was pinning pages.
 
Yeah, I just read this one before looking at the others because it was a
simple one...

-Christoffer

Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-14 Thread Christoffer Dall

On Thu, May 14, 2015 at 02:08:49PM +0200, Paolo Bonzini wrote:
 
 
 On 14/05/2015 14:00, Christoffer Dall wrote:
  So, getting back to my original question.  Is the point then that UEFI
  must assume (from ACPI/DT) the cache-coherency properties of the PCI
  controller which exists in hardware on the system you're running on,
  even for the virtual PCI bus because that will be the semantics for
  assigned devices?
  
  And in that case, we have no way to distinguish between passthrough
  devices and virtual devices plugged into the virtual PCI bus?
 
 Well, we could use the subsystem id.  But it's a hack, and may cause
 incompatibilities with some drivers.  Michael, any ideas?
 
  What about the idea of having two virtual PCI buses on your system where
  one is always cache-coherent and uses for virtual devices, and the other
  is whatever the hardware is and used for passthrough devices?
 
 I think that was rejected before.
 

Do you remember where?  I just remember Catalin mentioning the idea to
me verbally.

Besides the slightly heavy added use of resources etc. it seems like it
would address some of our issues in a good way.

But I'm still not sure why UEFI/Linux currently sees our PCI bus as
being non-coherent when in fact it is and we have no passthrough issues
currently.  Are all PCI controllers always non-coherent for some reason
and therefore we model it as such too?

-Christoffer

Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-14 Thread Christoffer Dall

On Thu, May 14, 2015 at 01:38:38PM +0200, Paolo Bonzini wrote:
 
 
 On 14/05/2015 13:36, Christoffer Dall wrote:
 (It's probably worth looking at the documentation in the first hunk 
 too,
 under the commit message.)

Why is this a hack/unintuitive?  Is the semantics of the QEMU PCI bus
not simply that MMIO regions are coherent?
   
   Only until device assignment gets into the picture.
  
  Will UEFI have to deal with device assignment in any respect?
 
 Why not?  For example you could do network boot from an assigned network
 card.
 
 In fact, anything that UEFI has to deal with, the OS has to deal with
 too.  If you need a UEFI hack, chances are you need or will need a Linux
 hack too.
 
Fair enough.  I was thinking that UEFI needs to be built with knowledge
of all the hardware present including any passthrough devices, but I
guess this is plainly not true with PCI (and might not even be true with
the level of DT parsing we do for the virtual platform).

So, getting back to my original question.  Is the point then that UEFI
must assume (from ACPI/DT) the cache-coherency properties of the PCI
controller which exists in hardware on the system you're running on,
even for the virtual PCI bus because that will be the semantics for
assigned devices?

And in that case, we have no way to distinguish between passthrough
devices and virtual devices plugged into the virtual PCI bus?

What about the idea of having two virtual PCI buses on your system where
one is always cache-coherent and uses for virtual devices, and the other
is whatever the hardware is and used for passthrough devices?

-Christoffer

Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-14 Thread Christoffer Dall

On Thu, May 14, 2015 at 02:28:49PM +0200, Paolo Bonzini wrote:
 
 
 On 14/05/2015 14:24, Christoffer Dall wrote:
  On Thu, May 14, 2015 at 02:08:49PM +0200, Paolo Bonzini wrote:
 
 
  On 14/05/2015 14:00, Christoffer Dall wrote:
  So, getting back to my original question.  Is the point then that UEFI
  must assume (from ACPI/DT) the cache-coherency properties of the PCI
  controller which exists in hardware on the system you're running on,
  even for the virtual PCI bus because that will be the semantics for
  assigned devices?
 
  And in that case, we have no way to distinguish between passthrough
  devices and virtual devices plugged into the virtual PCI bus?
 
  Well, we could use the subsystem id.  But it's a hack, and may cause
  incompatibilities with some drivers.  Michael, any ideas?
 
  What about the idea of having two virtual PCI buses on your system where
  one is always cache-coherent and uses for virtual devices, and the other
  is whatever the hardware is and used for passthrough devices?
 
  I think that was rejected before.
  
  Do you remember where?  I just remember Catalin mentioning the idea to
  me verbally.
 
 In the last centithread on the subject. :)
 
 At least I and Peter disagreed.  It's not about the heavy added use of
 resources, it's more about it being really easy to misconfigure.
 
  But I'm still not sure why UEFI/Linux currently sees our PCI bus as
  being non-coherent when in fact it is and we have no passthrough issues
  currently.  Are all PCI controllers always non-coherent for some reason
  and therefore we model it as such too?
 
 Well, PCI BARs are generally MMIO resources, and hence should not be cached.
 
 As an optimization, OS drivers can mark them as cacheable or
 write-combining or something like that, but in general it's a safe
 default to leave them uncached---one would think.
 
ok, I guess this series makes sense then, assuming it works, and
assuming we don't kill performance by going to RAM all the time when we
don't have to...

Thanks,
-Christoffer

Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-14 Thread Christoffer Dall

On Thu, May 14, 2015 at 01:09:34PM +0200, Laszlo Ersek wrote:
 On 05/14/15 12:30, Christoffer Dall wrote:
  On Wed, May 13, 2015 at 01:31:51PM +0200, Andrew Jones wrote:
  Introduce a new memory region flag, KVM_MEM_UNCACHED, which is
  needed by ARM. This flag informs KVM that the given memory region
  is typically mapped by the guest as non-cacheable. KVM for ARM
  then ensures that that memory is indeed mapped non-cacheable by
  the guest, and also remaps that region as non-cacheable for
  userspace, allowing them both to maintain a coherent view.
 
  Changes since v1:
   1) don't pin pages [Paolo]
   2) ensure the guest maps the memory non-cacheable [me]
   3) clean up memslot flag documentation [Christoffer]
  changes 1 and 2 effectively redesigned/rewrote v1. Find v1 here
  http://www.spinics.net/lists/kvm-arm/msg14022.html
 
  The QEMU series for v1 hasn't really changed. Only the linux
  header hack needed to bump KVM_CAP_UNCACHED_MEM from 107 to
  116.  Find the series here
  http://www.spinics.net/lists/kvm-arm/msg14026.html
 
  Testing:
  This series still needs lots of testing, but I thought I'd
  kick it to the list early, as there's been recent interest
  in solving this problem, and I'd like to get test results
  and opinions on this approach from others sooner than later.
  I've tested with AAVMF (UEFI for AArch64 mach-virt guests).
  AAVMF has a kludge in it to avoid the coherency problem.
  
  How does the 'kludge' work?
 
 https://github.com/tianocore/edk2/commit/f9a8be42
 
 (It's probably worth looking at the documentation in the first hunk too,
 under the commit message.)
 
Why is this a hack/unintuitive?  Is the semantics of the QEMU PCI bus
not simply that MMIO regions are coherent?

-Christoffer

Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-14 Thread Christoffer Dall

On Thu, May 14, 2015 at 01:31:03PM +0200, Paolo Bonzini wrote:
 
 
 On 14/05/2015 13:29, Christoffer Dall wrote:
   (It's probably worth looking at the documentation in the first hunk too,
   under the commit message.)
  
  Why is this a hack/unintuitive?  Is the semantics of the QEMU PCI bus
  not simply that MMIO regions are coherent?
 
 Only until device assignment gets into the picture.
 
Will UEFI have to deal with device assignment in any respect?

-Christoffer

Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-14 Thread Christoffer Dall

On Wed, May 13, 2015 at 01:31:51PM +0200, Andrew Jones wrote:
 Introduce a new memory region flag, KVM_MEM_UNCACHED, which is
 needed by ARM. This flag informs KVM that the given memory region
 is typically mapped by the guest as non-cacheable. KVM for ARM
 then ensures that that memory is indeed mapped non-cacheable by
 the guest, and also remaps that region as non-cacheable for
 userspace, allowing them both to maintain a coherent view.
 
 Changes since v1:
  1) don't pin pages [Paolo]
  2) ensure the guest maps the memory non-cacheable [me]
  3) clean up memslot flag documentation [Christoffer]
 changes 1 and 2 effectively redesigned/rewrote v1. Find v1 here
 http://www.spinics.net/lists/kvm-arm/msg14022.html
 
 The QEMU series for v1 hasn't really changed. Only the linux
 header hack needed to bump KVM_CAP_UNCACHED_MEM from 107 to
 116.  Find the series here
 http://www.spinics.net/lists/kvm-arm/msg14026.html
 
 Testing:
 This series still needs lots of testing, but I thought I'd
 kick it to the list early, as there's been recent interest
 in solving this problem, and I'd like to get test results
 and opinions on this approach from others sooner than later.
 I've tested with AAVMF (UEFI for AArch64 mach-virt guests).
 AAVMF has a kludge in it to avoid the coherency problem.

How does the 'kludge' work?

 I've tested both with and without that kludge active. Both
 worked for me (almost). Sometimes with the non-kludged
 version I was still able to see a bit of corruption in
 grub's output after edk2 loaded it - not much, and not always,
 but something.

Remind me, this is a VGA framebuffer corruption with a PCI-plugged VGA
card?

Thanks,
-Christoffer

 Anyway, it's quite frustrating, as I'm not sure
 what I'm missing...
 
 This series applies to Linus' 110bc76729d4, but I tested with
 a version backported to the current RHELSA kernel.
 
 Thanks for reviews and testing!
 
 drew
 
 
 Andrew Jones (3):
   arm/arm64: pageattr: add set_memory_nc
   KVM: promote KVM_MEMSLOT_INCOHERENT to uapi
   arm/arm64: KVM: implement 'uncached' mem coherency
 
  Documentation/virtual/kvm/api.txt | 20 --
  arch/arm/include/asm/cacheflush.h |  1 +
  arch/arm/include/asm/kvm_mmu.h|  5 -
  arch/arm/include/asm/pgtable-3level.h |  1 +
  arch/arm/include/asm/pgtable.h|  1 +
  arch/arm/include/uapi/asm/kvm.h   |  1 +
  arch/arm/kvm/arm.c|  1 +
  arch/arm/kvm/mmu.c| 39 
 ++-
  arch/arm/mm/pageattr.c|  7 +++
  arch/arm64/include/asm/cacheflush.h   |  1 +
  arch/arm64/include/asm/kvm_mmu.h  |  5 -
  arch/arm64/include/asm/memory.h   |  1 +
  arch/arm64/include/asm/pgtable.h  |  1 +
  arch/arm64/include/uapi/asm/kvm.h |  1 +
  arch/arm64/mm/pageattr.c  |  8 +++
  include/linux/kvm_host.h  |  1 -
  include/uapi/linux/kvm.h  |  2 ++
  virt/kvm/kvm_main.c   |  7 ++-
  18 files changed, 79 insertions(+), 24 deletions(-)
 
 -- 
 2.1.0

Re: [Qemu-devel] [RFC/RFT PATCH v2 2/3] KVM: promote KVM_MEMSLOT_INCOHERENT to uapi

2015-05-14 Thread Christoffer Dall

On Wed, May 13, 2015 at 01:31:53PM +0200, Andrew Jones wrote:
 Commit 1050dcda30529 introduced KVM_MEMSLOT_INCOHERENT to flag memory
 regions that may have coherency issues due to mapping host system RAM
 as non-cacheable. This was introduced as a KVM internal flag, but now
 give KVM userspace access to it so that it may use it for hinting
 likely problematic regions. Also rename to KVM_MEM_UNCACHED.
 
 Signed-off-by: Andrew Jones drjo...@redhat.com

Reviewed-by: Christoffer Dall christoffer.d...@linaro.org

Re: [Qemu-devel] [RFC/RFT PATCH v2 3/3] arm/arm64: KVM: implement 'uncached' mem coherency

2015-05-14 Thread Christoffer Dall

On Wed, May 13, 2015 at 01:31:54PM +0200, Andrew Jones wrote:
 When S1 and S2 memory attributes combine wrt to caching policy,
 non-cacheable types take precedence. If a guest maps a region as
 device memory, which KVM userspace is using to emulate the device
 using normal, cacheable memory, then we lose coherency. With
 KVM_MEM_UNCACHED, KVM userspace can now hint to KVM which memory
 regions are likely to be problematic. With this patch, as pages
 of these types of regions are faulted into the guest, not only do
 we flush the page's dcache, but we also change userspace's
 mapping to NC in order to maintain coherency.
 
 What if the guest doesn't do what we expect? While we can't
 force a guest to use cacheable memory, we can take advantage of
 the non-cacheable precedence, and force it to use non-cacheable.
 So, this patch also introduces PAGE_S2_NORMAL_NC, and uses it on
 KVM_MEM_UNCACHED regions to force them to NC.
 
 We now have both guest and userspace on the same page (pun intended)

I'd like to revisit the overall approach here.  Is doing non-cached
accesses in both the guest and host really the right thing to do here?

The semantics of the device becomes that it is cache coherent (because
QEMU is), and I think Marc argued that Linux/UEFI should simply be
adapted to handle whatever emulated devices we have as coherent.  I also
remember someone arguing that would be wrong (Peter?).

Finally, does this address all cache coherency issues with emulated
devices?  Some VOS guys had seen things still not working with this
approach, unsure why...  I'd like to avoid us merging this only to merge
a more complete solution in a few weeks which reverts this solution...

More comments/questions below:

 
 Signed-off-by: Andrew Jones drjo...@redhat.com
 ---
  arch/arm/include/asm/kvm_mmu.h|  5 -
  arch/arm/include/asm/pgtable-3level.h |  1 +
  arch/arm/include/asm/pgtable.h|  1 +
  arch/arm/kvm/mmu.c| 37 
 +++
  arch/arm64/include/asm/kvm_mmu.h  |  5 -
  arch/arm64/include/asm/memory.h   |  1 +
  arch/arm64/include/asm/pgtable.h  |  1 +
  7 files changed, 36 insertions(+), 15 deletions(-)
 
 diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
 index 405aa18833073..e8034a80b12e5 100644
 --- a/arch/arm/include/asm/kvm_mmu.h
 +++ b/arch/arm/include/asm/kvm_mmu.h
 @@ -214,8 +214,11 @@ static inline void __coherent_cache_guest_page(struct 
 kvm_vcpu *vcpu, pfn_t pfn,
   while (size) {
   void *va = kmap_atomic_pfn(pfn);
  
 - if (need_flush)
 + if (need_flush) {
   kvm_flush_dcache_to_poc(va, PAGE_SIZE);
 + if (ipa_uncached)
 + set_memory_nc((unsigned long)va, 1);

nit: consider moving this outside the need_flush

 + }
  
   if (icache_is_pipt())
   __cpuc_coherent_user_range((unsigned long)va,
 diff --git a/arch/arm/include/asm/pgtable-3level.h 
 b/arch/arm/include/asm/pgtable-3level.h
 index a745a2a53853c..39b3f7a40e663 100644
 --- a/arch/arm/include/asm/pgtable-3level.h
 +++ b/arch/arm/include/asm/pgtable-3level.h
 @@ -121,6 +121,7 @@
   * 2nd stage PTE definitions for LPAE.
   */
  #define L_PTE_S2_MT_UNCACHED (_AT(pteval_t, 0x0)  2) /* strongly 
 ordered */
 +#define L_PTE_S2_MT_NORMAL_NC(_AT(pteval_t, 0x5)  2) /* 
 normal non-cacheable */
  #define L_PTE_S2_MT_WRITETHROUGH (_AT(pteval_t, 0xa)  2) /* normal 
 inner write-through */
  #define L_PTE_S2_MT_WRITEBACK(_AT(pteval_t, 0xf)  2) /* 
 normal inner write-back */
  #define L_PTE_S2_MT_DEV_SHARED   (_AT(pteval_t, 0x1)  2) /* 
 device */
 diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
 index f40354198bad4..ae13ca8b0a23d 100644
 --- a/arch/arm/include/asm/pgtable.h
 +++ b/arch/arm/include/asm/pgtable.h
 @@ -100,6 +100,7 @@ extern pgprot_t   pgprot_s2_device;
  #define PAGE_HYP _MOD_PROT(pgprot_kernel, L_PTE_HYP)
  #define PAGE_HYP_DEVICE  _MOD_PROT(pgprot_hyp_device, L_PTE_HYP)
  #define PAGE_S2  _MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY)
 +#define PAGE_S2_NORMAL_NC__pgprot((pgprot_val(PAGE_S2)  
 ~L_PTE_S2_MT_MASK) | L_PTE_S2_MT_NORMAL_NC)
  #define PAGE_S2_DEVICE   _MOD_PROT(pgprot_s2_device, 
 L_PTE_S2_RDONLY)
  
  #define __PAGE_NONE  __pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | 
 L_PTE_XN | L_PTE_NONE)
 diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
 index bc1665acd73e7..6b3bd8061bd2a 100644
 --- a/arch/arm/kvm/mmu.c
 +++ b/arch/arm/kvm/mmu.c
 @@ -1220,7 +1220,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
 phys_addr_t fault_ipa,
   struct vm_area_struct *vma;
   pfn_t pfn;
   pgprot_t mem_type = PAGE_S2;
 - bool fault_ipa_uncached;
 + bool fault_ipa_uncached = false;
   bool logging_active =

Re: [Qemu-devel] [RFC/RFT PATCH v2 1/3] arm/arm64: pageattr: add set_memory_nc

2015-05-14 Thread Christoffer Dall

On Wed, May 13, 2015 at 01:31:52PM +0200, Andrew Jones wrote:
 Provide a method to change normal, cacheable memory to non-cacheable.
 KVM will make use of this to keep emulated device memory regions
 coherent with the guest.
 
 Signed-off-by: Andrew Jones drjo...@redhat.com

Reviewed-by: Christoffer Dall christoffer.d...@linaro.org

But you obviously need Russell and Will/Catalin to ack/merge this.

-Christoffer

1 2 3 4 >

1 - 100 of 363 matches

Mail list logo