Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop
Hi Steve, On Wed, Mar 13, 2019 at 10:11:30AM +, Steven Price wrote: > > Personally I think what we need is: > > * Either a patch like the one from Heyi Guo (save/restore CNTVCT_EL0) or > alternatively hooking up KVM_KVMCLOCK_CTRL to prevent the watchdog > firing when user space explicitly stops scheduling the guest for a while. If we save/restore CNTVCT_EL0 and the warning goes away, does the guest wall clock timekeeping get all confused and does it figure this out automagically somehow? Does KVM_KVMCLOCK_CTRL solve that problem? > > * KVM itself saving/restoring CNTVCT_EL0 during suspend/resume so the > guest doesn't see time pass during a suspend. This smells like policy to me so I'd much prefer keeping as much functionality in user space as possible. If we already have the APIs we need from KVM, let's use them. > > * Something equivalent to MSR_KVM_WALL_CLOCK_NEW for arm which allows > the guest to query the wall clock time from the host and provides an > offset between CNTVCT_EL0 to wall clock time which the KVM can update > during suspend/resume. This means that during a suspend/resume the guest > can observe that wall clock time has passed, without having to be > bothered about CNTVCT_EL0 jumping forwards. > Isn't the proper Arm architectural solution for this to read the physical counter for wall clock time keeping ? (Yes that will require a trap on physical counter reads after migration on current systems, but migration sucks in terms of timekeeping already.) Thanks, Christoffer
Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop
[Adding Steven Price, who has recently looked at this, in cc] On Tue, Mar 12, 2019 at 10:08:47AM +, Peter Maydell wrote: > On Tue, 12 Mar 2019 at 06:10, Heyi Guo wrote: > > > > When we stop a VM for more than 30 seconds and then resume it, by qemu > > monitor command "stop" and "cont", Linux on VM will complain of "soft > > lockup - CPU#x stuck for xxs!" as below: > > > > [ 2783.809517] watchdog: BUG: soft lockup - CPU#3 stuck for 2395s! > > [ 2783.809559] watchdog: BUG: soft lockup - CPU#2 stuck for 2395s! > > [ 2783.809561] watchdog: BUG: soft lockup - CPU#1 stuck for 2395s! > > [ 2783.809563] Modules linked in... > > > > This is because Guest Linux uses generic timer virtual counter as > > a software watchdog, and CNTVCT_EL0 does not stop when VM is stopped > > by qemu. > > > > This patch is to fix this issue by saving the value of CNTVCT_EL0 when > > stopping and restoring it when resuming. > > Hi -- I know we have issues with the passage of time in Arm VMs > running under KVM when the VM is suspended, but the topic is > a tricky one, and it's not clear to me that this is the correct > way to fix it. I would prefer to see us start with a discussion > on the kvm-arm mailing list about the best approach to the problem. > > I've cc'd that list and a couple of the Arm KVM maintainers > for their opinion. > > QEMU patch left below for context -- the brief summary is that > it uses KVM_GET_ONE_REG/KVM_SET_ONE_REG on the timer CNT register > to save it on VM pause and write that value back on VM resume. > > thanks > -- PMM > > > > > Cc: Peter Maydell > > Signed-off-by: Heyi Guo > > --- > > target/arm/cpu.c | 65 > > > > 1 file changed, 65 insertions(+) > > > > diff --git a/target/arm/cpu.c b/target/arm/cpu.c > > index 96f0ff0..7bbba3d 100644 > > --- a/target/arm/cpu.c > > +++ b/target/arm/cpu.c > > @@ -896,6 +896,60 @@ static void arm_cpu_finalizefn(Object *obj) > > #endif > > } > > > > +static int get_vcpu_timer_tick(CPUState *cs, uint64_t *tick_at_pause) > > +{ > > +int err; > > +struct kvm_one_reg reg; > > + > > +reg.id = KVM_REG_ARM_TIMER_CNT; > > +reg.addr = (uintptr_t) tick_at_pause; > > + > > +err = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, ); > > +return err; > > +} > > + > > +static int set_vcpu_timer_tick(CPUState *cs, uint64_t tick_at_pause) > > +{ > > +int err; > > +struct kvm_one_reg reg; > > + > > +reg.id = KVM_REG_ARM_TIMER_CNT; > > +reg.addr = (uintptr_t) _at_pause; > > + > > +err = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ); > > +return err; > > +} > > + > > +static void arch_timer_change_state_handler(void *opaque, int running, > > +RunState state) > > +{ > > +static uint64_t hw_ticks_at_paused; > > +static RunState pre_state = RUN_STATE__MAX; > > +int err; > > +CPUState *cs = (CPUState *)opaque; > > + > > +switch (state) { > > +case RUN_STATE_PAUSED: > > +err = get_vcpu_timer_tick(cs, _ticks_at_paused); > > +if (err) { > > +error_report("Get vcpu timer tick failed: %d", err); > > +} > > +break; > > +case RUN_STATE_RUNNING: > > +if (pre_state == RUN_STATE_PAUSED) { > > +err = set_vcpu_timer_tick(cs, hw_ticks_at_paused); > > +if (err) { > > +error_report("Resume vcpu timer tick failed: %d", err); > > +} > > +} > > +break; > > +default: > > +break; > > +} > > + > > +pre_state = state; > > +} > > + > > static void arm_cpu_realizefn(DeviceState *dev, Error **errp) > > { > > CPUState *cs = CPU(dev); > > @@ -906,6 +960,12 @@ static void arm_cpu_realizefn(DeviceState *dev, Error > > **errp) > > Error *local_err = NULL; > > bool no_aa32 = false; > > > > +/* > > + * Only add change state handler for arch timer once, for KVM will > > help to > > + * synchronize virtual timer of all VCPUs. > > + */ > > +static bool arch_timer_change_state_handler_added; > > + > > /* If we needed to query the host kernel for the CPU features > > * then it's possible that might have failed in the initfn, but > > * this is the first point where we can report it. > > @@ -1181,6 +1241,11 @@ static void arm_cpu_realizefn(DeviceState *dev, > > Error **errp) > > > > init_cpreg_list(cpu); > > > > +if (!arch_timer_change_state_handler_added && kvm_enabled()) { > > +qemu_add_vm_change_state_handler(arch_timer_change_state_handler, > > cs); > > +arch_timer_change_state_handler_added = true; > > +} > > + > > #ifndef CONFIG_USER_ONLY > > if (cpu->has_el3 || arm_feature(env, ARM_FEATURE_M_SECURITY)) { > > cs->num_ases = 2; > > -- > > 1.8.3.1
Re: [Qemu-devel] [RFC] [PATCH] kvm: arm: Introduce error code KVM_EINVARIANT
Hi Manish, On Sat, Nov 10, 2018 at 10:18:47PM +, Manish Jaggi wrote: > > CCing a larger audience. > Please review. > > On 10/23/2018 03:51 PM, Jaggi, Manish wrote: > > From: Manish Jaggi > > > > This patch introduces an error code KVM_EINVARIANT which is returned > > by KVM when userland tries to set an invariant register. > > > > The need for this error code is in VM Migration for arm64. > > ARM64 systems use mainly -machine virt -cpu host as parameter to qemu. > > Migration requires both Source and destination machines to have same > > physical cpu. There are cases where the overall architecture of CPU is > > same but the next version of the chip with some bug fixes which have no > > effect on qemu operation. In such cases invariant registers like MIDR > > have a different value. > > Currently Migration fails in such cases. > > > > Rather than sending a EINVAL, a specifc error code will help > > userland program the guest invariant register by querying the migrated > > host machines invariant registers. > > > > Qemu will have a parameter -hostinvariant along with checking of this > > error code. So it can be safely assumed that the feature is opt-in > > > > Corresponding Qemu patchset can be found at : > > https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg05048.html > > > > > > Signed-off-by: Manish Jaggi > > --- > > arch/arm64/kvm/sys_regs.c | 9 - > > include/uapi/linux/kvm_para.h | 1 + > > 2 files changed, 5 insertions(+), 5 deletions(-) > > > > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c > > index 22fbbdb..78ffc02 100644 > > --- a/arch/arm64/kvm/sys_regs.c > > +++ b/arch/arm64/kvm/sys_regs.c > > @@ -,7 +,7 @@ static int __set_id_reg(const struct sys_reg_desc > > *rd, void __user *uaddr, > > > > /* This is what we mean by invariant: you can't change it. */ > > if (val != read_id_reg(rd, raz)) > > - return -EINVAL; > > + return -KVM_EINVARIANT; > > > > return 0; > > } > > @@ -2254,9 +2254,8 @@ static int set_invariant_sys_reg(u64 id, void __user > > *uaddr) > > return err; > > > > /* This is what we mean by invariant: you can't change it. */ > > - if (r->val != val) > > - return -EINVAL; > > - > > + if (r->val != val) > > + return -KVM_EINVARIANT; > > return 0; > > } > > > > @@ -2335,7 +2334,7 @@ static int demux_c15_set(u64 id, void __user *uaddr) > > > > /* This is also invariant: you can't change it. */ > > if (newval != get_ccsidr(val)) > > - return -EINVAL; > > + return -KVM_EINVARIANT; > > return 0; > > default: > > return -ENOENT; > > diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h > > index 6c0ce49..4358669 100644 > > --- a/include/uapi/linux/kvm_para.h > > +++ b/include/uapi/linux/kvm_para.h > > @@ -17,6 +17,7 @@ > > #define KVM_E2BIG E2BIG > > #define KVM_EPERM EPERM > > #define KVM_EOPNOTSUPP95 > > +#define KVM_EINVARIANT 96 > > > > #define KVM_HC_VAPIC_POLL_IRQ 1 > > #define KVM_HC_MMU_OP 2 > Eh, unless I'm severely missing something, I think you're confusing how these error codes are used. The error codes in include/uapi/linux/kvm_para.h are for hypercalls from guests, not return values to userspace. For the latter, we don't have a habit of inventing new constants for something subsystem specific, but instead we use the existing error codes which can be understood by userspace software. What's more, changing an error code is an ABI change and not something you can just do at will. If we are overloading EINVAL with the KVM_SET_ONE_REG syscall (are we?), then you need to find some way to communicate the additional information you need to userspace without interfering with legacy userspace software's interaction with the kernel. Thanks, Christoffer
Re: [Qemu-devel] [PATCH v2 0/5] target/arm: KVM vs ARMISARegisters
On Fri, Nov 02, 2018 at 04:36:35PM +, Peter Maydell wrote: > On 2 November 2018 at 14:54, Richard Henderson > wrote: > > My previous patch set for replacing feature bits with id registers > > failed to consider that these id registers are beginning to control > > migration, and thus we must fill them in for KVM as well. > > > > Thus, we want to initialize these values within CPU from the host. > > > > Finally, re-send the T32EE conversion patch, fixing the build > > failure on an arm32 host in kvm32.c. > > > > Changes, v1->v2: > > * Remove assert that AArch32 sysreg <= UINT32_MAX. > > * Remove unused local variable. > > * Add commentary for AArch32 sysregs vs missing AArch32 support. > > As noted on IRC, on my admittedly pretty ancient 4.8.0 kernel some > of these ID register reads via KVM_GET_ONE_REG fail ENOENT. > strace says: > > openat(AT_FDCWD, "/dev/kvm", O_RDWR|O_CLOEXEC) = 18 > ioctl(18, KVM_CREATE_VM or LOGGER_GET_LOG_BUF_SIZE, 0) = 19 > ioctl(19, KVM_CREATE_VCPU, 0) = 20 > ioctl(19, KVM_ARM_PREFERRED_TARGET, 0xcfeb4e88) = 0 > ioctl(20, KVM_ARM_VCPU_INIT, 0xcfeb4e88) = 0 > ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) > = -1 ENOENT (No such file or directory) > ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) > = -1 ENOENT (No such file or directory) > ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) > = -1 ENOENT (No such file or directory) > ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) > = -1 ENOENT (No such file or directory) > ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) = 0 > ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) = 0 > ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) = 0 > ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) = 0 > ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) = 0 > ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) = 0 > ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) > = -1 ENOENT (No such file or directory) > ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) > = -1 ENOENT (No such file or directory) > ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) > = -1 ENOENT (No such file or directory) > ioctl(20, KVM_ARM_SET_DEVICE_ADDR or KVM_GET_ONE_REG, 0xcfeb4e28) > = -1 ENOENT (No such file or directory) > > > I added a bit of extra tracing, since strace doesn't > print the ID field for the ioctl: > > peter.maydell@mustang-maydell:~/qemu$ > ~/test-images/virtv8-for-nesting/runme-kvm > ./build/for-kvm/aarch64-softmmu/qemu-system-aarch64 -enable-kvm -cpu > max -machine gic-version=max > read_sys_reg64: reading ID 0x60300013c030...-1 > read_sys_reg64: reading ID 0x60300013c031...-1 > read_sys_reg64: reading ID 0x60300013c020...-1 > read_sys_reg64: reading ID 0x60300013c021...-1 > read_sys_reg32: reading ID 0x60300013c010...0 > read_sys_reg32: reading ID 0x60300013c011...0 > read_sys_reg32: reading ID 0x60300013c012...0 > read_sys_reg32: reading ID 0x60300013c013...0 > read_sys_reg32: reading ID 0x60300013c014...0 > read_sys_reg32: reading ID 0x60300013c015...0 > read_sys_reg32: reading ID 0x60300013c017...-1 > read_sys_reg32: reading ID 0x60300013c018...-1 > read_sys_reg32: reading ID 0x60300013c019...-1 > read_sys_reg32: reading ID 0x60300013c01a...-1 > qemu-system-aarch64: Failed to retrieve host CPU features > > It looks like the kernel can handle reads of ID_ISAR0_EL1 > through ID_ISAR5_EL1, but not ID_ISAR6_EL1, any of the > MVFR*_EL1 or ID_AA64_ISAR* or ID_AA64PFR*. > > This is probably because the kernel is way too old to be > interestingly supportable for KVM, but we did previously > manage to boot on this setup. I'm a little confused. v4.8 used to work (although it was perhaps not the most stable at that time). What changed? Is this attempting to restore a VM from a newer kernel, or has QEMU been updated to detect this? > > We should probably at least figure out which version of > the kernel fixed this bug and made the ID registers available > to userspace... if it's sufficiently ancient we could > likely say "not supported", but if it's more recent we > need a workaround somehow. I have cc'd a couple of kernel > folks who might be able to help with the "which version" > question. > It appears the support for exposing a bunch of ID registers was introduced with: 93390c0a1b20 (arm64: KVM: Hide unsupported AArch64 CPU features from guests, 2017-10-31) Which Dave (cc'ed) wrote and which was introduced in v4.15. As per my question above, I'm not exactly sure what (if anything) we need to fix on the kernel side? Thanks, Christoffer
Re: [Qemu-devel] [PATCH] target/arm/kvm: gic: Prevent creating userspace GICv3 with KVM
Hi Philippe, On Mon, Feb 05, 2018 at 09:10:48PM -0300, Philippe Mathieu-Daudé wrote: > > On 02/01/2018 05:53 PM, Christoffer Dall wrote: > > KVM doesn't support emulating a GICv3 in userspace, only GICv2. We > > currently attempt this anyway, and as a result a KVM guest doesn't > > receive interrupts and the user is left wondering why. Report an error > > to the user if this particular combination is requested. > > > > Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org> > > --- > > target/arm/kvm_arm.h | 4 > > 1 file changed, 4 insertions(+) > > > > diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h > > index ff53e9fafb..cfb7e5af72 100644 > > --- a/target/arm/kvm_arm.h > > +++ b/target/arm/kvm_arm.h > > @@ -234,6 +234,10 @@ static inline const char *gicv3_class_name(void) > > Any reason why these functions are inlined in this include? Not really sure what you mean? What do you think would be a better approach? > > Also - not related to your patch - while gicv3_class_name() is > self-explicit, gic_class_name() isn't (to me at least). I think I tend to see 'gic' as GICv2 and 'gicv3' as GICv3 in QEMU, but I don't remember the details of how you make the GIC be a proper GICv2 vs. the 11mpcore version thingy. > > There are many check for gic_version 1/2/3 in virt.c, which we could > clean using more generic functions such gic_class_name(int gic_version) > and Co. Possibly. Do note, that checking "which version of the GIC does this board contain" is a separate thing from "which particular mechanism do I use to emulate the GIC". > > > exit(1); > > #endif > > } else { > > +if (kvm_enabled()) { > > +error_report("Userspace GICv3 is not supported with KVM"); > > +exit(1); > > Reviewed-by: Philippe Mathieu-Daudé <f4...@amsat.org> > > > +} > > return "arm-gicv3"; > > } > > } > > > Thanks! -Christoffer signature.asc Description: PGP signature
[Qemu-devel] [PATCH] target/arm/kvm: gic: Prevent creating userspace GICv3 with KVM
KVM doesn't support emulating a GICv3 in userspace, only GICv2. We currently attempt this anyway, and as a result a KVM guest doesn't receive interrupts and the user is left wondering why. Report an error to the user if this particular combination is requested. Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org> --- target/arm/kvm_arm.h | 4 1 file changed, 4 insertions(+) diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h index ff53e9fafb..cfb7e5af72 100644 --- a/target/arm/kvm_arm.h +++ b/target/arm/kvm_arm.h @@ -234,6 +234,10 @@ static inline const char *gicv3_class_name(void) exit(1); #endif } else { +if (kvm_enabled()) { +error_report("Userspace GICv3 is not supported with KVM"); +exit(1); +} return "arm-gicv3"; } } -- 2.14.2
Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64
On Thu, Feb 01, 2018 at 01:25:20PM +0100, Andrew Jones wrote: > On Thu, Feb 01, 2018 at 11:48:31AM +0100, Christoffer Dall wrote: > > On Thu, Feb 01, 2018 at 11:42:22AM +0100, Andrew Jones wrote: > > > On Thu, Feb 01, 2018 at 10:59:54AM +0100, Christoffer Dall wrote: > > > > On Thu, Feb 1, 2018 at 10:33 AM, Ard Biesheuvel > > > > > If you could use dirty page tracking to only perform the cache > > > > > invalidation when the framebuffer memory has been updated, you can at > > > > > least limit the impact to cases where the framebuffer is actually > > > > > used, rather than sitting idle with a nice wallpaper image. > > > > > > Yes, this is the exact approach I took back when I experimented with > > > this. I must have screwed up my PoC in some way (like using the gcc > > > builtin), because it wasn't working for me... > > > > > Does that mean you have some code you feel like reviving and use to send > > out an RFC based on? ;) > > > > I don't usually delete things, but I do do some sort of copy on use > from old homedirs to new ones from time to time, letting stuff that > gets really old completely drop at some point. I might be able to > find the old work, or just hack a quick and ugly version from my > memory to share - hopefully whatever I did to cause it to fail last > time has been forgotten :-) > If you have bandwidth to look at this that's great. Otherwise let us know, and I'll see if I can devote some time to it. Thanks, -Christoffer
Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64
On Thu, Feb 01, 2018 at 11:42:22AM +0100, Andrew Jones wrote: > On Thu, Feb 01, 2018 at 10:59:54AM +0100, Christoffer Dall wrote: > > On Thu, Feb 1, 2018 at 10:33 AM, Ard Biesheuvel > > <ard.biesheu...@linaro.org> wrote: > > > On 1 February 2018 at 09:17, Christoffer Dall > > > <christoffer.d...@linaro.org> wrote: > > >> On Wed, Jan 31, 2018 at 9:15 PM, Ard Biesheuvel > > >> <ard.biesheu...@linaro.org> wrote: > > >>> On 31 January 2018 at 19:12, Christoffer Dall > > >>> <christoffer.d...@linaro.org> wrote: > > >>>> On Wed, Jan 31, 2018 at 7:00 PM, Ard Biesheuvel > > >>>> <ard.biesheu...@linaro.org> wrote: > > >>>>> On 31 January 2018 at 17:39, Christoffer Dall > > >>>>> <christoffer.d...@linaro.org> wrote: > > >>>>>> On Wed, Jan 31, 2018 at 5:59 PM, Ard Biesheuvel > > >>>>>> <ard.biesheu...@linaro.org> wrote: > > >>>>>>> On 31 January 2018 at 16:53, Christoffer Dall > > >>>>>>> <christoffer.d...@linaro.org> wrote: > > >>>>>>>> On Wed, Jan 31, 2018 at 4:18 PM, Ard Biesheuvel > > >>>>>>>> <ard.biesheu...@linaro.org> wrote: > > >>>>>>>>> On 31 January 2018 at 09:53, Christoffer Dall > > >>>>>>>>> <christoffer.d...@linaro.org> wrote: > > >>>>>>>>>> On Mon, Jan 29, 2018 at 10:32:12AM +, Marc Zyngier wrote: > > >>>>>>>>>>> On 29/01/18 10:04, Peter Maydell wrote: > > >>>>>>>>>>> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert > > >>>>>>>>>>> > <dgilb...@redhat.com> wrote: > > >>>>>>>>>>> >> * Peter Maydell (peter.mayd...@linaro.org) wrote: > > >>>>>>>>>>> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert > > >>>>>>>>>>> >>> <dgilb...@redhat.com> wrote: > > >>>>>>>>>>> >>>> * Peter Maydell (peter.mayd...@linaro.org) wrote: > > >>>>>>>>>>> >>>>> I think the correct fix here is that your test code > > >>>>>>>>>>> >>>>> should turn > > >>>>>>>>>>> >>>>> its MMU on. Trying to treat guest RAM as uncacheable > > >>>>>>>>>>> >>>>> doesn't work > > >>>>>>>>>>> >>>>> for Arm KVM guests (for the same reason that VGA device > > >>>>>>>>>>> >>>>> video memory > > >>>>>>>>>>> >>>>> doesn't work). If it's RAM your guest has to arrange to > > >>>>>>>>>>> >>>>> map it as > > >>>>>>>>>>> >>>>> Normal Cacheable, and then everything should work fine. > > >>>>>>>>>>> >>>> > > >>>>>>>>>>> >>>> Does this cause problems with migrating at just the wrong > > >>>>>>>>>>> >>>> point during > > >>>>>>>>>>> >>>> a VM boot? > > >>>>>>>>>>> >>> > > >>>>>>>>>>> >>> It wouldn't surprise me if it did, but I don't think I've > > >>>>>>>>>>> >>> ever > > >>>>>>>>>>> >>> tried to provoke that problem... > > >>>>>>>>>>> >> > > >>>>>>>>>>> >> If you think it'll get the RAM contents wrong, it might be > > >>>>>>>>>>> >> best to fail > > >>>>>>>>>>> >> the migration if you can detect the cache is disabled in the > > >>>>>>>>>>> >> guest. > > >>>>>>>>>>> > > > >>>>>>>>>>> > I guess QEMU could look at the value of the "MMU > > >>>>>>>>>>> > disabled/enabled" bit > > >>>>>>>>&
Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64
On Thu, Feb 1, 2018 at 10:33 AM, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > On 1 February 2018 at 09:17, Christoffer Dall > <christoffer.d...@linaro.org> wrote: >> On Wed, Jan 31, 2018 at 9:15 PM, Ard Biesheuvel >> <ard.biesheu...@linaro.org> wrote: >>> On 31 January 2018 at 19:12, Christoffer Dall >>> <christoffer.d...@linaro.org> wrote: >>>> On Wed, Jan 31, 2018 at 7:00 PM, Ard Biesheuvel >>>> <ard.biesheu...@linaro.org> wrote: >>>>> On 31 January 2018 at 17:39, Christoffer Dall >>>>> <christoffer.d...@linaro.org> wrote: >>>>>> On Wed, Jan 31, 2018 at 5:59 PM, Ard Biesheuvel >>>>>> <ard.biesheu...@linaro.org> wrote: >>>>>>> On 31 January 2018 at 16:53, Christoffer Dall >>>>>>> <christoffer.d...@linaro.org> wrote: >>>>>>>> On Wed, Jan 31, 2018 at 4:18 PM, Ard Biesheuvel >>>>>>>> <ard.biesheu...@linaro.org> wrote: >>>>>>>>> On 31 January 2018 at 09:53, Christoffer Dall >>>>>>>>> <christoffer.d...@linaro.org> wrote: >>>>>>>>>> On Mon, Jan 29, 2018 at 10:32:12AM +, Marc Zyngier wrote: >>>>>>>>>>> On 29/01/18 10:04, Peter Maydell wrote: >>>>>>>>>>> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert >>>>>>>>>>> > <dgilb...@redhat.com> wrote: >>>>>>>>>>> >> * Peter Maydell (peter.mayd...@linaro.org) wrote: >>>>>>>>>>> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert >>>>>>>>>>> >>> <dgilb...@redhat.com> wrote: >>>>>>>>>>> >>>> * Peter Maydell (peter.mayd...@linaro.org) wrote: >>>>>>>>>>> >>>>> I think the correct fix here is that your test code should >>>>>>>>>>> >>>>> turn >>>>>>>>>>> >>>>> its MMU on. Trying to treat guest RAM as uncacheable doesn't >>>>>>>>>>> >>>>> work >>>>>>>>>>> >>>>> for Arm KVM guests (for the same reason that VGA device video >>>>>>>>>>> >>>>> memory >>>>>>>>>>> >>>>> doesn't work). If it's RAM your guest has to arrange to map >>>>>>>>>>> >>>>> it as >>>>>>>>>>> >>>>> Normal Cacheable, and then everything should work fine. >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> Does this cause problems with migrating at just the wrong >>>>>>>>>>> >>>> point during >>>>>>>>>>> >>>> a VM boot? >>>>>>>>>>> >>> >>>>>>>>>>> >>> It wouldn't surprise me if it did, but I don't think I've ever >>>>>>>>>>> >>> tried to provoke that problem... >>>>>>>>>>> >> >>>>>>>>>>> >> If you think it'll get the RAM contents wrong, it might be best >>>>>>>>>>> >> to fail >>>>>>>>>>> >> the migration if you can detect the cache is disabled in the >>>>>>>>>>> >> guest. >>>>>>>>>>> > >>>>>>>>>>> > I guess QEMU could look at the value of the "MMU >>>>>>>>>>> > disabled/enabled" bit >>>>>>>>>>> > in the guest's system registers, and refuse migration if it's >>>>>>>>>>> > off... >>>>>>>>>>> > >>>>>>>>>>> > (cc'd Marc, Christoffer to check that I don't have the wrong end >>>>>>>>>>> > of the stick about how thin the ice is in the period before the >>>>>>>>>>> > guest turns on its MMU...) >>>>>>>>>>> >>>>>>>>>>> Once MMU and caches are on, we should be in a reasonable place for >>>>>>>>>>> QEMU >>>>>>>>>
Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64
On Wed, Jan 31, 2018 at 9:15 PM, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > On 31 January 2018 at 19:12, Christoffer Dall > <christoffer.d...@linaro.org> wrote: >> On Wed, Jan 31, 2018 at 7:00 PM, Ard Biesheuvel >> <ard.biesheu...@linaro.org> wrote: >>> On 31 January 2018 at 17:39, Christoffer Dall >>> <christoffer.d...@linaro.org> wrote: >>>> On Wed, Jan 31, 2018 at 5:59 PM, Ard Biesheuvel >>>> <ard.biesheu...@linaro.org> wrote: >>>>> On 31 January 2018 at 16:53, Christoffer Dall >>>>> <christoffer.d...@linaro.org> wrote: >>>>>> On Wed, Jan 31, 2018 at 4:18 PM, Ard Biesheuvel >>>>>> <ard.biesheu...@linaro.org> wrote: >>>>>>> On 31 January 2018 at 09:53, Christoffer Dall >>>>>>> <christoffer.d...@linaro.org> wrote: >>>>>>>> On Mon, Jan 29, 2018 at 10:32:12AM +, Marc Zyngier wrote: >>>>>>>>> On 29/01/18 10:04, Peter Maydell wrote: >>>>>>>>> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert >>>>>>>>> > <dgilb...@redhat.com> wrote: >>>>>>>>> >> * Peter Maydell (peter.mayd...@linaro.org) wrote: >>>>>>>>> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert >>>>>>>>> >>> <dgilb...@redhat.com> wrote: >>>>>>>>> >>>> * Peter Maydell (peter.mayd...@linaro.org) wrote: >>>>>>>>> >>>>> I think the correct fix here is that your test code should turn >>>>>>>>> >>>>> its MMU on. Trying to treat guest RAM as uncacheable doesn't >>>>>>>>> >>>>> work >>>>>>>>> >>>>> for Arm KVM guests (for the same reason that VGA device video >>>>>>>>> >>>>> memory >>>>>>>>> >>>>> doesn't work). If it's RAM your guest has to arrange to map it >>>>>>>>> >>>>> as >>>>>>>>> >>>>> Normal Cacheable, and then everything should work fine. >>>>>>>>> >>>> >>>>>>>>> >>>> Does this cause problems with migrating at just the wrong point >>>>>>>>> >>>> during >>>>>>>>> >>>> a VM boot? >>>>>>>>> >>> >>>>>>>>> >>> It wouldn't surprise me if it did, but I don't think I've ever >>>>>>>>> >>> tried to provoke that problem... >>>>>>>>> >> >>>>>>>>> >> If you think it'll get the RAM contents wrong, it might be best to >>>>>>>>> >> fail >>>>>>>>> >> the migration if you can detect the cache is disabled in the guest. >>>>>>>>> > >>>>>>>>> > I guess QEMU could look at the value of the "MMU disabled/enabled" >>>>>>>>> > bit >>>>>>>>> > in the guest's system registers, and refuse migration if it's off... >>>>>>>>> > >>>>>>>>> > (cc'd Marc, Christoffer to check that I don't have the wrong end >>>>>>>>> > of the stick about how thin the ice is in the period before the >>>>>>>>> > guest turns on its MMU...) >>>>>>>>> >>>>>>>>> Once MMU and caches are on, we should be in a reasonable place for >>>>>>>>> QEMU >>>>>>>>> to have a consistent view of the memory. The trick is to prevent the >>>>>>>>> vcpus from changing that. A guest could perfectly turn off its MMU at >>>>>>>>> any given time if it needs to (and it is actually required on some HW >>>>>>>>> if >>>>>>>>> you want to mitigate headlining CVEs), and KVM won't know about that. >>>>>>>>> >>>>>>>> >>>>>>>> (Clarification: KVM can detect this is it bother to check the VCPU's >>>>>>>> system registers, but we don't trap to KVM when the VCPU turns off its >>>>>>>> caches, right?) >>>>>>>> &g
Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64
On Wed, Jan 31, 2018 at 7:00 PM, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > On 31 January 2018 at 17:39, Christoffer Dall > <christoffer.d...@linaro.org> wrote: >> On Wed, Jan 31, 2018 at 5:59 PM, Ard Biesheuvel >> <ard.biesheu...@linaro.org> wrote: >>> On 31 January 2018 at 16:53, Christoffer Dall >>> <christoffer.d...@linaro.org> wrote: >>>> On Wed, Jan 31, 2018 at 4:18 PM, Ard Biesheuvel >>>> <ard.biesheu...@linaro.org> wrote: >>>>> On 31 January 2018 at 09:53, Christoffer Dall >>>>> <christoffer.d...@linaro.org> wrote: >>>>>> On Mon, Jan 29, 2018 at 10:32:12AM +, Marc Zyngier wrote: >>>>>>> On 29/01/18 10:04, Peter Maydell wrote: >>>>>>> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert >>>>>>> > <dgilb...@redhat.com> wrote: >>>>>>> >> * Peter Maydell (peter.mayd...@linaro.org) wrote: >>>>>>> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert >>>>>>> >>> <dgilb...@redhat.com> wrote: >>>>>>> >>>> * Peter Maydell (peter.mayd...@linaro.org) wrote: >>>>>>> >>>>> I think the correct fix here is that your test code should turn >>>>>>> >>>>> its MMU on. Trying to treat guest RAM as uncacheable doesn't work >>>>>>> >>>>> for Arm KVM guests (for the same reason that VGA device video >>>>>>> >>>>> memory >>>>>>> >>>>> doesn't work). If it's RAM your guest has to arrange to map it as >>>>>>> >>>>> Normal Cacheable, and then everything should work fine. >>>>>>> >>>> >>>>>>> >>>> Does this cause problems with migrating at just the wrong point >>>>>>> >>>> during >>>>>>> >>>> a VM boot? >>>>>>> >>> >>>>>>> >>> It wouldn't surprise me if it did, but I don't think I've ever >>>>>>> >>> tried to provoke that problem... >>>>>>> >> >>>>>>> >> If you think it'll get the RAM contents wrong, it might be best to >>>>>>> >> fail >>>>>>> >> the migration if you can detect the cache is disabled in the guest. >>>>>>> > >>>>>>> > I guess QEMU could look at the value of the "MMU disabled/enabled" bit >>>>>>> > in the guest's system registers, and refuse migration if it's off... >>>>>>> > >>>>>>> > (cc'd Marc, Christoffer to check that I don't have the wrong end >>>>>>> > of the stick about how thin the ice is in the period before the >>>>>>> > guest turns on its MMU...) >>>>>>> >>>>>>> Once MMU and caches are on, we should be in a reasonable place for QEMU >>>>>>> to have a consistent view of the memory. The trick is to prevent the >>>>>>> vcpus from changing that. A guest could perfectly turn off its MMU at >>>>>>> any given time if it needs to (and it is actually required on some HW if >>>>>>> you want to mitigate headlining CVEs), and KVM won't know about that. >>>>>>> >>>>>> >>>>>> (Clarification: KVM can detect this is it bother to check the VCPU's >>>>>> system registers, but we don't trap to KVM when the VCPU turns off its >>>>>> caches, right?) >>>>>> >>>>>>> You may have to pause the vcpus before starting the migration, or >>>>>>> introduce a new KVM feature that would automatically pause a vcpu that >>>>>>> is trying to disable its MMU while the migration is on. This would >>>>>>> involve trapping all the virtual memory related system registers, with >>>>>>> an obvious cost. But that cost would be limited to the time it takes to >>>>>>> migrate the memory, so maybe that's acceptable. >>>>>>> >>>>>> Is that even sufficient? >>>>>> >>>>>> What if the following happened. (1) guest turns off MMU, (2) guest >>>>>> writes some data directly to ram (3) qemu stops the vcpu (4) qemu reads >>>>&g
Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64
On Wed, Jan 31, 2018 at 5:59 PM, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > On 31 January 2018 at 16:53, Christoffer Dall > <christoffer.d...@linaro.org> wrote: >> On Wed, Jan 31, 2018 at 4:18 PM, Ard Biesheuvel >> <ard.biesheu...@linaro.org> wrote: >>> On 31 January 2018 at 09:53, Christoffer Dall >>> <christoffer.d...@linaro.org> wrote: >>>> On Mon, Jan 29, 2018 at 10:32:12AM +, Marc Zyngier wrote: >>>>> On 29/01/18 10:04, Peter Maydell wrote: >>>>> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert >>>>> > <dgilb...@redhat.com> wrote: >>>>> >> * Peter Maydell (peter.mayd...@linaro.org) wrote: >>>>> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert >>>>> >>> <dgilb...@redhat.com> wrote: >>>>> >>>> * Peter Maydell (peter.mayd...@linaro.org) wrote: >>>>> >>>>> I think the correct fix here is that your test code should turn >>>>> >>>>> its MMU on. Trying to treat guest RAM as uncacheable doesn't work >>>>> >>>>> for Arm KVM guests (for the same reason that VGA device video memory >>>>> >>>>> doesn't work). If it's RAM your guest has to arrange to map it as >>>>> >>>>> Normal Cacheable, and then everything should work fine. >>>>> >>>> >>>>> >>>> Does this cause problems with migrating at just the wrong point >>>>> >>>> during >>>>> >>>> a VM boot? >>>>> >>> >>>>> >>> It wouldn't surprise me if it did, but I don't think I've ever >>>>> >>> tried to provoke that problem... >>>>> >> >>>>> >> If you think it'll get the RAM contents wrong, it might be best to fail >>>>> >> the migration if you can detect the cache is disabled in the guest. >>>>> > >>>>> > I guess QEMU could look at the value of the "MMU disabled/enabled" bit >>>>> > in the guest's system registers, and refuse migration if it's off... >>>>> > >>>>> > (cc'd Marc, Christoffer to check that I don't have the wrong end >>>>> > of the stick about how thin the ice is in the period before the >>>>> > guest turns on its MMU...) >>>>> >>>>> Once MMU and caches are on, we should be in a reasonable place for QEMU >>>>> to have a consistent view of the memory. The trick is to prevent the >>>>> vcpus from changing that. A guest could perfectly turn off its MMU at >>>>> any given time if it needs to (and it is actually required on some HW if >>>>> you want to mitigate headlining CVEs), and KVM won't know about that. >>>>> >>>> >>>> (Clarification: KVM can detect this is it bother to check the VCPU's >>>> system registers, but we don't trap to KVM when the VCPU turns off its >>>> caches, right?) >>>> >>>>> You may have to pause the vcpus before starting the migration, or >>>>> introduce a new KVM feature that would automatically pause a vcpu that >>>>> is trying to disable its MMU while the migration is on. This would >>>>> involve trapping all the virtual memory related system registers, with >>>>> an obvious cost. But that cost would be limited to the time it takes to >>>>> migrate the memory, so maybe that's acceptable. >>>>> >>>> Is that even sufficient? >>>> >>>> What if the following happened. (1) guest turns off MMU, (2) guest >>>> writes some data directly to ram (3) qemu stops the vcpu (4) qemu reads >>>> guest ram. QEMU's view of guest ram is now incorrect (stale, >>>> incoherent, ...). >>>> >>>> I'm also not really sure if pausing one VCPU because it turned off its >>>> MMU will go very well when trying to migrate a large VM (wouldn't this >>>> ask for all the other VCPUs beginning to complain that the stopped VCPU >>>> appears to be dead?). As a short-term 'fix' it's probably better to >>>> refuse migration if you detect that a VCPU had begun turning off its >>>> MMU. >>>> >>>> On the larger scale of thins; this appears to me to be another case of >>>> us really needing some way to coherently access memo
Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64
On Wed, Jan 31, 2018 at 4:18 PM, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > On 31 January 2018 at 09:53, Christoffer Dall > <christoffer.d...@linaro.org> wrote: >> On Mon, Jan 29, 2018 at 10:32:12AM +, Marc Zyngier wrote: >>> On 29/01/18 10:04, Peter Maydell wrote: >>> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert <dgilb...@redhat.com> >>> > wrote: >>> >> * Peter Maydell (peter.mayd...@linaro.org) wrote: >>> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert >>> >>> <dgilb...@redhat.com> wrote: >>> >>>> * Peter Maydell (peter.mayd...@linaro.org) wrote: >>> >>>>> I think the correct fix here is that your test code should turn >>> >>>>> its MMU on. Trying to treat guest RAM as uncacheable doesn't work >>> >>>>> for Arm KVM guests (for the same reason that VGA device video memory >>> >>>>> doesn't work). If it's RAM your guest has to arrange to map it as >>> >>>>> Normal Cacheable, and then everything should work fine. >>> >>>> >>> >>>> Does this cause problems with migrating at just the wrong point during >>> >>>> a VM boot? >>> >>> >>> >>> It wouldn't surprise me if it did, but I don't think I've ever >>> >>> tried to provoke that problem... >>> >> >>> >> If you think it'll get the RAM contents wrong, it might be best to fail >>> >> the migration if you can detect the cache is disabled in the guest. >>> > >>> > I guess QEMU could look at the value of the "MMU disabled/enabled" bit >>> > in the guest's system registers, and refuse migration if it's off... >>> > >>> > (cc'd Marc, Christoffer to check that I don't have the wrong end >>> > of the stick about how thin the ice is in the period before the >>> > guest turns on its MMU...) >>> >>> Once MMU and caches are on, we should be in a reasonable place for QEMU >>> to have a consistent view of the memory. The trick is to prevent the >>> vcpus from changing that. A guest could perfectly turn off its MMU at >>> any given time if it needs to (and it is actually required on some HW if >>> you want to mitigate headlining CVEs), and KVM won't know about that. >>> >> >> (Clarification: KVM can detect this is it bother to check the VCPU's >> system registers, but we don't trap to KVM when the VCPU turns off its >> caches, right?) >> >>> You may have to pause the vcpus before starting the migration, or >>> introduce a new KVM feature that would automatically pause a vcpu that >>> is trying to disable its MMU while the migration is on. This would >>> involve trapping all the virtual memory related system registers, with >>> an obvious cost. But that cost would be limited to the time it takes to >>> migrate the memory, so maybe that's acceptable. >>> >> Is that even sufficient? >> >> What if the following happened. (1) guest turns off MMU, (2) guest >> writes some data directly to ram (3) qemu stops the vcpu (4) qemu reads >> guest ram. QEMU's view of guest ram is now incorrect (stale, >> incoherent, ...). >> >> I'm also not really sure if pausing one VCPU because it turned off its >> MMU will go very well when trying to migrate a large VM (wouldn't this >> ask for all the other VCPUs beginning to complain that the stopped VCPU >> appears to be dead?). As a short-term 'fix' it's probably better to >> refuse migration if you detect that a VCPU had begun turning off its >> MMU. >> >> On the larger scale of thins; this appears to me to be another case of >> us really needing some way to coherently access memory between QEMU and >> the VM, but in the case of the VCPU turning off the MMU prior to >> migration, we don't even know where it may have written data, and I'm >> therefore not really sure what the 'proper' solution would be. >> >> (cc'ing Ard who has has thought about this problem before in the context >> of UEFI and VGA.) >> > > Actually, the VGA case is much simpler because the host is not > expected to write to the framebuffer, only read from it, and the guest > is not expected to create a cacheable mapping for it, so any > incoherency can be trivially solved by cache invalidation on the host > side. (Note that this has nothing to do with DMA coherency, but only > with PCI MMIO BARs that are backed by DRAM in t
Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64
On Mon, Jan 29, 2018 at 10:32:12AM +, Marc Zyngier wrote: > On 29/01/18 10:04, Peter Maydell wrote: > > On 29 January 2018 at 09:53, Dr. David Alan Gilbert> > wrote: > >> * Peter Maydell (peter.mayd...@linaro.org) wrote: > >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert > >>> wrote: > * Peter Maydell (peter.mayd...@linaro.org) wrote: > > I think the correct fix here is that your test code should turn > > its MMU on. Trying to treat guest RAM as uncacheable doesn't work > > for Arm KVM guests (for the same reason that VGA device video memory > > doesn't work). If it's RAM your guest has to arrange to map it as > > Normal Cacheable, and then everything should work fine. > > Does this cause problems with migrating at just the wrong point during > a VM boot? > >>> > >>> It wouldn't surprise me if it did, but I don't think I've ever > >>> tried to provoke that problem... > >> > >> If you think it'll get the RAM contents wrong, it might be best to fail > >> the migration if you can detect the cache is disabled in the guest. > > > > I guess QEMU could look at the value of the "MMU disabled/enabled" bit > > in the guest's system registers, and refuse migration if it's off... > > > > (cc'd Marc, Christoffer to check that I don't have the wrong end > > of the stick about how thin the ice is in the period before the > > guest turns on its MMU...) > > Once MMU and caches are on, we should be in a reasonable place for QEMU > to have a consistent view of the memory. The trick is to prevent the > vcpus from changing that. A guest could perfectly turn off its MMU at > any given time if it needs to (and it is actually required on some HW if > you want to mitigate headlining CVEs), and KVM won't know about that. > (Clarification: KVM can detect this is it bother to check the VCPU's system registers, but we don't trap to KVM when the VCPU turns off its caches, right?) > You may have to pause the vcpus before starting the migration, or > introduce a new KVM feature that would automatically pause a vcpu that > is trying to disable its MMU while the migration is on. This would > involve trapping all the virtual memory related system registers, with > an obvious cost. But that cost would be limited to the time it takes to > migrate the memory, so maybe that's acceptable. > Is that even sufficient? What if the following happened. (1) guest turns off MMU, (2) guest writes some data directly to ram (3) qemu stops the vcpu (4) qemu reads guest ram. QEMU's view of guest ram is now incorrect (stale, incoherent, ...). I'm also not really sure if pausing one VCPU because it turned off its MMU will go very well when trying to migrate a large VM (wouldn't this ask for all the other VCPUs beginning to complain that the stopped VCPU appears to be dead?). As a short-term 'fix' it's probably better to refuse migration if you detect that a VCPU had begun turning off its MMU. On the larger scale of thins; this appears to me to be another case of us really needing some way to coherently access memory between QEMU and the VM, but in the case of the VCPU turning off the MMU prior to migration, we don't even know where it may have written data, and I'm therefore not really sure what the 'proper' solution would be. (cc'ing Ard who has has thought about this problem before in the context of UEFI and VGA.) Thanks, -Christoffer
Re: [Qemu-devel] [PATCH v2 1/4] hw/arm/virt: add pmu interrupt state
On Fri, Jul 21, 2017 at 01:35:46PM +0200, Andrew Jones wrote: > On Fri, Jul 21, 2017 at 01:16:07PM +0200, Christoffer Dall wrote: > > On Wed, Jul 19, 2017 at 09:39:54AM -0400, Andrew Jones wrote: > > > Mimicking gicv3-maintenance-interrupt, add the PMU's interrupt to > > > CPU state. > > > > > > Signed-off-by: Andrew Jones <drjo...@redhat.com> > > > Reviewed-by: Peter Maydell <peter.mayd...@linaro.org> > > > --- > > > hw/arm/virt.c| 3 +++ > > > target/arm/cpu.c | 2 ++ > > > target/arm/cpu.h | 2 ++ > > > 3 files changed, 7 insertions(+) > > > > > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > > > index 31739d75a3e0..ea26f0c473c2 100644 > > > --- a/hw/arm/virt.c > > > +++ b/hw/arm/virt.c > > > @@ -610,6 +610,9 @@ static void create_gic(VirtMachineState *vms, > > > qemu_irq *pic) > > > qdev_connect_gpio_out_named(cpudev, > > > "gicv3-maintenance-interrupt", 0, > > > qdev_get_gpio_in(gicdev, ppibase > > > + > > > ARCH_GICV3_MAINT_IRQ)); > > > +qdev_connect_gpio_out_named(cpudev, "pmu-interrupt", 0, > > > +qdev_get_gpio_in(gicdev, ppibase > > > + + VIRTUAL_PMU_IRQ)); > > > > I know Peter reviewed this, but isn't it a bit strange to create the > > pmu-interrupt when creating the gic (as this isn't an output from the > > GIC like the maintenance interrupt is) ? > > > > Above the gic maintenance interrupt connection the timer irqs are also > connected. So, while the function name implies we only create the gic, > its function appears to be both its creation and the wiring up of CPU > inputs and outputs. > ok, I didn't see that. Otherwise this patch looks good to me. -Christoffer
Re: [Qemu-devel] [PATCH v2 4/4] target/arm/kvm: pmu: improve error handling
On Wed, Jul 19, 2017 at 09:39:57AM -0400, Andrew Jones wrote: > If a KVM PMU init or set-irq attr call fails we just silently stop > the PMU DT node generation. The only way they could fail, though, > is if the attr's respective KVM has-attr call fails. But that should > never happen if KVM advertises the PMU capability, because both > attrs have been available since the capability was introduced. Let's > just abort if this should-never-happen stuff does happen, because, > if it does, then something is obviously horribly wrong. > > Signed-off-by: Andrew Jones <drjo...@redhat.com> Reviewed-by: Christoffer Dall <cd...@linaro.org> > --- > hw/arm/virt.c| 9 +++-- > target/arm/kvm32.c | 3 +-- > target/arm/kvm64.c | 28 > target/arm/kvm_arm.h | 15 --- > 4 files changed, 28 insertions(+), 27 deletions(-) > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > index a215330444da..4bc50964d52b 100644 > --- a/hw/arm/virt.c > +++ b/hw/arm/virt.c > @@ -496,13 +496,10 @@ static void fdt_add_pmu_nodes(const VirtMachineState > *vms) > return; > } > if (kvm_enabled()) { > -if (kvm_irqchip_in_kernel() && > -!kvm_arm_pmu_set_irq(cpu, PPI(VIRTUAL_PMU_IRQ))) { > -return; > -} > -if (!kvm_arm_pmu_init(cpu)) { > -return; > +if (kvm_irqchip_in_kernel()) { > +kvm_arm_pmu_set_irq(cpu, PPI(VIRTUAL_PMU_IRQ)); > } > +kvm_arm_pmu_init(cpu); > } > } > > diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c > index e3aab89a1a94..717a2562670b 100644 > --- a/target/arm/kvm32.c > +++ b/target/arm/kvm32.c > @@ -522,10 +522,9 @@ bool kvm_arm_hw_debug_active(CPUState *cs) > return false; > } > > -int kvm_arm_pmu_set_irq(CPUState *cs, int irq) > +void kvm_arm_pmu_set_irq(CPUState *cs, int irq) > { > qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__); > -return 0; > } > > int kvm_arm_pmu_init(CPUState *cs) > diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c > index ec7d85314acc..6554c30007a4 100644 > --- a/target/arm/kvm64.c > +++ b/target/arm/kvm64.c > @@ -387,30 +387,36 @@ static bool kvm_arm_pmu_set_attr(CPUState *cs, struct > kvm_device_attr *attr) > > err = kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, attr); > if (err != 0) { > +error_report("PMU: KVM_HAS_DEVICE_ATTR: %s", strerror(-err)); > return false; > } > > err = kvm_vcpu_ioctl(cs, KVM_SET_DEVICE_ATTR, attr); > -if (err < 0) { > -fprintf(stderr, "KVM_SET_DEVICE_ATTR failed: %s\n", > -strerror(-err)); > -abort(); > +if (err != 0) { > +error_report("PMU: KVM_SET_DEVICE_ATTR: %s", strerror(-err)); > +return false; > } > > return true; > } > > -int kvm_arm_pmu_init(CPUState *cs) > +void kvm_arm_pmu_init(CPUState *cs) > { > struct kvm_device_attr attr = { > .group = KVM_ARM_VCPU_PMU_V3_CTRL, > .attr = KVM_ARM_VCPU_PMU_V3_INIT, > }; > > -return kvm_arm_pmu_set_attr(cs, ); > +if (!ARM_CPU(cs)->has_pmu) { > +return; > +} > +if (!kvm_arm_pmu_set_attr(cs, )) { > +error_report("failed to init PMU"); > +abort(); > +} > } > > -int kvm_arm_pmu_set_irq(CPUState *cs, int irq) > +void kvm_arm_pmu_set_irq(CPUState *cs, int irq) > { > struct kvm_device_attr attr = { > .group = KVM_ARM_VCPU_PMU_V3_CTRL, > @@ -418,7 +424,13 @@ int kvm_arm_pmu_set_irq(CPUState *cs, int irq) > .attr = KVM_ARM_VCPU_PMU_V3_IRQ, > }; > > -return kvm_arm_pmu_set_attr(cs, ); > +if (!ARM_CPU(cs)->has_pmu) { > +return; > +} > +if (!kvm_arm_pmu_set_attr(cs, )) { > +error_report("failed to set irq for PMU"); > +abort(); > +} > } > > static inline void set_feature(uint64_t *features, int feature) > diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h > index cab5ea9be55c..ff53e9fafb7a 100644 > --- a/target/arm/kvm_arm.h > +++ b/target/arm/kvm_arm.h > @@ -195,8 +195,8 @@ int kvm_arm_sync_mpstate_to_qemu(ARMCPU *cpu); > > int kvm_arm_vgic_probe(void); > > -int kvm_arm_pmu_set_irq(CPUState *cs, int irq); > -int kvm_arm_pmu_init(CPUState *cs); > +void kvm_arm_pmu_set_irq(CPUState *cs, int irq); > +void kvm_arm_pmu_init(CPUState *cs); > > #else > > @@ -205,15 +205,8 @@ static inline int kvm_arm_vgic_probe(void) > return 0; > } > > -static inline int kvm_arm_pmu_set_irq(CPUState *cs, int irq) > -{ > -return 0; > -} > - > -static inline int kvm_arm_pmu_init(CPUState *cs) > -{ > -return 0; > -} > +static inline void kvm_arm_pmu_set_irq(CPUState *cs, int irq) {} > +static inline void kvm_arm_pmu_init(CPUState *cs) {} > > #endif > > -- > 1.8.3.1 >
Re: [Qemu-devel] [PATCH v2 3/4] hw/arm/virt: allow pmu instantiation with userspace irqchip
On Wed, Jul 19, 2017 at 09:39:56AM -0400, Andrew Jones wrote: > Move the in-kernel-irqchip test to only guard the set-irq > stage, not the init stage of the PMU. Also add the PMU to > the KVM device irq line synchronization to enable its use. > > Signed-off-by: Andrew Jones <drjo...@redhat.com> Reviewed-by: Christoffer Dall <cd...@linaro.org> > --- > hw/arm/virt.c | 3 ++- > target/arm/kvm.c | 6 +- > target/arm/kvm64.c | 3 +-- > 3 files changed, 8 insertions(+), 4 deletions(-) > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > index 7157a028adce..a215330444da 100644 > --- a/hw/arm/virt.c > +++ b/hw/arm/virt.c > @@ -496,7 +496,8 @@ static void fdt_add_pmu_nodes(const VirtMachineState *vms) > return; > } > if (kvm_enabled()) { > -if (!kvm_arm_pmu_set_irq(cpu, PPI(VIRTUAL_PMU_IRQ))) { > +if (kvm_irqchip_in_kernel() && > +!kvm_arm_pmu_set_irq(cpu, PPI(VIRTUAL_PMU_IRQ))) { > return; > } > if (!kvm_arm_pmu_init(cpu)) { > diff --git a/target/arm/kvm.c b/target/arm/kvm.c > index 7c17f0d629d7..211a7bf7befd 100644 > --- a/target/arm/kvm.c > +++ b/target/arm/kvm.c > @@ -567,7 +567,11 @@ MemTxAttrs kvm_arch_post_run(CPUState *cs, struct > kvm_run *run) > switched_level &= ~KVM_ARM_DEV_EL1_PTIMER; > } > > -/* XXX PMU IRQ is missing */ > +if (switched_level & KVM_ARM_DEV_PMU) { > +qemu_set_irq(cpu->pmu_interrupt, > + !!(run->s.regs.device_irq_level & KVM_ARM_DEV_PMU)); > +switched_level &= ~KVM_ARM_DEV_PMU; > +} > > if (switched_level) { > qemu_log_mask(LOG_UNIMP, "%s: unhandled in-kernel device IRQ > %x\n", > diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c > index e26638a6fac1..ec7d85314acc 100644 > --- a/target/arm/kvm64.c > +++ b/target/arm/kvm64.c > @@ -506,8 +506,7 @@ int kvm_arch_init_vcpu(CPUState *cs) > if (!arm_feature(>env, ARM_FEATURE_AARCH64)) { > cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_EL1_32BIT; > } > -if (!kvm_irqchip_in_kernel() || > -!kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_PMU_V3)) { > +if (!kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_PMU_V3)) { > cpu->has_pmu = false; > } > if (cpu->has_pmu) { > -- > 1.8.3.1 >
Re: [Qemu-devel] [PATCH v2 2/4] target/arm/kvm: pmu: split init and set-irq stages
On Wed, Jul 19, 2017 at 09:39:55AM -0400, Andrew Jones wrote: > When adding a PMU with a userspace irqchip we skip the set-irq > stage of device creation. Split the 'create' function into two > functions 'init' and 'set-irq' so they may be called separately. > > Signed-off-by: Andrew Jones <drjo...@redhat.com> Reviewed-by: Christoffer Dall <cd...@linaro.org> > --- > hw/arm/virt.c| 11 +-- > target/arm/kvm32.c | 8 +++- > target/arm/kvm64.c | 52 > +--- > target/arm/kvm_arm.h | 10 -- > 4 files changed, 49 insertions(+), 32 deletions(-) > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > index ea26f0c473c2..7157a028adce 100644 > --- a/hw/arm/virt.c > +++ b/hw/arm/virt.c > @@ -492,10 +492,17 @@ static void fdt_add_pmu_nodes(const VirtMachineState > *vms) > > CPU_FOREACH(cpu) { > armcpu = ARM_CPU(cpu); > -if (!arm_feature(>env, ARM_FEATURE_PMU) || > -(kvm_enabled() && !kvm_arm_pmu_create(cpu, > PPI(VIRTUAL_PMU_IRQ { > +if (!arm_feature(>env, ARM_FEATURE_PMU)) { > return; > } > +if (kvm_enabled()) { > +if (!kvm_arm_pmu_set_irq(cpu, PPI(VIRTUAL_PMU_IRQ))) { > +return; > +} > +if (!kvm_arm_pmu_init(cpu)) { > +return; > +} > +} > } > > if (vms->gic_version == 2) { > diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c > index 069da0c5fd10..e3aab89a1a94 100644 > --- a/target/arm/kvm32.c > +++ b/target/arm/kvm32.c > @@ -522,7 +522,13 @@ bool kvm_arm_hw_debug_active(CPUState *cs) > return false; > } > > -int kvm_arm_pmu_create(CPUState *cs, int irq) > +int kvm_arm_pmu_set_irq(CPUState *cs, int irq) > +{ > +qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__); > +return 0; > +} > + > +int kvm_arm_pmu_init(CPUState *cs) > { > qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__); > return 0; > diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c > index a16abc8d129e..e26638a6fac1 100644 > --- a/target/arm/kvm64.c > +++ b/target/arm/kvm64.c > @@ -381,46 +381,44 @@ static CPUWatchpoint *find_hw_watchpoint(CPUState *cpu, > target_ulong addr) > return NULL; > } > > -static bool kvm_arm_pmu_support_ctrl(CPUState *cs, struct kvm_device_attr > *attr) > -{ > -return kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, attr) == 0; > -} > - > -int kvm_arm_pmu_create(CPUState *cs, int irq) > +static bool kvm_arm_pmu_set_attr(CPUState *cs, struct kvm_device_attr *attr) > { > int err; > > -struct kvm_device_attr attr = { > -.group = KVM_ARM_VCPU_PMU_V3_CTRL, > -.addr = (intptr_t), > -.attr = KVM_ARM_VCPU_PMU_V3_IRQ, > -.flags = 0, > -}; > - > -if (!kvm_arm_pmu_support_ctrl(cs, )) { > -return 0; > +err = kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, attr); > +if (err != 0) { > +return false; > } > > -err = kvm_vcpu_ioctl(cs, KVM_SET_DEVICE_ATTR, ); > +err = kvm_vcpu_ioctl(cs, KVM_SET_DEVICE_ATTR, attr); > if (err < 0) { > fprintf(stderr, "KVM_SET_DEVICE_ATTR failed: %s\n", > strerror(-err)); > abort(); > } > > -attr.group = KVM_ARM_VCPU_PMU_V3_CTRL; > -attr.attr = KVM_ARM_VCPU_PMU_V3_INIT; > -attr.addr = 0; > -attr.flags = 0; > +return true; > +} > > -err = kvm_vcpu_ioctl(cs, KVM_SET_DEVICE_ATTR, ); > -if (err < 0) { > -fprintf(stderr, "KVM_SET_DEVICE_ATTR failed: %s\n", > -strerror(-err)); > -abort(); > -} > +int kvm_arm_pmu_init(CPUState *cs) > +{ > +struct kvm_device_attr attr = { > +.group = KVM_ARM_VCPU_PMU_V3_CTRL, > +.attr = KVM_ARM_VCPU_PMU_V3_INIT, > +}; > + > +return kvm_arm_pmu_set_attr(cs, ); > +} > + > +int kvm_arm_pmu_set_irq(CPUState *cs, int irq) > +{ > +struct kvm_device_attr attr = { > +.group = KVM_ARM_VCPU_PMU_V3_CTRL, > +.addr = (intptr_t), > +.attr = KVM_ARM_VCPU_PMU_V3_IRQ, > +}; > > -return 1; > +return kvm_arm_pmu_set_attr(cs, ); > } > > static inline void set_feature(uint64_t *features, int feature) > diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h > index 633d08828a5d..cab5ea9be55c 100644 > --- a/target/arm/kvm_arm.h > +++ b/target/arm/kvm_arm.h > @@ -195,7 +195,8 @@ int kvm_arm_sync_mpstate_to_qemu(ARMCPU *cpu); > > int kvm_a
Re: [Qemu-devel] [PATCH v2 1/4] hw/arm/virt: add pmu interrupt state
On Wed, Jul 19, 2017 at 09:39:54AM -0400, Andrew Jones wrote: > Mimicking gicv3-maintenance-interrupt, add the PMU's interrupt to > CPU state. > > Signed-off-by: Andrew Jones> Reviewed-by: Peter Maydell > --- > hw/arm/virt.c| 3 +++ > target/arm/cpu.c | 2 ++ > target/arm/cpu.h | 2 ++ > 3 files changed, 7 insertions(+) > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > index 31739d75a3e0..ea26f0c473c2 100644 > --- a/hw/arm/virt.c > +++ b/hw/arm/virt.c > @@ -610,6 +610,9 @@ static void create_gic(VirtMachineState *vms, qemu_irq > *pic) > qdev_connect_gpio_out_named(cpudev, "gicv3-maintenance-interrupt", 0, > qdev_get_gpio_in(gicdev, ppibase > + > ARCH_GICV3_MAINT_IRQ)); > +qdev_connect_gpio_out_named(cpudev, "pmu-interrupt", 0, > +qdev_get_gpio_in(gicdev, ppibase > + + VIRTUAL_PMU_IRQ)); I know Peter reviewed this, but isn't it a bit strange to create the pmu-interrupt when creating the gic (as this isn't an output from the GIC like the maintenance interrupt is) ? Thanks, -Christoffer > > sysbus_connect_irq(gicbusdev, i, qdev_get_gpio_in(cpudev, > ARM_CPU_IRQ)); > sysbus_connect_irq(gicbusdev, i + smp_cpus, > diff --git a/target/arm/cpu.c b/target/arm/cpu.c > index 96d1f840301f..fd82c7944840 100644 > --- a/target/arm/cpu.c > +++ b/target/arm/cpu.c > @@ -499,6 +499,8 @@ static void arm_cpu_initfn(Object *obj) > > qdev_init_gpio_out_named(DEVICE(cpu), >gicv3_maintenance_interrupt, > "gicv3-maintenance-interrupt", 1); > +qdev_init_gpio_out_named(DEVICE(cpu), >pmu_interrupt, > + "pmu-interrupt", 1); > #endif > > /* DTB consumers generally don't in fact care what the 'compatible' > diff --git a/target/arm/cpu.h b/target/arm/cpu.h > index 102c58afac52..8d91166eb97b 100644 > --- a/target/arm/cpu.h > +++ b/target/arm/cpu.h > @@ -584,6 +584,8 @@ struct ARMCPU { > qemu_irq gt_timer_outputs[NUM_GTIMERS]; > /* GPIO output for GICv3 maintenance interrupt signal */ > qemu_irq gicv3_maintenance_interrupt; > +/* GPIO output for the PMU interrupt */ > +qemu_irq pmu_interrupt; > > /* MemoryRegion to use for secure physical accesses */ > MemoryRegion *secure_memory; > -- > 1.8.3.1 >
[Qemu-devel] [job ad] QEMU Engineering Position with Linaro
Hi all, Linaro is looking for a full-time engineer to work on QEMU, primarily focused on emulating aspects of the ARM architecture. Feel free to apply to the position or forward this to anyone you know who has experience working with QEMU and experience with modeling computer architecture. The official job opening is posted here: http://www.linaro.org/careers/#op-22576-qemu-engineer- Thanks, -Christoffer
Re: [Qemu-devel] [PATCH v3 2/3] arm64: kvm: inject SError with virtual syndrome
On Sun, Apr 30, 2017 at 01:37:56PM +0800, Dongjiu Geng wrote: > when SError happen, kvm notifies kvmtool to generate GHES table > to record the error, then kvmtools inject the SError with specified again, is this really specific to kvmtool? Pleae try to explain this mechanism in generic terms. > virtual syndrome. when switch to guest, a virtual SError will happen with > this specified syndrome. > > Signed-off-by: Dongjiu Geng> --- > arch/arm64/include/asm/esr.h | 2 ++ > arch/arm64/include/asm/kvm_emulate.h | 10 ++ > arch/arm64/include/asm/kvm_host.h| 1 + > arch/arm64/include/asm/sysreg.h | 3 +++ > arch/arm64/kvm/handle_exit.c | 25 +++-- > arch/arm64/kvm/hyp/switch.c | 15 ++- > include/uapi/linux/kvm.h | 5 + > 7 files changed, 54 insertions(+), 7 deletions(-) > > diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h > index 22f9c90..d009c99 100644 > --- a/arch/arm64/include/asm/esr.h > +++ b/arch/arm64/include/asm/esr.h > @@ -127,6 +127,8 @@ > #define ESR_ELx_WFx_ISS_WFE (UL(1) << 0) > #define ESR_ELx_xVC_IMM_MASK ((1UL << 16) - 1) > > +#define VSESR_ELx_IDS_ISS_MASK((1UL << 25) - 1) > + > /* ESR value templates for specific events */ > > /* BRK instruction trap from AArch64 state */ > diff --git a/arch/arm64/include/asm/kvm_emulate.h > b/arch/arm64/include/asm/kvm_emulate.h > index f5ea0ba..a3259a9 100644 > --- a/arch/arm64/include/asm/kvm_emulate.h > +++ b/arch/arm64/include/asm/kvm_emulate.h > @@ -148,6 +148,16 @@ static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu > *vcpu) > return vcpu->arch.fault.esr_el2; > } > > +static inline u32 kvm_vcpu_get_vsesr(const struct kvm_vcpu *vcpu) > +{ > + return vcpu->arch.fault.vsesr_el2; > +} > + > +static inline void kvm_vcpu_set_vsesr(struct kvm_vcpu *vcpu, unsigned long > val) > +{ > + vcpu->arch.fault.vsesr_el2 = val; > +} > + > static inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu) > { > u32 esr = kvm_vcpu_get_hsr(vcpu); > diff --git a/arch/arm64/include/asm/kvm_host.h > b/arch/arm64/include/asm/kvm_host.h > index e7705e7..84ed239 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -86,6 +86,7 @@ struct kvm_vcpu_fault_info { > u32 esr_el2;/* Hyp Syndrom Register */ > u64 far_el2;/* Hyp Fault Address Register */ > u64 hpfar_el2; /* Hyp IPA Fault Address Register */ > + u32 vsesr_el2; /* Virtual SError Exception Syndrome Register */ > }; > > /* > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h > index 32964c7..b6afb7a 100644 > --- a/arch/arm64/include/asm/sysreg.h > +++ b/arch/arm64/include/asm/sysreg.h > @@ -125,6 +125,9 @@ > #define REG_PSTATE_PAN_IMM sys_reg(0, 0, 4, 0, 4) > #define REG_PSTATE_UAO_IMM sys_reg(0, 0, 4, 0, 3) > > +#define VSESR_EL2sys_reg(3, 4, 5, 2, 3) > + > + > #define SET_PSTATE_PAN(x) __emit_inst(0xd500 | REG_PSTATE_PAN_IMM | > \ > (!!x)<<8 | 0x1f) > #define SET_PSTATE_UAO(x) __emit_inst(0xd500 | REG_PSTATE_UAO_IMM | > \ > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c > index c89d83a..3d024a9 100644 > --- a/arch/arm64/kvm/handle_exit.c > +++ b/arch/arm64/kvm/handle_exit.c > @@ -180,7 +180,11 @@ static exit_handle_fn kvm_get_exit_handler(struct > kvm_vcpu *vcpu) > > static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run) > { > - unsigned long fault_ipa = kvm_vcpu_get_fault_ipa(vcpu); > + unsigned long hva, fault_ipa = kvm_vcpu_get_fault_ipa(vcpu); > + struct kvm_memory_slot *memslot; > + int hsr, ret = 1; > + bool writable; > + gfn_t gfn; > > if (handle_guest_sei((unsigned long)fault_ipa, > kvm_vcpu_get_hsr(vcpu))) { > @@ -190,9 +194,20 @@ static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, > struct kvm_run *run) > (unsigned long)kvm_vcpu_get_hsr(vcpu)); > > kvm_inject_vabt(vcpu); > + } else { > + hsr = kvm_vcpu_get_hsr(vcpu); > + > + gfn = fault_ipa >> PAGE_SHIFT; > + memslot = gfn_to_memslot(vcpu->kvm, gfn); > + hva = gfn_to_hva_memslot_prot(memslot, gfn, ); > + > + run->exit_reason = KVM_EXIT_INTR; > + run->intr.syndrome_info = hsr; > + run->intr.address = hva; > + ret = 0; > } > > - return 0; > + return ret; > } > > /* > @@ -218,8 +233,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run > *run, > *vcpu_pc(vcpu) -= adj; > } > > - kvm_handle_guest_sei(vcpu, run); > - return 1; > + return kvm_handle_guest_sei(vcpu,
Re: [Qemu-devel] [PATCH v3 1/3] arm64: kvm: support kvmtool to detect RAS extension feature
Hi Dongjiu, Please send a cover letter for patch series with more than a single patch. The subject and description of these patches are also misleading. Hopefully this is in no way tied to kvmtool, but to userspace generically, for example also to be used by QEMU? On Sun, Apr 30, 2017 at 01:37:55PM +0800, Dongjiu Geng wrote: > Handle kvmtool's detection for RAS extension, because sometimes > the APP needs to know the CPU's capacity the APP ? the CPU's capacity? > > Signed-off-by: Dongjiu Geng> --- > arch/arm64/kvm/reset.c | 11 +++ > include/uapi/linux/kvm.h | 1 + > 2 files changed, 12 insertions(+) > > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c > index d9e9697..1004039 100644 > --- a/arch/arm64/kvm/reset.c > +++ b/arch/arm64/kvm/reset.c > @@ -64,6 +64,14 @@ static bool cpu_has_32bit_el1(void) > return !!(pfr0 & 0x20); > } > > +static bool kvm_arm_support_ras_extension(void) > +{ > + u64 pfr0; > + > + pfr0 = read_system_reg(SYS_ID_AA64PFR0_EL1); > + return !!(pfr0 & 0x1000); > +} > + > /** > * kvm_arch_dev_ioctl_check_extension > * > @@ -87,6 +95,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, > long ext) > case KVM_CAP_ARM_PMU_V3: > r = kvm_arm_support_pmu_v3(); > break; > + case KVM_CAP_ARM_RAS_EXTENSION: > + r = kvm_arm_support_ras_extension(); > + break; You need to document this capability and API in Documentation/virtual/kvm/api.txt and explain how this works. > case KVM_CAP_SET_GUEST_DEBUG: > case KVM_CAP_VCPU_ATTRIBUTES: > r = 1; > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index f51d508..27fe556 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -883,6 +883,7 @@ struct kvm_ppc_resize_hpt { > #define KVM_CAP_PPC_MMU_RADIX 134 > #define KVM_CAP_PPC_MMU_HASH_V3 135 > #define KVM_CAP_IMMEDIATE_EXIT 136 > +#define KVM_CAP_ARM_RAS_EXTENSION 137 > > #ifdef KVM_CAP_IRQ_ROUTING > > -- > 2.10.1 > Thanks, -Christoffer
Re: [Qemu-devel] host stalls when qemu-system-aarch64 with kvm and pflash
On Wed, Mar 29, 2017 at 01:51:19PM -0700, Radha Mohan wrote: > On Wed, Mar 29, 2017 at 11:56 AM, Christoffer Dall <cd...@linaro.org> wrote: > > On Tue, Mar 28, 2017 at 01:24:15PM -0700, Radha Mohan wrote: > >> On Tue, Mar 28, 2017 at 1:16 PM, Christoffer Dall <cd...@linaro.org> wrote: > >> > Hi Radha, > >> > > >> > On Tue, Mar 28, 2017 at 12:58:24PM -0700, Radha Mohan wrote: > >> >> Hi, > >> >> I am seeing an issue with qemu-system-aarch64 when using pflash > >> >> (booting kernel via UEFI bios). > >> >> > >> >> Host kernel: 4.11.0-rc3-next-20170323 > >> >> Qemu version: v2.9.0-rc1 > >> >> > >> >> Command used: > >> >> ./aarch64-softmmu/qemu-system-aarch64 -cpu host -enable-kvm -M > >> >> virt,gic_version=3 -nographic -smp 1 -m 2048 -drive > >> >> if=none,id=hd0,file=/root/zesty-server-cloudimg-arm64.img,id=0 -device > >> >> virtio-blk-device,drive=hd0 -pflash /root/flash0.img -pflash > >> >> /root/flash1.img > >> >> > >> >> > >> >> As soon as the guest kernel boots the host starts to stall and prints > >> >> the below messages. And the system never recovers. I can neither > >> >> poweroff the guest nor the host. So I have resort to external power > >> >> reset of the host. > >> >> > >> >> == > >> >> [ 116.199077] NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s! > >> >> [kworker/25:1:454] > >> >> [ 116.206901] Modules linked in: binfmt_misc nls_iso8859_1 aes_ce_blk > >> >> shpchp crypto_simd gpio_keys cryptd aes_ce_cipher ghash_ce sha2_ce > >> >> sha1_ce uio_pdrv_genirq uio autofs4 btrfs raid10 rai > >> >> d456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor > >> >> raid6_pq libcrc32c raid1 raid0 multipath linear ast i2c_algo_bit ttm > >> >> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_s > >> >> ys_fops drm nicvf ahci nicpf libahci thunder_bgx thunder_xcv > >> >> mdio_thunder mdio_cavium > >> >> > >> >> [ 116.206995] CPU: 25 PID: 454 Comm: kworker/25:1 Not tainted > >> >> 4.11.0-rc3-next-20170323 #1 > >> >> [ 116.206997] Hardware name: www.cavium.com crb-1s/crb-1s, BIOS 0.3 > >> >> Feb 23 2017 > >> >> [ 116.207010] Workqueue: events netstamp_clear > >> >> [ 116.207015] task: 801f906b5400 task.stack: 801f901a4000 > >> >> [ 116.207020] PC is at smp_call_function_many+0x284/0x2e8 > >> >> [ 116.207023] LR is at smp_call_function_many+0x244/0x2e8 > >> >> [ 116.207026] pc : [] lr : [] > >> >> pstate: 8145 > >> >> [ 116.207028] sp : 801f901a7be0 > >> >> [ 116.207030] x29: 801f901a7be0 x28: 09139000 > >> >> [ 116.207036] x27: 09139434 x26: 0080 > >> >> [ 116.207041] x25: x24: 081565d0 > >> >> [ 116.207047] x23: 0001 x22: 08e11e00 > >> >> [ 116.207052] x21: 801f6d5cff00 x20: 801f6d5cff08 > >> >> [ 116.207057] x19: 09138e38 x18: 0a03 > >> >> [ 116.207063] x17: b77c9028 x16: 082e81d8 > >> >> [ 116.207068] x15: 3d0d6dd44d08 x14: 0036312196549b4a > >> >> [ 116.207073] x13: 58dabe4c x12: 0018 > >> >> [ 116.207079] x11: 366e2f04 x10: 09f0 > >> >> [ 116.207084] x9 : 801f901a7d30 x8 : 0002 > >> >> [ 116.207089] x7 : x6 : > >> >> [ 116.207095] x5 : x4 : 0020 > >> >> [ 116.207100] x3 : 0020 x2 : > >> >> [ 116.207105] x1 : 801f6d682578 x0 : 0003 > >> >> > >> >> [ 150.443116] INFO: rcu_sched self-detected stall on CPU > >> >> [ 150.448261] 25-...: (14997 ticks this GP) > >> >> idle=47a/141/0 softirq=349/349 fqs=7495 > >> >> [ 150.451115] INFO: rcu_sched detected stalls on CPUs/tasks: > >> >> [ 150.451123] 25-...: (14997 ticks this GP) > >> >> idle=47a/141/0 softirq=349/349 fqs=7495 > >> >> [ 150.451124] (detected by 13, t=15002 jiffies, g=805, c=804, q=8384) > >> >> [ 150.
Re: [Qemu-devel] host stalls when qemu-system-aarch64 with kvm and pflash
On Tue, Mar 28, 2017 at 01:24:15PM -0700, Radha Mohan wrote: > On Tue, Mar 28, 2017 at 1:16 PM, Christoffer Dall <cd...@linaro.org> wrote: > > Hi Radha, > > > > On Tue, Mar 28, 2017 at 12:58:24PM -0700, Radha Mohan wrote: > >> Hi, > >> I am seeing an issue with qemu-system-aarch64 when using pflash > >> (booting kernel via UEFI bios). > >> > >> Host kernel: 4.11.0-rc3-next-20170323 > >> Qemu version: v2.9.0-rc1 > >> > >> Command used: > >> ./aarch64-softmmu/qemu-system-aarch64 -cpu host -enable-kvm -M > >> virt,gic_version=3 -nographic -smp 1 -m 2048 -drive > >> if=none,id=hd0,file=/root/zesty-server-cloudimg-arm64.img,id=0 -device > >> virtio-blk-device,drive=hd0 -pflash /root/flash0.img -pflash > >> /root/flash1.img > >> > >> > >> As soon as the guest kernel boots the host starts to stall and prints > >> the below messages. And the system never recovers. I can neither > >> poweroff the guest nor the host. So I have resort to external power > >> reset of the host. > >> > >> == > >> [ 116.199077] NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s! > >> [kworker/25:1:454] > >> [ 116.206901] Modules linked in: binfmt_misc nls_iso8859_1 aes_ce_blk > >> shpchp crypto_simd gpio_keys cryptd aes_ce_cipher ghash_ce sha2_ce > >> sha1_ce uio_pdrv_genirq uio autofs4 btrfs raid10 rai > >> d456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor > >> raid6_pq libcrc32c raid1 raid0 multipath linear ast i2c_algo_bit ttm > >> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_s > >> ys_fops drm nicvf ahci nicpf libahci thunder_bgx thunder_xcv > >> mdio_thunder mdio_cavium > >> > >> [ 116.206995] CPU: 25 PID: 454 Comm: kworker/25:1 Not tainted > >> 4.11.0-rc3-next-20170323 #1 > >> [ 116.206997] Hardware name: www.cavium.com crb-1s/crb-1s, BIOS 0.3 Feb > >> 23 2017 > >> [ 116.207010] Workqueue: events netstamp_clear > >> [ 116.207015] task: 801f906b5400 task.stack: 801f901a4000 > >> [ 116.207020] PC is at smp_call_function_many+0x284/0x2e8 > >> [ 116.207023] LR is at smp_call_function_many+0x244/0x2e8 > >> [ 116.207026] pc : [] lr : [] > >> pstate: 8145 > >> [ 116.207028] sp : 801f901a7be0 > >> [ 116.207030] x29: 801f901a7be0 x28: 09139000 > >> [ 116.207036] x27: 09139434 x26: 0080 > >> [ 116.207041] x25: x24: 081565d0 > >> [ 116.207047] x23: 0001 x22: 08e11e00 > >> [ 116.207052] x21: 801f6d5cff00 x20: 801f6d5cff08 > >> [ 116.207057] x19: 09138e38 x18: 0a03 > >> [ 116.207063] x17: b77c9028 x16: 082e81d8 > >> [ 116.207068] x15: 3d0d6dd44d08 x14: 0036312196549b4a > >> [ 116.207073] x13: 58dabe4c x12: 0018 > >> [ 116.207079] x11: 366e2f04 x10: 09f0 > >> [ 116.207084] x9 : 801f901a7d30 x8 : 0002 > >> [ 116.207089] x7 : x6 : > >> [ 116.207095] x5 : x4 : 0020 > >> [ 116.207100] x3 : 0020 x2 : > >> [ 116.207105] x1 : 801f6d682578 x0 : 0003 > >> > >> [ 150.443116] INFO: rcu_sched self-detected stall on CPU > >> [ 150.448261] 25-...: (14997 ticks this GP) > >> idle=47a/141/0 softirq=349/349 fqs=7495 > >> [ 150.451115] INFO: rcu_sched detected stalls on CPUs/tasks: > >> [ 150.451123] 25-...: (14997 ticks this GP) > >> idle=47a/141/0 softirq=349/349 fqs=7495 > >> [ 150.451124] (detected by 13, t=15002 jiffies, g=805, c=804, q=8384) > >> [ 150.451136] Task dump for CPU 25: > >> [ 150.451138] kworker/25:1R running task0 454 2 > >> 0x0002 > >> [ 150.451155] Workqueue: events netstamp_clear > >> [ 150.451158] Call trace: > >> [ 150.451164] [] __switch_to+0x90/0xa8 > >> [ 150.451172] [] static_key_slow_inc+0x128/0x138 > >> [ 150.451175] [] static_key_enable+0x34/0x60 > >> [ 150.451178] [] netstamp_clear+0x68/0x80 > >> [ 150.451181] [] process_one_work+0x158/0x478 > >> [ 150.451183] [] worker_thread+0x50/0x4a8 > >> [ 150.451187] [] kthread+0x108/0x138 > >> [ 150.451190] [] ret_from_fork+0x10/0x50 > >> [ 150.477451] (t=15008 jiffies g=805 c=804 q=8384) > >> [ 150.48224
Re: [Qemu-devel] [PATCH] kvm: pass the virtual SEI syndrome to guest OS
On Wed, Mar 29, 2017 at 05:37:49PM +0200, Laszlo Ersek wrote: > On 03/29/17 16:48, Christoffer Dall wrote: > > On Wed, Mar 29, 2017 at 10:36:51PM +0800, gengdongjiu wrote: > >> 2017-03-29 18:36 GMT+08:00, Achin Gupta <achin.gu...@arm.com>: > > >>> Qemu is essentially fulfilling the role of secure firmware at the > >>> EL2/EL1 interface (as discussed with Christoffer below). So it > >>> should generate the CPER before injecting the error. > >>> > >>> This is corresponds to (1) above apart from notifying UEFI (I am > >>> assuming you mean guest UEFI). At this time, the guest OS already > >>> knows where to pick up the CPER from through the HEST. Qemu has > >>> to create the CPER and populate its address at the address > >>> exported in the HEST. Guest UEFI should not be involved in this > >>> flow. Its job was to create the HEST at boot and that has been > >>> done by this stage. > >> > >> Sorry, As I understand it, after Qemu generate the CPER table, it > >> should pass the CPER table to the guest UEFI, then Guest UEFI place > >> this CPER table to the guest OS memory. In this flow, the Guest UEFI > >> should be involved, else the Guest OS can not see the CPER table. > >> > > > > I think you need to explain the "pass the CPER table to the guest UEFI" > > concept in terms of what really happens, step by step, and when you say > > "then Guest UEFI place the CPER table to the guest OS memory", I'm > > curious who is running what code on the hardware when doing that. > > I strongly suggest to keep the guest firmware's runtime involvement to > zero. Two reasons: > > (1) As you explained above (... which I conveniently snipped), when you > inject an interrupt to the guest, the handler registered for that > interrupt will come from the guest kernel. > > The only exception to this is when the platform provides a type of > interrupt whose handler can be registered and then locked down by the > firmware. On x86, this is the SMI. > > In practice though, > - in OVMF (x86), we only do synchronous (software-initiated) SMIs (for > privileged UEFI varstore access), > - and in ArmVirtQemu (ARM / aarch64), none of the management mode stuff > exists at all. > > I understand that the Platform Init 1.5 (or 1.6?) spec abstracted away > the MM (management mode) protocols from Intel SMM, but at this point > there is zero code in ArmVirtQemu for that. (And I'm unsure how much of > any eligible underlying hw emulation exists in QEMU.) > > So you can't get the guest firmware to react to the injected interrupt > without the guest OS coming between first. > > (2) Achin's description matches really-really closely what is possible, > and what should be done with QEMU, ArmVirtQemu, and the guest kernel. > > In any solution for this feature, the firmware has to reserve some > memory from the OS at boot. The current facilities we have enable this. > As I described previously, the ACPI linker/loader actions can be mapped > more or less 1:1 to Achin's design. From a practical perspective, you > really want to keep the guest firmware as dumb as possible (meaning: as > generic as possible), and keep the ACPI specifics to the QEMU and the > guest kernel sides. > > The error serialization actions -- the co-operation between guest kernel > and QEMU on the special memory areas -- that were mentioned earlier by > Michael and Punit look like a complication. But, IMO, they don't differ > from any other device emulation -- DMA actions in particular -- that > QEMU already does. Device models are what QEMU *does*. Read the command > block that the guest driver placed in guest memory, parse it, sanity > check it, verify it, execute it, write back the status code, inject an > interrupt (and/or let any polling guest driver notice it "soon after" -- > use barriers as necessary). > > Thus, I suggest to rely on the generic ACPI linker/loader interface > (between QEMU and guest firmware) *only* to make the firmware lay out > stuff (= reserve buffers, set up pointers, install QEMU's ACPI tables) > *at boot*. Then, at runtime, let the guest kernel and QEMU (the "device > model") talk to each other directly. Keep runtime firmware involvement > to zero. > > You *really* don't want to debug three components at runtime, when you > can solve the thing with two. (Two components whose build systems won't > drive you mad, I should add.) > > IMO, Achin's design nailed it. We can do that. > I completely agree. My questions were intended for gengdongjiu to clarify his/her position and clear up any misunderstandings between what Achin suggested and what he/she wrote. Thanks, -Christoffer
Re: [Qemu-devel] [PATCH] kvm: pass the virtual SEI syndrome to guest OS
On Wed, Mar 29, 2017 at 10:36:51PM +0800, gengdongjiu wrote: > Hi Achin, > Thanks for your mail and answer. > > 2017-03-29 18:36 GMT+08:00, Achin Gupta: > > Hi gengdongjiu, > > > > On Wed, Mar 29, 2017 at 05:36:37PM +0800, gengdongjiu wrote: > >> > >> Hi Laszlo/Biesheuvel/Qemu developer, > >> > >>Now I encounter a issue and want to consult with you in ARM64 platform, > >> as described below: > >> > >>when guest OS happen synchronous or asynchronous abort, kvm needs to > >> send the error address to Qemu or UEFI through sigbus to dynamically > >> generate APEI table. from my investigation, there are two ways: > >> > >>(1) Qemu get the error address, and generate the APEI table, then > >> notify UEFI to know this generation, then inject abort error to guest OS, > >> guest OS read the APEI table. > >>(2) Qemu get the error address, and let UEFI to generate the APEI > >> table, then inject abort error to guest OS, guest OS read the APEI table. > > The description may be not precise, I update it > >(1) Qemu get the error address, and generate the CPER table, then > notify guest UEFI to place this CPER table to Guest OS memory, then > Qemu let KVM inject abort error to guest OS, guest OS read the CPER > table. > > (2) Qemu get the error address, and let guest UEFI to directly > generate the CPER > table and place this table to the guest OS memory, not let Qemu gerate > it. then KVM inject abort error to guest OS, guest OS read the CPER > table. > I don't understand how you are going to notify the guest UEFI instance of anything or run the guest UEFI instance without going through the guest kernel. AFAIU, the guest UEFI instance is just a blob of software running at EL1 together with the guest kernel, and you're notification mechanism from QEMU to the VM is pretty much limited to injecting a virtual interrupt (of some kind) which is initially handled by the guest kernel. > > > > Just being pedantic! I don't think we are talking about creating the APEI > > table > > dynamically here. The issue is: Once KVM has received an error that is > > destined > > for a guest it will raise a SIGBUS to Qemu. Now before Qemu can inject the > > error > > into the guest OS, a CPER (Common Platform Error Record) has to be > > generated > > corresponding to the error source (GHES corresponding to memory subsystem, > > processor etc) to allow the guest OS to do anything meaningful with the > > error. So who should create the CPER is the question. > > > > At the EL3/EL2 interface (Secure Firmware and OS/Hypervisor), an error > > arrives > > at EL3 and secure firmware (at EL3 or a lower secure exception level) is > > responsible for creating the CPER. ARM is experimenting with using a > > Standalone > > MM EDK2 image in the secure world to do the CPER creation. This will avoid > > adding the same code in ARM TF in EL3 (better for security). The error will > > then > > be injected into the OS/Hypervisor (through SEA/SEI/SDEI) through ARM > > Trusted > > Firmware. > > > > Qemu is essentially fulfilling the role of secure firmware at the EL2/EL1 > > interface (as discussed with Christoffer below). So it should generate the > > CPER > > before injecting the error. > > > > This is corresponds to (1) above apart from notifying UEFI (I am assuming > > you > > mean guest UEFI). At this time, the guest OS already knows where to pick up > > the > > CPER from through the HEST. Qemu has to create the CPER and populate its > > address > > at the address exported in the HEST. Guest UEFI should not be involved in > > this > > flow. Its job was to create the HEST at boot and that has been done by this > > stage. > > Sorry, As I understand it, after Qemu generate the CPER table, it > should pass the CPER table to the guest UEFI, then Guest UEFI place > this CPER table to the guest OS memory. In this flow, the Guest UEFI > should be involved, else the Guest OS can not see the CPER table. > I think you need to explain the "pass the CPER table to the guest UEFI" concept in terms of what really happens, step by step, and when you say "then Guest UEFI place the CPER table to the guest OS memory", I'm curious who is running what code on the hardware when doing that. Thanks, -Christoffer
Re: [Qemu-devel] host stalls when qemu-system-aarch64 with kvm and pflash
Hi Radha, On Tue, Mar 28, 2017 at 12:58:24PM -0700, Radha Mohan wrote: > Hi, > I am seeing an issue with qemu-system-aarch64 when using pflash > (booting kernel via UEFI bios). > > Host kernel: 4.11.0-rc3-next-20170323 > Qemu version: v2.9.0-rc1 > > Command used: > ./aarch64-softmmu/qemu-system-aarch64 -cpu host -enable-kvm -M > virt,gic_version=3 -nographic -smp 1 -m 2048 -drive > if=none,id=hd0,file=/root/zesty-server-cloudimg-arm64.img,id=0 -device > virtio-blk-device,drive=hd0 -pflash /root/flash0.img -pflash > /root/flash1.img > > > As soon as the guest kernel boots the host starts to stall and prints > the below messages. And the system never recovers. I can neither > poweroff the guest nor the host. So I have resort to external power > reset of the host. > > == > [ 116.199077] NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s! > [kworker/25:1:454] > [ 116.206901] Modules linked in: binfmt_misc nls_iso8859_1 aes_ce_blk > shpchp crypto_simd gpio_keys cryptd aes_ce_cipher ghash_ce sha2_ce > sha1_ce uio_pdrv_genirq uio autofs4 btrfs raid10 rai > d456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor > raid6_pq libcrc32c raid1 raid0 multipath linear ast i2c_algo_bit ttm > drm_kms_helper syscopyarea sysfillrect sysimgblt fb_s > ys_fops drm nicvf ahci nicpf libahci thunder_bgx thunder_xcv > mdio_thunder mdio_cavium > > [ 116.206995] CPU: 25 PID: 454 Comm: kworker/25:1 Not tainted > 4.11.0-rc3-next-20170323 #1 > [ 116.206997] Hardware name: www.cavium.com crb-1s/crb-1s, BIOS 0.3 Feb 23 > 2017 > [ 116.207010] Workqueue: events netstamp_clear > [ 116.207015] task: 801f906b5400 task.stack: 801f901a4000 > [ 116.207020] PC is at smp_call_function_many+0x284/0x2e8 > [ 116.207023] LR is at smp_call_function_many+0x244/0x2e8 > [ 116.207026] pc : [] lr : [] > pstate: 8145 > [ 116.207028] sp : 801f901a7be0 > [ 116.207030] x29: 801f901a7be0 x28: 09139000 > [ 116.207036] x27: 09139434 x26: 0080 > [ 116.207041] x25: x24: 081565d0 > [ 116.207047] x23: 0001 x22: 08e11e00 > [ 116.207052] x21: 801f6d5cff00 x20: 801f6d5cff08 > [ 116.207057] x19: 09138e38 x18: 0a03 > [ 116.207063] x17: b77c9028 x16: 082e81d8 > [ 116.207068] x15: 3d0d6dd44d08 x14: 0036312196549b4a > [ 116.207073] x13: 58dabe4c x12: 0018 > [ 116.207079] x11: 366e2f04 x10: 09f0 > [ 116.207084] x9 : 801f901a7d30 x8 : 0002 > [ 116.207089] x7 : x6 : > [ 116.207095] x5 : x4 : 0020 > [ 116.207100] x3 : 0020 x2 : > [ 116.207105] x1 : 801f6d682578 x0 : 0003 > > [ 150.443116] INFO: rcu_sched self-detected stall on CPU > [ 150.448261] 25-...: (14997 ticks this GP) > idle=47a/141/0 softirq=349/349 fqs=7495 > [ 150.451115] INFO: rcu_sched detected stalls on CPUs/tasks: > [ 150.451123] 25-...: (14997 ticks this GP) > idle=47a/141/0 softirq=349/349 fqs=7495 > [ 150.451124] (detected by 13, t=15002 jiffies, g=805, c=804, q=8384) > [ 150.451136] Task dump for CPU 25: > [ 150.451138] kworker/25:1R running task0 454 2 > 0x0002 > [ 150.451155] Workqueue: events netstamp_clear > [ 150.451158] Call trace: > [ 150.451164] [] __switch_to+0x90/0xa8 > [ 150.451172] [] static_key_slow_inc+0x128/0x138 > [ 150.451175] [] static_key_enable+0x34/0x60 > [ 150.451178] [] netstamp_clear+0x68/0x80 > [ 150.451181] [] process_one_work+0x158/0x478 > [ 150.451183] [] worker_thread+0x50/0x4a8 > [ 150.451187] [] kthread+0x108/0x138 > [ 150.451190] [] ret_from_fork+0x10/0x50 > [ 150.477451] (t=15008 jiffies g=805 c=804 q=8384) > [ 150.482242] Task dump for CPU 25: > [ 150.482245] kworker/25:1R running task0 454 2 > 0x0002 > [ 150.482259] Workqueue: events netstamp_clear > [ 150.482264] Call trace: > [ 150.482271] [] dump_backtrace+0x0/0x2b0 > [ 150.482277] [] show_stack+0x24/0x30 > [ 150.482281] [] sched_show_task+0x128/0x178 > [ 150.482285] [] dump_cpu_task+0x48/0x58 > [ 150.482288] [] rcu_dump_cpu_stacks+0xa0/0xe8 > [ 150.482297] [] rcu_check_callbacks+0x774/0x938 > [ 150.482305] [] update_process_times+0x34/0x60 > [ 150.482314] [] tick_sched_handle.isra.7+0x38/0x70 > [ 150.482319] [] tick_sched_timer+0x4c/0x98 > [ 150.482324] [] __hrtimer_run_queues+0xd8/0x2b8 > [ 150.482328] [] hrtimer_interrupt+0xa8/0x228 > [ 150.482334] [] arch_timer_handler_phys+0x3c/0x50 > [ 150.482341] [] handle_percpu_devid_irq+0x8c/0x230 > [ 150.482344] [] generic_handle_irq+0x34/0x50 > [ 150.482347] [] __handle_domain_irq+0x68/0xc0 > [ 150.482351] [] gic_handle_irq+0xc4/0x170 > [ 150.482356] Exception stack(0x801f901a7ab0 to 0x801f901a7be0) > [ 150.482360] 7aa0: > 0003 801f6d682578 > [ 150.482364] 7ac0:
Re: [Qemu-devel] KVM/QEMU on Raspberry Pi 3
On Thu, Feb 2, 2017 at 3:59 PM, Marc Zyngierwrote: > [+Christoffer] > > Hi Pekka, > > On 02/02/17 14:44, Pekka Enberg wrote: >> Hi, >> >> Has anyone been able to successfully run QEMU/KVM under Raspberry Pi 3? >> >> I have installed 64-bit Fedora 24 by Gerd Hoffmann on the hardware: >> >>https://www.kraxel.org/blog/2016/04/fedora-on-raspberry-pi-updates/ >> >> and built a VM image using virt-builder: >> >>virt-builder --root-password password:root --arch aarch64 fedora-24 >> >> I also built the latest UEFI for QEMU from sources: >> >>https://wiki.linaro.org/LEG/UEFIforQEMU >> >> and updated to QEMU 2.8.0 from rawhide: >> >>[root@fedora-rpi2 ~]# qemu-system-aarch64 -version >>QEMU emulator version 2.8.0(qemu-2.8.0-1.fc26) >>Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers >> >> The VM image should be fine because I’m able to boot to it under CPU >> emulation: >> >> qemu-system-aarch64 \ >>-nographic \ >>-M virt \ >>-cpu cortex-a57 \ >>-smp 1 \ >>-m 512 \ >>-bios QEMU_EFI.fd \ >>-device virtio-blk-device,drive=image -drive >> if=none,id=image,file=fedora-24.img \ >>-netdev bridge,id=hn0,br=virbr0 -device >> virtio-net-pci,netdev=hn0,romfile= \ >>-device virtio-rng-pci >> >> However, when I enable KVM, keyboard stops working (interrupt delivery >> issue?) and Fedora boot process hangs at random places before reaching >> login: > > [snip] > > TL;DR: as it stands now, none of the RPi{2,3} can run KVM out of the > box, as they lack a virtualization capable interrupt controller. This > means that timer interrupts cannot be reported to the core, and things > die a painful death. > > The longer story: we have a set of patches somewhere on the list that do > enable the timer interrupts to be reported to userspace (QEMU), which > can then inject them into its on GIC emulation and kick the vcpu. So > far, work on this seems to have stopped (API issues? QEMU patches?) > > Christoffer was about to revive the kernel patches, but we need someone > to pick up the QEMU part, and work with us and the QEMU people so that > we agree once and for all on the ABI to signal PPIs to userspace. Interesting timing. I just revived the patches today, but I reworked the ABI slightly and rebased the work on top of support for the physical timer in the guest and ran into two issues: First, turning of the in-kernel irqchip no longer works, and I also lost track of how the patches should look, so indeed I need help from a QEMU person to look at that. Second, there is some sort of regression on the 32-bit side using the physical timer patches. Once I've sorted some of this out, I can send out the patch series. At least I want to figure out the 32-bit breakage first, and then I may simply send out the patches with a big fat UNTESTED warning in hope that someone will work on the qemu side with me. Thanks, -Christoffer
Re: [Qemu-devel] [PATCH 4/4] hw/arm/virt: Don't incorrectly claim architectural timer to be edge-triggered
On Fri, Dec 09, 2016 at 04:30:20PM +, Peter Maydell wrote: > The architectural timers in ARM CPUs all have level triggered interrupts > (unless you're using KVM on a host kernel before 4.4, which misimplemented > them as edge-triggered). > > We were incorrectly describing them in the device tree as edge triggered. > This can cause problems for guest kernels in 4.8 before rc6: > * pre-4.8 kernels ignore the values in the DT > * 4.8 before rc6 write the DT values to the GIC config registers > * newer than rc6 ignore the DT and insist that the timer interrupts >are level triggered regardless > > Fix the DT so we're describing reality. For backwards-compatibility > purposes, only do this for the virt-2.9 machine onward. > > Signed-off-by: Peter Maydell> --- > hw/arm/virt.c | 34 ++ > 1 file changed, 30 insertions(+), 4 deletions(-) > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > index 54498ea..2ca9527 100644 > --- a/hw/arm/virt.c > +++ b/hw/arm/virt.c > @@ -71,6 +71,7 @@ typedef struct { > bool disallow_affinity_adjustment; > bool no_its; > bool no_pmu; > +bool claim_edge_triggered_timers; > } VirtMachineClass; > > typedef struct { > @@ -309,12 +310,31 @@ static void fdt_add_psci_node(const VirtMachineState > *vms) > > static void fdt_add_timer_nodes(const VirtMachineState *vms, int gictype) > { > -/* Note that on A15 h/w these interrupts are level-triggered, > - * but for the GIC implementation provided by both QEMU and KVM > - * they are edge-triggered. > +/* On real hardware these interrupts are level-triggered. > + * On KVM they were edge-triggered before host kernel version 4.4, > + * and level-triggered afterwards. > + * On emulated QEMU they are level-triggered. > + * > + * Getting the DTB info about them wrong is awkward for some > + * guest kernels: > + * pre-4.8 ignore the DT and leave the interrupt configured > + * with whatever the GIC reset value (or the bootloader) left it at > + * 4.8 before rc6 honour the incorrect data by programming it back > + * into the GIC, causing problems > + * 4.8rc6 and later ignore the DT and always write "level triggered" > + * into the GIC > + * > + * For backwards-compatibility, virt-2.8 and earlier will continue > + * to say these are edge-triggered, but later machines will report > + * the correct information. > */ Is this really necessary? I don't think the KVM GIC implementation ever listened to the guest in terms of how to configure PPIs, but instead ignores writes to the config registers for these interrupts (which I think the GIC architecture allows). So this would only be a matter of how the guest kernel between v4.8-rc1 and v4.8-rc6 expects the behavior to be. Does the arch timer driver really do something different in how it deals with interrupts based on this DT value? Of course, I suppose we could also be running other guests (UEFI?) but again, if the KVM GIC doesn't care about how the guest tries to program it, can it make a difference? > ARMCPU *armcpu; > -uint32_t irqflags = GIC_FDT_IRQ_FLAGS_EDGE_LO_HI; > +VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms); > +uint32_t irqflags = GIC_FDT_IRQ_FLAGS_LEVEL_HI; > + > +if (vmc->claim_edge_triggered_timers) { > +irqflags = GIC_FDT_IRQ_FLAGS_EDGE_LO_HI; > +} > > if (gictype == 2) { > irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START, > @@ -1556,8 +1576,14 @@ static void virt_2_8_instance_init(Object *obj) > > static void virt_machine_2_8_options(MachineClass *mc) > { > +VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc)); > + > virt_machine_2_9_options(mc); > SET_MACHINE_COMPAT(mc, VIRT_COMPAT_2_8); > +/* For 2.8 and earlier we falsely claimed in the DT that > + * our timers were edge-triggered, not level-triggered. > + */ > +vmc->claim_edge_triggered_timers = true; > } > DEFINE_VIRT_MACHINE(2, 8) > I don't understand this virt machine class version stuff. In which case is the claim_edge_triggered_timers set to true? (ok, appears to be when a 2.8 machine is created, but does that happen automatically or does the user specifically have to ask for it?) Thanks, -Christoffer
Re: [Qemu-devel] [PATCH v6 1/4] kernel: Add definitions for GICv3 attributes
On Fri, Nov 25, 2016 at 02:12:12PM +0530, Vijay Kilari wrote: > On Fri, Nov 25, 2016 at 1:27 PM, Auger Ericwrote: > > Hi Vijay, > > > > On 23/11/2016 13:39, vijay.kil...@gmail.com wrote: > >> From: Vijaya Kumar K > >> > >> This temporary patch adds kernel API definitions. Use proper header update > >> procedure after these features are released. > > > > Did you send the complete v6 series? I only see 1/4 and 4/4 of this v6 > > (https://lists.gnu.org/archive/html/qemu-devel/2016-11/threads.html#04318)? > > Did I miss something? > > Strange!. Yes, I have sent complete series. > I am seeing the whole thing. Thanks, -Christoffer
Re: [Qemu-devel] [RFC 2/2] ARM: KVM: Enable in-kernel timers with user space gic
On Wed, Nov 02, 2016 at 04:40:35PM +0100, Alexander Graf wrote: > On 11/01/2016 12:35 PM, Peter Maydell wrote: > >On 29 October 2016 at 22:10, Alexander Grafwrote: [...] > > > >>+cpu->timer_irq_level = run->s.regs.timer_irq_level; > >>+} > >>+ > >> return MEMTXATTRS_UNSPECIFIED; > >> } > >Does this code do the right thing across a vcpu reset or > >a full-system reset? > > Good question. I'm not 100% sure - but I don't know for sure whether > it's guaranteed without user space irqchip even. > > In essence, the code above merely synchronizes kvm state to qemu > state and is fully unaffected from any reset sequence. This is good, > as the line status is transient. So from a QEMU pov, we really only > copy the state of the vcpu interrupt line into the QEMU interrupt > line. Pulling that line down would be responsibility of the > KVM_ARM_VCPU_INIT ioctl if it also clears the timer registers I > guess. > > However, I don't see any clearing of cntv_ctrl inside KVM or from > QEMU. How do we ensure that the irq active bit is off on reset? In kvm_timer_vcpu_reset we cset cntv_ctl = 0, and that function gets called from the PSCI handler or whenever userspace calls the set target ioctl thingy. > > The other part that could get in the way of working system reset is > the interrupt controller emulation itself which resets all internal > irq line state. So on reset we'd always end up with the irq line > down from a gic pov, but with the vtimer line pending or not pending > depending on previous state. I doubt it's really going to hurt > though. I suppose it should resample the line, but if the GIC clears everything and the arch timer line goes down, you're in the right starting state again. Right? -Christoffer
Re: [Qemu-devel] [RFC PATCH v3 0/5] vITS support
On Tue, Jun 28, 2016 at 8:41 AM, Auger Ericwrote: > Dear all, > > On 24/11/2015 11:13, Pavel Fedin wrote: >> This series introduces support for in-kernel GICv3 ITS emulation. >> It is based on kernel API which is not released yet, therefore i post >> it as an RFC. >> >> Kernel patch sets which implement this functionality are: >> - [PATCH v3 00/16] KVM: arm64: GICv3 ITS emulation >> http://www.spinics.net/lists/kvm/msg121878.html >> - [PATCH v3 0/7] KVM: arm/arm64: gsi routing support >> http://www.spinics.net/lists/kvm/msg119567.html > > For testing KVM/ARM gsi routing and KVM ARM PCIe/MSI passthrough with > GICv3 ITS I intend to work on the respin of this series. I now have > access to a host featuring GICv3 ITS. Does anyone currently work on this > topic or intend to do so in short term or can I proceed? > Linaro is not working on this feature yet, and we would be happy to see you pick up this work. Thanks, -Christoffer
Re: [Qemu-devel] [kvm-unit-tests PATCH 00/10] arm/arm64: add gic framework
On Tue, May 24, 2016 at 02:23:43PM +0200, Andrew Jones wrote: > On Tue, May 24, 2016 at 01:58:19PM +0200, Christoffer Dall wrote: > > On Mon, May 23, 2016 at 05:24:23PM +0200, Andrew Jones wrote: > > > On Wed, May 18, 2016 at 11:07:14AM +0200, Christoffer Dall wrote: > > > > Hi Drew, > > > > > > > > Thanks for doing this. I'm happy to see some tests for the GIC. > > > > > > > > I've been pondering with how to write unit tests for all the MMIO > > > > implementations. If you have some thoughts on how that could be easily > > > > fitted into this framework, that would probably be a good place to do it > > > > ;) > > > > > > Hi Christoffer, > > > > > > Sorry for my slow response, I've been on vacation. For MMIO > > > implementations, are you referring to the emulation done for > > > gicv2 accesses and for gicv3 legacy accesses? And, if so, is > > > your question how we might be able to use the same test > > > framework for both? And, if that's so, then I think this series > > > gets us pretty close already. If I'm completely off-base, then > > > please give me a quick high-level description of what you'd like > > > to be able to do. > > > > > What I meant was testing all the MMIO accesses to the various > > distributor MMIO regions. > > > > For example, writing full words to all registers (some value) reading > > back the value, correcting for RAZ/WI semantics, and testing that byte > > accesses to those registers where that's allowed also works. > > OK, understood. We can build a table that describes each distributor > offset's allowed access types and expected read-back results for the > "default enablement" of the gic. Then, we'd run through that table > doing a refresh of the gic enabling before each offset test. This > series provides everything needed for that, except the offset table. > It should be pretty easy to add. > > Now, configuring the gic differently will result in some offsets > producing different values, so we'll eventually want to extend the > table to check the same offsets using different gic enable functions > as well, but that would be pretty easy to do too. > > > > > If adding that on top of this series sounds like a good idea, someone > > should add it to the bottom of their (presumably already long) todo > > list, myself included. > > They do sound like good tests to have. I've added it to the middle > of my long TODO. If somebody beats me to it, I won't complain :-) > Awesome, thanks! -Christoffer
Re: [Qemu-devel] [kvm-unit-tests PATCH 00/10] arm/arm64: add gic framework
On Mon, May 23, 2016 at 05:24:23PM +0200, Andrew Jones wrote: > On Wed, May 18, 2016 at 11:07:14AM +0200, Christoffer Dall wrote: > > Hi Drew, > > > > Thanks for doing this. I'm happy to see some tests for the GIC. > > > > I've been pondering with how to write unit tests for all the MMIO > > implementations. If you have some thoughts on how that could be easily > > fitted into this framework, that would probably be a good place to do it > > ;) > > Hi Christoffer, > > Sorry for my slow response, I've been on vacation. For MMIO > implementations, are you referring to the emulation done for > gicv2 accesses and for gicv3 legacy accesses? And, if so, is > your question how we might be able to use the same test > framework for both? And, if that's so, then I think this series > gets us pretty close already. If I'm completely off-base, then > please give me a quick high-level description of what you'd like > to be able to do. > What I meant was testing all the MMIO accesses to the various distributor MMIO regions. For example, writing full words to all registers (some value) reading back the value, correcting for RAZ/WI semantics, and testing that byte accesses to those registers where that's allowed also works. If adding that on top of this series sounds like a good idea, someone should add it to the bottom of their (presumably already long) todo list, myself included. Thanks, -Christoffer
Re: [Qemu-devel] [kvm-unit-tests PATCH 00/10] arm/arm64: add gic framework
Hi Drew, Thanks for doing this. I'm happy to see some tests for the GIC. I've been pondering with how to write unit tests for all the MMIO implementations. If you have some thoughts on how that could be easily fitted into this framework, that would probably be a good place to do it ;) -Christoffer On Mon, May 16, 2016 at 09:57:14AM +0200, Andrew Jones wrote: > Import defines, and steal enough helper functions, from Linux to > enable programming of the gic (v2 and v3). Then use the framework > to add an initial test (an ipi test; self, target-list, broadcast). > > It's my hope that this framework will be a suitable base on which > more tests may be easily added, particularly because we have > vgic-new and tcg gicv3 emulation getting close to merge. > > To run it, along with other tests, just do > > ./configure [ --arch=[arm|arm64] --cross-prefix=$PREFIX ] > make > export QEMU=$PATH_TO_QEMU > ./run_tests.sh > > To run it separately do, e.g. > > $QEMU -machine virt,accel=tcg -cpu cortex-a57 \ > -device virtio-serial-device \ > -device virtconsole,chardev=ctd -chardev testdev,id=ctd \ > -display none -serial stdio \ > -kernel arm/gic.flat \ > -smp 123 -machine gic-version=3 -append ipi > > ^^ note, we can go nuts with nr-cpus on TCG :-) > > Or, a KVM example using a different "sender" cpu and irq (other than zero) > > $QEMU -machine virt,accel=kvm -cpu host \ > -device virtio-serial-device \ > -device virtconsole,chardev=ctd -chardev testdev,id=ctd \ > -display none -serial stdio \ > -kernel arm/gic.flat \ > -smp 48 -machine gic-version=3 -append 'ipi sender=42 irq=1' > > > Patches: > 01-05: fixes and functionality needed by the later gic patches > 06-07: code theft from Linux (defines, helper functions) > 08-10: arm/gic.flat (the base of the gic unit test), currently just IPI > > Available here: https://github.com/rhdrjones/kvm-unit-tests/commits/arm/gic > > > Andrew Jones (10): > lib: xstr: allow multiple args > arm64: fix get_"sysreg32" and make MPIDR 64bit > arm/arm64: smp: support more than 8 cpus > arm/arm64: add some delay routines > arm/arm64: irq enable/disable > arm/arm64: add initial gicv2 support > arm64: add initial gicv3 support > arm/arm64: gicv2: add an IPI test > arm/arm64: gicv3: add an IPI test > arm/arm64: gic: don't just use zero > > arm/Makefile.common| 7 +- > arm/gic.c | 338 > + > arm/run| 19 ++- > arm/selftest.c | 5 +- > arm/unittests.cfg | 13 ++ > lib/arm/asm/arch_gicv3.h | 184 > lib/arm/asm/gic-v2.h | 74 ++ > lib/arm/asm/gic-v3.h | 320 ++ > lib/arm/asm/gic.h | 21 +++ > lib/arm/asm/processor.h| 38 - > lib/arm/asm/setup.h| 4 +- > lib/arm/gic.c | 127 + > lib/arm/processor.c| 15 ++ > lib/arm/setup.c| 12 +- > lib/arm64/asm/arch_gicv3.h | 169 +++ > lib/arm64/asm/gic-v2.h | 1 + > lib/arm64/asm/gic-v3.h | 1 + > lib/arm64/asm/gic.h| 1 + > lib/arm64/asm/processor.h | 53 +-- > lib/arm64/asm/sysreg.h | 44 ++ > lib/arm64/processor.c | 15 ++ > lib/libcflat.h | 4 +- > 22 files changed, 1439 insertions(+), 26 deletions(-) > create mode 100644 arm/gic.c > create mode 100644 lib/arm/asm/arch_gicv3.h > create mode 100644 lib/arm/asm/gic-v2.h > create mode 100644 lib/arm/asm/gic-v3.h > create mode 100644 lib/arm/asm/gic.h > create mode 100644 lib/arm/gic.c > create mode 100644 lib/arm64/asm/arch_gicv3.h > create mode 100644 lib/arm64/asm/gic-v2.h > create mode 100644 lib/arm64/asm/gic-v3.h > create mode 100644 lib/arm64/asm/gic.h > create mode 100644 lib/arm64/asm/sysreg.h > > -- > 2.4.11 >
[Qemu-devel] [PATCH v2] util: align memory allocations to 2M on AArch64
For KVM to use Transparent Huge Pages (THP) we have to ensure that the alignment of the userspace address of the KVM memory slot and the IPA that the guest sees for a memory region have the same offset from the 2M huge page size boundary. One way to achieve this is to always align the IPA region at a 2M boundary and ensure that the mmap alignment is also at 2M. Unfortunately, we were only doing this for __arm__, not for __aarch64__, so add this simply condition. This fixes a performance regression using KVM/ARM on AArch64 platforms that showed a performance penalty of more than 50%, introduced by the following commit: 9fac18f (oslib: allocate PROT_NONE pages on top of RAM, 2015-09-10) We were only lucky before the above commit, because we were allocating large regions and naturally getting a 2M alignment on those allocations then. Reported-by: Shih-Wei Li <shih...@cs.columbia.edu> Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org> --- The first version of this patch was accidentally made against the v2.5.0 release instead of master, so this is a rebased version. util/oslib-posix.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/util/oslib-posix.c b/util/oslib-posix.c index 20ca141..a0c5b91 100644 --- a/util/oslib-posix.c +++ b/util/oslib-posix.c @@ -26,7 +26,7 @@ * THE SOFTWARE. */ -#if defined(__linux__) && (defined(__x86_64__) || defined(__arm__)) +#if defined(__linux__) && (defined(__x86_64__) || defined(__arm__) || defined(__aarch64__)) /* Use 2 MiB alignment so transparent hugepages can be used by KVM. Valgrind does not support alignments larger than 1 MiB, therefore we need special code which handles running on Valgrind. */ -- 2.1.2.330.g565301e.dirty
[Qemu-devel] [PATCH] util: align memory allocations to 2M on AArch64
For KVM to use Transparent Huge Pages (THP) we have to ensure that the alignment of the userspace address of the KVM memory slot and the IPA that the guest sees for a memory region have the same offset from the 2M huge page size boundary. One way to achieve this is to always align the IPA region at a 2M boundary and ensure that the mmap alignment is also at 2M. Unfortunately, we were only doing this for __arm__, not for __aarch64__, so add this simply condition. This fixes a performance regression using KVM/ARM on AArch64 platforms that showed a performance penalty of more than 50%, introduced by the following commit: 9fac18f (oslib: allocate PROT_NONE pages on top of RAM, 2015-09-10) We were only lucky before the above commit, because we were allocating large regions and naturally getting a 2M alignment on those allocations then. Reported-by: Shih-Wei Li <shih...@cs.columbia.edu> Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org> --- util/oslib-posix.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/util/oslib-posix.c b/util/oslib-posix.c index d25f671..03b055e 100644 --- a/util/oslib-posix.c +++ b/util/oslib-posix.c @@ -35,7 +35,7 @@ extern int daemon(int, int); #endif -#if defined(__linux__) && (defined(__x86_64__) || defined(__arm__)) +#if defined(__linux__) && (defined(__x86_64__) || defined(__arm__) || defined(__aarch64__)) /* Use 2 MiB alignment so transparent hugepages can be used by KVM. Valgrind does not support alignments larger than 1 MiB, therefore we need special code which handles running on Valgrind. */ -- 2.1.2.330.g565301e.dirty
Re: [Qemu-devel] Performance regression using KVM/ARM
On Fri, Apr 22, 2016 at 11:17:47AM +0100, Peter Maydell wrote: > On 22 April 2016 at 11:15, Christoffer Dall <christoffer.d...@linaro.org> > wrote: > > Peter just pointed me to a change I remember doing for ARM, so perhaps > > this fix is the right one? > > > > > > diff --git a/util/oslib-posix.c b/util/oslib-posix.c > > index d25f671..a36e734 100644 > > --- a/util/oslib-posix.c > > +++ b/util/oslib-posix.c > > @@ -35,7 +35,7 @@ > > extern int daemon(int, int); > > #endif > > > > -#if defined(__linux__) && (defined(__x86_64__) || defined(__arm__)) > > +#if defined(__linux__) && (defined(__x86_64__) || defined(__arm__)) || > > defined(__aarch64__) > > /* Use 2 MiB alignment so transparent hugepages can be used by KVM. > >Valgrind does not support alignments larger than 1 MiB, > >therefore we need special code which handles running on Valgrind. */ > > I hadn't realised AArch64 didn't define __arm__. Your extra clause > wants to be inside the parens for the ||, not outside. > > So was the problem just that we weren't passing 2MB as the align > parameter to qemu_ram_mmap(), and if we do pass 2MB then it does the > right thing ? > Yes, that was essentially the problem. I'll send a proper patch. However, Marc pointed out, that we should probably think about improving on this in the future, because if you're running on a 64K page system and want huge pages, you have to align at a higher boundary, but I'm on the other hand not sure such a boundary is always practical on a 64K system. Which would suggest that we either need to: 1) Probe the kernel for the page size and always align to something that allows huge pages, or 2) Let the user specify an option that says it wants to be able to use THP and only then align to the huge page boundary. Not sure... -Christoffer
Re: [Qemu-devel] Performance regression using KVM/ARM
On Fri, Apr 22, 2016 at 12:06:52PM +0200, Alexander Graf wrote: > On 04/22/2016 12:01 PM, Christoffer Dall wrote: > >On Thu, Apr 21, 2016 at 09:50:05PM +0200, Alexander Graf wrote: > >> > >>On 21.04.16 18:23, Christoffer Dall wrote: > >>>Hi, > >>> > >>>Commit 9fac18f (oslib: allocate PROT_NONE pages on top of RAM, > >>>2015-09-10) had the unfortunate side effect that memory slots registered > >>>with KVM no longer contain a userspace address that is aligned to a 2M > >>>boundary, causing the use of THP to fail in the kernel. > >>> > >>>I fail to see where in the QEMU code we should be asking for a 2M > >>>alignment of our memory region. Can someone help pointing me to the > >>>right place to fix this or suggest a patch? > >>> > >>>This causes a performance regssion of hackbench on KVM/ARM of about 62% > >>>compared to the workload running with THP. > >>> > >>>We have verified that this is indeed the cause of the failure by adding > >>>various prints to QEMU and the kernel, but unfortunatley my QEMU > >>>knowledge is not sufficient for me to fix it myself. > >>> > >>>Any help would be much appreciated! > >>The code changed quite heavily since I last looked at it, but could you > >>please try whether the (untested) patch below makes a difference? > >> > >> > >Unfortunately this doesn't make any difference. It feels to me like > >we're missing specifying a 2M alignemnt in QEMU somewhere, but I can't > >properly understand the links between the actual allocation, registering > >mem slots with the KVM part of QEMU, and actually setting up KVM user > >memory regions. > > > >What has to happen is that the resulting struct > >kvm_userspace_memory_region() has the same alignment offset from 2M (the > >huge page size) of the ->guest_phys_addr and ->userspace-addr fields. > > Well, I would expect that the guest address space is always very big > aligned - and definitely at least 2MB - so we're safe there. > > That means we only need to align the qemu virtual address. There > used to be a memalign() call for that, but it got replaced with > direct mmap() and then a lot of code changed on top. Looking at the > logs, I'm sure Paolo knows the answer though :) > Peter just pointed me to a change I remember doing for ARM, so perhaps this fix is the right one? diff --git a/util/oslib-posix.c b/util/oslib-posix.c index d25f671..a36e734 100644 --- a/util/oslib-posix.c +++ b/util/oslib-posix.c @@ -35,7 +35,7 @@ extern int daemon(int, int); #endif -#if defined(__linux__) && (defined(__x86_64__) || defined(__arm__)) +#if defined(__linux__) && (defined(__x86_64__) || defined(__arm__)) || defined(__aarch64__) /* Use 2 MiB alignment so transparent hugepages can be used by KVM. Valgrind does not support alignments larger than 1 MiB, therefore we need special code which handles running on Valgrind. */ Thanks, -Christoffer
Re: [Qemu-devel] Performance regression using KVM/ARM
Hi Laszlo, On Thu, Apr 21, 2016 at 11:58:07PM +0200, Laszlo Ersek wrote: > On 04/21/16 18:23, Christoffer Dall wrote: > > Hi, > > > > Commit 9fac18f (oslib: allocate PROT_NONE pages on top of RAM, > > 2015-09-10) had the unfortunate side effect that memory slots registered > > with KVM no longer contain a userspace address that is aligned to a 2M > > boundary, causing the use of THP to fail in the kernel. > > > > I fail to see where in the QEMU code we should be asking for a 2M > > alignment of our memory region. Can someone help pointing me to the > > right place to fix this or suggest a patch? > > > > This causes a performance regssion of hackbench on KVM/ARM of about 62% > > compared to the workload running with THP. > > > > We have verified that this is indeed the cause of the failure by adding > > various prints to QEMU and the kernel, but unfortunatley my QEMU > > knowledge is not sufficient for me to fix it myself. > > > > Any help would be much appreciated! > > Can you please test the attached series? > > (Note that I'm only interested in solving this problem as a productive > distraction, so if the patches don't work, or require a lot of massaging > for merging, I'll just drop them (or, preferably, give them to someone > else).) > I like your procrastination methods! Unfortunately this fix wasn't the right one either. -Christoffer
Re: [Qemu-devel] Performance regression using KVM/ARM
On Thu, Apr 21, 2016 at 09:50:05PM +0200, Alexander Graf wrote: > > > On 21.04.16 18:23, Christoffer Dall wrote: > > Hi, > > > > Commit 9fac18f (oslib: allocate PROT_NONE pages on top of RAM, > > 2015-09-10) had the unfortunate side effect that memory slots registered > > with KVM no longer contain a userspace address that is aligned to a 2M > > boundary, causing the use of THP to fail in the kernel. > > > > I fail to see where in the QEMU code we should be asking for a 2M > > alignment of our memory region. Can someone help pointing me to the > > right place to fix this or suggest a patch? > > > > This causes a performance regssion of hackbench on KVM/ARM of about 62% > > compared to the workload running with THP. > > > > We have verified that this is indeed the cause of the failure by adding > > various prints to QEMU and the kernel, but unfortunatley my QEMU > > knowledge is not sufficient for me to fix it myself. > > > > Any help would be much appreciated! > > The code changed quite heavily since I last looked at it, but could you > please try whether the (untested) patch below makes a difference? > > Unfortunately this doesn't make any difference. It feels to me like we're missing specifying a 2M alignemnt in QEMU somewhere, but I can't properly understand the links between the actual allocation, registering mem slots with the KVM part of QEMU, and actually setting up KVM user memory regions. What has to happen is that the resulting struct kvm_userspace_memory_region() has the same alignment offset from 2M (the huge page size) of the ->guest_phys_addr and ->userspace-addr fields. Thanks, -Christoffer
[Qemu-devel] Performance regression using KVM/ARM
Hi, Commit 9fac18f (oslib: allocate PROT_NONE pages on top of RAM, 2015-09-10) had the unfortunate side effect that memory slots registered with KVM no longer contain a userspace address that is aligned to a 2M boundary, causing the use of THP to fail in the kernel. I fail to see where in the QEMU code we should be asking for a 2M alignment of our memory region. Can someone help pointing me to the right place to fix this or suggest a patch? This causes a performance regssion of hackbench on KVM/ARM of about 62% compared to the workload running with THP. We have verified that this is indeed the cause of the failure by adding various prints to QEMU and the kernel, but unfortunatley my QEMU knowledge is not sufficient for me to fix it myself. Any help would be much appreciated! Thanks, -Christoffer
Re: [Qemu-devel] [PATCH v6 0/4] ARM: add query-gic-capabilities SMP command
Hi Peter, Stupid question: The subject says "SMP command". Did you mean "QMP command" ? Thanks, -Christoffer On Wed, Mar 23, 2016 at 01:32:29PM +0800, Peter Xu wrote: > v6 changes: > - patch 1 (squashed into patch 2) > - explain more about the following in commit message: why we need > this command, and what does the entries mean [Markus] > - explain what GIC is [Eric] > - squash this patch into patch 2 [Eric] > - patch 2 (new patch 1) > - fix "does not implement" english error [Eric] > - remove useless headers in target-arm/monitor.c [Markus] > - remove this command from whitelist [Markus, Eric] > - return dict rather than array, with single key points to the > original array results [Markus, Eric] > - patch 5 (new patch 4) > - tiny change in implementation to suite the new interface. > > v5 changes: > - patch 2: moved to target-arm/monitor.c (from target-arm/machine.c) >[Peter] > - patch 3: splitted into three patches: [all from Peter's comments] > - patch 3 (new): leverage kvm_arm_create_scratch_host_vcpu(), tiny > enhancement of old one to suite our need > - patch 4: introduce kvm_support_device() in kvm-all.c > - patch 5: do the implementation. > > v4 changes: > - all: rename query-gic-capability to query-gic-capabilities [Andrea] > - patch 3: rename helper function to kvm_support_device, make it > inline and lighter. [Drew] > > v3 changes: > - patch 2: remove func declaration, add qmp header [Drew] > - patch 3: being able to detect KVM GIC capabilities even without > kvm enabled [Andrea]: this is a little bit hacky, need some more > review on this. > > v2 changes: > - result layout change: use array and dict for the capability bits > rather than a single array of strings [Andrea/Markus] > - spelling out what GIC is in doc [Eric] > > This patch is to add ARM-specific command "query-gic-capability". > > The new command can report which kind of GIC device the host/QEMU > support. The returned result is in the form of array. > > Sample command and output: > > {"execute": "query-gic-capability"} > {"return": [{"emulated": false, "version": 3, "kernel": false}, > {"emulated": true, "version": 2, "kernel": true}]} > > Testing: > > Smoke tests on both x86 (emulated) and another moonshot ARM server. > > Peter Xu (4): > arm: qmp: add query-gic-capabilities interface > arm: enhance kvm_arm_create_scratch_host_vcpu > kvm: add kvm_support_device() helper function > arm: implement query-gic-capabilities > > include/sysemu/kvm.h | 9 + > kvm-all.c| 15 + > monitor.c| 8 + > qapi-schema.json | 50 > qmp-commands.hx | 28 > target-arm/Makefile.objs | 2 +- > target-arm/kvm.c | 10 +- > target-arm/kvm_arm.h | 6 ++-- > target-arm/monitor.c | 86 > > 9 files changed, 210 insertions(+), 4 deletions(-) > create mode 100644 target-arm/monitor.c > > -- > 2.4.3 > >
Re: [Qemu-devel] [PATCH] virt: Lift the maximum RAM limit from 30GB to 255GB
On Thu, Feb 25, 2016 at 04:51:51PM +, Peter Maydell wrote: > [Typoed the kvmarm list address; sorry... -- PMM] > > On 25 February 2016 at 12:09, Peter Maydell <peter.mayd...@linaro.org> wrote: > > The virt board restricts guests to only 30GB of RAM. This is a > > hangover from the vexpress-a15 board, and there's inherent reason did you mean "there's *no* inherent reason" ? > > for it. 30GB is smaller than you might reasonably want to provision > > a VM for on a beefy server machine. Raise the limit to 255GB. > > > > We choose 255GB because the available space we currently have > > below the 1TB boundary is up to the 512GB mark, but we don't > > want to paint ourselves into a corner by assigning it all to > > RAM. So we make half of it available for RAM, with the 256GB..512GB > > range available for future non-RAM expansion purposes. > > > > If we need to provide more RAM to VMs in the future then we need to: > > * allocate a second bank of RAM starting at 2TB and working up > > * fix the DT and ACPI table generation code in QEMU to correctly > >report two split lumps of RAM to the guest > > * fix KVM in the host kernel to allow guests with >40 bit address spaces > > > > The last of these is obviously the trickiest, but it seems > > reasonable to assume that anybody configuring a VM with a quarter > > of a terabyte of RAM will be doing it on a host with more than a > > terabyte of physical address space. > > > > Signed-off-by: Peter Maydell <peter.mayd...@linaro.org> > > --- > > CC'ing kvm-arm as a heads-up that my proposal here is to make > > the kernel devs do the heavy lifting for supporting >255GB. > > Discussion welcome on whether I have the tradeoffs here right. I think so, this looks good to me. > > --- > > hw/arm/virt.c | 21 +++-- > > 1 file changed, 19 insertions(+), 2 deletions(-) > > > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > > index 44bbbea..7a56b46 100644 > > --- a/hw/arm/virt.c > > +++ b/hw/arm/virt.c > > @@ -95,6 +95,23 @@ typedef struct { > > #define VIRT_MACHINE_CLASS(klass) \ > > OBJECT_CLASS_CHECK(VirtMachineClass, klass, TYPE_VIRT_MACHINE) > > > > +/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this means > > + * RAM can go up to the 256GB mark, leaving 256GB of the physical > > + * address space unallocated and free for future use between 256G and 512G. > > + * If we need to provide more RAM to VMs in the future then we need to: > > + * * allocate a second bank of RAM starting at 2TB and working up > > + * * fix the DT and ACPI table generation code in QEMU to correctly > > + *report two split lumps of RAM to the guest > > + * * fix KVM in the host kernel to allow guests with >40 bit address > > spaces > > + * (We don't want to fill all the way up to 512GB with RAM because > > + * we might want it for non-RAM purposes later. Conversely it seems > > + * reasonable to assume that anybody configuring a VM with a quarter > > + * of a terabyte of RAM will be doing it on a host with more than a > > + * terabyte of physical address space.) > > + */ > > +#define RAMLIMIT_GB 255 > > +#define RAMLIMIT_BYTES (RAMLIMIT_GB * 1024ULL * 1024 * 1024) > > + > > /* Addresses and sizes of our components. > > * 0..128MB is space for a flash device so we can run bootrom code such as > > UEFI. > > * 128MB..256MB is used for miscellaneous device I/O. > > @@ -130,7 +147,7 @@ static const MemMapEntry a15memmap[] = { > > [VIRT_PCIE_MMIO] = { 0x1000, 0x2eff }, > > [VIRT_PCIE_PIO] = { 0x3eff, 0x0001 }, > > [VIRT_PCIE_ECAM] = { 0x3f00, 0x0100 }, > > -[VIRT_MEM] ={ 0x4000, 30ULL * 1024 * 1024 * 1024 }, > > +[VIRT_MEM] ={ 0x4000, RAMLIMIT_BYTES }, > > /* Second PCIe window, 512GB wide at the 512GB boundary */ > > [VIRT_PCIE_MMIO_HIGH] = { 0x800000ULL, 0x80ULL }, > > }; > > @@ -1066,7 +1083,7 @@ static void machvirt_init(MachineState *machine) > > vbi->smp_cpus = smp_cpus; > > > > if (machine->ram_size > vbi->memmap[VIRT_MEM].size) { > > -error_report("mach-virt: cannot model more than 30GB RAM"); > > +error_report("mach-virt: cannot model more than %dGB RAM", > > RAMLIMIT_GB); > > exit(1); > > } > > > > -- > > 1.9.1 Reviewed-by: Christoffer Dall <christoffer.d...@linaro.org>
Re: [Qemu-devel] [libvirt] ARM KVM GICv3 Support
On Tue, Feb 2, 2016 at 4:52 PM, Eric Blake <ebl...@redhat.com> wrote: > On 02/02/2016 07:05 AM, Christoffer Dall wrote: > >>> >>> I'm not familiar enough with libvirt, nor the use of QMP, to really argue >>> one way or another, but I find it a bit strange that we'd prefer libvirt >>> to query two entities over one. And, why should the libvirt installed on >>> a particular host prefer gicv3 as the default, just because KVM supports >>> it, even when QEMU does not? >> >> I think the assumption here is that if you install a recent libvirt you >> also install a recent QEMU. You always have the risk of things not >> working if you have too old a QEMU, right? > > Libvirt exists for providing back-compat glue. The following > combinations are supported: > > old libvirt, old qemu > new libvirt, old qemu > new libvirt, new qemu > > and it is only this combination that might require a libvirt upgrade to > work correctly: > > old libvirt, new qemu > ok, so that would mean we need to implement a QMP command to tell us which gic versions are supported for a given machine. Current possible responses are "2", "3" and "2,3" and we also need to add code to libvirt to try that QMP command, and if it doesn't exist, fall back to not specifying gic-version, using the old-qemu compatible default of providing a gicv2 to guests, and if the QMP command exists, use the newest gic-version. users can then always override this behavior by directly specifying a gic version "host", "2", or "3" in their xml file. any objections? -Christoffer
Re: [Qemu-devel] [libvirt] ARM KVM GICv3 Support
On Fri, Jan 22, 2016 at 02:44:32PM +, Daniel P. Berrange wrote: > On Wed, Jan 06, 2016 at 01:30:16PM +, Peter Maydell wrote: > > On 6 January 2016 at 12:49, Andrea Bolognaniwrote: > > > That's correct, having a QMP command that lists the values gic-version > > > can have on the current host would be just great. > > > > > > If we had that, we could validate the GIC version chosen for a guest, > > > and expose it in the capabilities XML so that higher-level tools can > > > provide a list of choices to the user. > > > > > > Please note that this QMP command would have to work regardless of the > > > machine type selected on QEMU's command line, because libvirt always > > > runs a QEMU binary with '-M none' when probing its capabilities. > > > > On the other hand, if you don't tell us the machine type you care > > about then we can't tell you: > > (a) "this machine type doesn't support setting this property at all" > > (which applies to machines like vexpress-a15 which you can use with > > KVM on 32-bit hosts, and of course also to all the non-KVM models) > > We have just recently merged support for registering properties against > classes instead of object instances. There is also a proposed command > to allow querying the list of properties against a class > > https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04348.html > > So if we now update the machine types to register their properties against > the class instead of object, then we can query what properties are present > against each machine type, while still using '-M none'. > > > (b) "this machine type only supports GIC versions X and Y even if the > > host supports more" (this is currently only hypothetical, though, > > since we only have the property on 'virt'. it would only happen > > if in the future we needed something other than '2' or '3' or > > 'host' I think.) > > Our introspection support in QOM only allows us to say that a property > is a particular type (int / enum / str / whatever). We don't have any > way to expose info about what subset of possible values for a type are > permitted. So I don't see any near term way to inform apps that the > gic property accepts values x, y and but not z. > > IMHO, we shouldn't try to overthink this. Libvirt can query the host > to find out what GIC versions are supported and default to using the > most recent version, on the basis that people are likely to have a > matching QEMU. We can just rely on QEMU to report error if we pass > it a version it doesn't support and not try to pre-emptively check > before launch. The key is just getting the default right IMHO. > This sounds fine to me. However, not being familiar with the internals of libvirt I really can't just what the right implementation approach here is. As I understand we need to either: 1) Add a QMP command that lets you ask for -M virt, if GIC version X is supported 2) Just implement something in libvirt that checks what the kernel supports directly via the well-defined KVM interface and chooses the highest supported version per default. To me it sounds like we should just go ahead with (2) and document somehwere that if you get an error, you need to either manually change the gic version setting in your VM definition file or upgrade QEMU. Can someone with libvirt authority please advice whether (1) or (2) is what we need to do? Thanks, -Christoffer
Re: [Qemu-devel] [libvirt] ARM KVM GICv3 Support
On Tue, Feb 02, 2016 at 01:59:26PM +0100, Andrew Jones wrote: > On Tue, Feb 02, 2016 at 12:10:10PM +, Daniel P. Berrange wrote: > > On Tue, Feb 02, 2016 at 12:49:33PM +0100, Christoffer Dall wrote: > > > On Fri, Jan 22, 2016 at 02:44:32PM +, Daniel P. Berrange wrote: > > > > On Wed, Jan 06, 2016 at 01:30:16PM +, Peter Maydell wrote: > > > > > On 6 January 2016 at 12:49, Andrea Bolognani <abolo...@redhat.com> > > > > > wrote: > > > > > > That's correct, having a QMP command that lists the values > > > > > > gic-version > > > > > > can have on the current host would be just great. > > > > > > > > > > > > If we had that, we could validate the GIC version chosen for a > > > > > > guest, > > > > > > and expose it in the capabilities XML so that higher-level tools can > > > > > > provide a list of choices to the user. > > > > > > > > > > > > Please note that this QMP command would have to work regardless of > > > > > > the > > > > > > machine type selected on QEMU's command line, because libvirt always > > > > > > runs a QEMU binary with '-M none' when probing its capabilities. > > > > > > > > > > On the other hand, if you don't tell us the machine type you care > > > > > about then we can't tell you: > > > > > (a) "this machine type doesn't support setting this property at all" > > > > > (which applies to machines like vexpress-a15 which you can use > > > > > with > > > > > KVM on 32-bit hosts, and of course also to all the non-KVM > > > > > models) > > > > > > > > We have just recently merged support for registering properties against > > > > classes instead of object instances. There is also a proposed command > > > > to allow querying the list of properties against a class > > > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04348.html > > > > > > > > So if we now update the machine types to register their properties > > > > against > > > > the class instead of object, then we can query what properties are > > > > present > > > > against each machine type, while still using '-M none'. > > > > > > > > > (b) "this machine type only supports GIC versions X and Y even if the > > > > > host supports more" (this is currently only hypothetical, > > > > > though, > > > > > since we only have the property on 'virt'. it would only happen > > > > > if in the future we needed something other than '2' or '3' or > > > > > 'host' I think.) > > > > > > > > Our introspection support in QOM only allows us to say that a property > > > > is a particular type (int / enum / str / whatever). We don't have any > > > > way to expose info about what subset of possible values for a type are > > > > permitted. So I don't see any near term way to inform apps that the > > > > gic property accepts values x, y and but not z. > > This actually doesn't matter for the v2 vs. v3 case. The gic-version property > doesn't exist at all for v2-only QEMU. Although maybe the gic-version property > should be reworked. Instead of gic-version=, we could > create > one boolean property per version supported, i.e. gicv2, gicv3, gicv4... > > > > > > > > > IMHO, we shouldn't try to overthink this. Libvirt can query the host > > > > to find out what GIC versions are supported and default to using the > > > > most recent version, on the basis that people are likely to have a > > > > matching QEMU. We can just rely on QEMU to report error if we pass > > > > it a version it doesn't support and not try to pre-emptively check > > > > before launch. The key is just getting the default right IMHO. > > > > > > > This sounds fine to me. > > > > > > However, not being familiar with the internals of libvirt I really can't > > > just what the right implementation approach here is. > > > > > > As I understand we need to either: > > > > > > 1) Add a QMP command that lets you ask for -M virt, if GIC version X > > > is supported > > > > > > 2) Just implement something in libvirt that checks what the kernel > >
[Qemu-devel] [PATCH] hw/arm/virt: Add always-on property to the virt board timer
The virt board has an arch timer, which is always on. Emit the "always-on" property to indicate to Linux that it can switch off the periodic timer and reduces the amount of interrupts injected into a guest. Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org> --- hw/arm/virt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 05f9087..265fe9a 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -291,6 +291,7 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi, int gictype) qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible", "arm,armv7-timer"); } +qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0); qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts", GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, irqflags, GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, irqflags, -- 2.1.2.330.g565301e.dirty
Re: [Qemu-devel] [PATCH] hw/arm/virt: Add always-on property to the virt board timer
On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote: > On Tue, Jan 19, 2016 at 12:49:18PM +0100, Christoffer Dall wrote: > > The virt board has an arch timer, which is always on. Emit the > > "always-on" property to indicate to Linux that it can switch off the > > periodic timer and reduces the amount of interrupts injected into a > > guest. > > > > Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org> > > --- > > hw/arm/virt.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > > index 05f9087..265fe9a 100644 > > --- a/hw/arm/virt.c > > +++ b/hw/arm/virt.c > > @@ -291,6 +291,7 @@ static void fdt_add_timer_nodes(const VirtBoardInfo > > *vbi, int gictype) > > qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible", > > "arm,armv7-timer"); > > } > > +qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0); > > qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts", > > GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, > > irqflags, > > GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, > > irqflags, > > -- > > 2.1.2.330.g565301e.dirty > > > > > > Hi Christoffer, > > We should also patch the ACPI generation at the same time. I think > something like > > - gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE; > + gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE | ACPI_GTDT_ALWAYS_ON; I'm really not familiar enough with ACPI to be comfortable writing code for this or testing this. But if someone can pick this up and add the ACPI bits or can post a follow-up patch, then I'm all for it :) > > should do it. > > Also, having the guest reduce the number of interrupts sounds good. Can > you point me to something to read about how/why a guest may choose to do > that, and what the trade-offs are? > Not really, but you can ask Marc. -Christoffer
Re: [Qemu-devel] [RESEND RFC 0/6] AMD XGBE KVM platform passthrough
On Thu, Nov 19, 2015 at 11:44 PM, Alex Williamsonwrote: > On Thu, 2015-11-19 at 15:22 +, Eric Auger wrote: >> I am resending this RFC from Oct 12, after kernel 4.4-rc1 and >> QEMU 2.5-rc1, hoping things have calmed down a little bit. >> >> This RFC allows to set up AMD XGBE passthrough. This was tested on AMD >> Seattle. >> >> The first upstreamed device supporting KVM platform passthrough was the >> Calxeda Midway XGMAC. Compared to this latter, the XGBE XGMAC exposes a >> much more complex device tree node. Generating the device tree node for >> the guest is the challenging and controversary part of this series. >> >> - First There are 2 device tree node formats: >> one where XGBE and PHY are described in separate nodes and another one >> that combines both description in a single node (only supported by 4.2 >> onwards kernels). Only the combined description is supported for passthrough, >> meaning the host must be >= 4.2 and must feature a device tree with a >> combined >> description. The guest will also be exposed with a combined description, >> meaning only >= 4.2 guest are supported. It is not planned to support >> separate node representation since assignment of the PHY is less >> straigtforward. >> >> - the XGMAC/PHY node depends on 2 clock nodes (DMA and PTP). >> The code checks those clocks are fixed to make sure they cannot be >> switched off at some point after the native driver gets unbound. >> >> - there are many property values to populate on guest side. Most of them >> cannot be hardcoded. That series proposes a way to parse the host device >> tree blob and retrieve host values to feed guest representation. Current >> approach relies on dtc binary availability plus libfdt usage. >> Other alternatives were discussed in: >> http://www.spinics.net/lists/kvm-arm/msg16648.html. >> >> - Currently host booted with ACPI is not supported. > > I won't pretend to know all the politics in the ARM space, but doesn't > this last bullet sort of imply that this is dead-on-arrival code? Maybe > not in the embedded space, but certainly in the server space, I thought > ACPI was declared the winner. Thanks, > Hi Alex, We currently have no solution to accomplish platform device passthrough on ACPI, because we don't know how to extract and encapsulate the necessary information from the host about the passthrough device and its dependency (clocks and phys for example). There are people interested in this work, in particular in the networking space, who couldn't care less about ACPI, and for those, supporting platform device passthrough with DT remains highly relevant. Therefore, I think it's useful to consider these patches for review and for their basic approach. Thanks, -Christoffer
Re: [Qemu-devel] PING: [RFC PATCH 0/4] GICv3 live migration support
On Wed, Oct 07, 2015 at 09:02:47AM +0100, Peter Maydell wrote: > On 7 October 2015 at 08:57, Pavel Fedinwrote: > > Knock-knock! > > > > PM: I remember we had a talk that we should settle down on migration data > > format. Isn't it right > > time? > > I think Christoffer has a patchset which specifies the > userspace API for KVM for this, which is probably the > right place to start. Christoffer, did you manage to send > that out yet? > Very soon now. -Christoffer
Re: [Qemu-devel] add multiple times opening support to a virtserialport
On Thu, Aug 27, 2015 at 10:23:38AM -0400, Christopher Covington wrote: On 07/24/2015 08:00 AM, Matt Ma wrote: Hi all, Linaro has developed the foundation for the new Android Emulator code base based on a fairly recent upstream QEMU code base, when we re-based the code, we updated the device model to be more virtio based (for example the drives are now virtio block devices). The aim of this is to minimise the delta between upstream and the Android specific changes to QEMU. One Android emulator specific feature is the AndroidPipe. AndroidPipe is a communication channel between the guest system and the emulator itself. Guest side device node can be opened by multi processes at the same time with different service name. It has a de-multiplexer on the QEMU side to figure out which service the guest actually wanted, so the first write after opening device node is the service name guest wanted, after QEMU backend receive this service name, create a corresponding communication channel, initialize related component, such as file descriptor which connect to the host socket serve. So each opening in guest will create a separated communication channel. We can create a separate device for each service type, however some services, such as the OpenGL emulation, need to have multiple open channels at a time. This is currently not possible using the virtserialport which can only be opened once. Current virtserialport can not be opened by multiple processes at the same time. I know virtserialport has provided buffers beforehand to cache data from host to guest, so even there is no guest read, data can still be transported from host to guest kernel, when there is guest read request, just copy cached data to user space. We are not sure clearly whether virtio can support multi-open-per-device semantics or not, followings are just our initial ideas about adding multi-open-per-device feature to a port: * when there is a open request on a port, kernel will allocate a portclient with new id and __wait_queue_head to track this request * save this portclient in file-private_data * guest kernel pass this portclient info to QEMU and notify that the port has been opened * QEMU backend will create a clientinfo struct to track this communication channel, initialize related component * we may change the kernel side strategy of allocating receiving buffers in advance to a new strategy, that is when there is a read request: - allocate a port_buffer, put user space buffer address to port_buffer.buf, share memory to avoid memcpy - put both portclient id(or portclient addrss) and port_buffer.buf to virtqueue, that is the length of buffers chain is 2 - kick to notify QEMU backend to consume read buffer - QEMU backend read portclient info firstly to find the correct clientinfo, then read host data directly into virtqueue buffer to avoid memcpy - guest kernel will wait(similarly in block mode, because the user space address has been put into virtqueue) until QEMU backend has consumed buffer(all data/part data/nothing have been sent to host side) - if nothing has been read from host and file descriptor is in block mode, read request will wait through __wait_queue_head until host side is readable * above read logic may change the current behavior of transferring data to guest kernel even without guest user read * when there is a write request: - allocate a port_buffer, put user space buffer address to port_buffer.buf, share memory to avoid memcpy - put both portclient id(or portclient addrss) and port_buffer.buf to virtqueue, the length of buffers chain is 2 - kick to notify QEMU backend to consume write buffer - QEMU backend read portclient info firstly to find the correct clientinfo, then write the virtqueue buffer content to host side as current logic - guest kernel will wait(similarly in block mode, because the user space address has been put into virtqueue) until QEMU backend has consumed buffer(all data/part data/nothing have been receive from host side) - if nothing has been sent out and file descriptor is in block mode, write request will wait through __wait_queue_head until host side is writable We obviously don't want to regress existing virtio behaviour and performance and welcome the communities expertise to point out anything we may have missed out before we get to far down implementing our initial proof-of-concept. Hi Chris, Would virtio-vsock be interesting for your purposes? http://events.linuxfoundation.org/sites/events/files/slides/stefanha-kvm-forum-2015.pdf (Video doesn't seem to be up yet, but should probably be available eventually at the following link) https://www.youtube.com/playlist?list=PLW3ep1uCIRfyLNSu708gWG7uvqlolk0ep Thanks for looking at this lengthy mail. Yes, we are
Re: [Qemu-devel] Created virtio-vsock wiki page
On Tue, Aug 25, 2015 at 04:43:13PM +0100, Stefan Hajnoczi wrote: I have created a wiki page for virtio-vsock. It links to my git repos and the draft virtio specification: http://qemu-project.org/Features/VirtioVsock I'll expand and update it over the coming days and weeks. Please let me know if you'd like to see specific information on there (e.g. step-by-step build QEMU invocation guide). Do you have a test suite or a simple set of guest/host test programs? -Christoffer
Re: [Qemu-devel] [PATCH v8 3/5] Introduce irqchip type specification for KVM
On Wed, Aug 12, 2015 at 4:14 PM, Eric Auger eric.au...@linaro.org wrote: Hi, On 08/12/2015 03:23 PM, Christoffer Dall wrote: On Wed, Aug 12, 2015 at 2:59 PM, Peter Maydell peter.mayd...@linaro.org wrote: On 12 August 2015 at 13:27, Pavel Fedin p.fe...@samsung.com wrote: Hello! I still think this is the wrong approach -- see my remarks in the previous round of patch review. You know... I thought a little bit... So far, test = true in KVM_CREATE_DEVICE means that we just want to know whether this type is supported. No actual actions is done by the kernel. Is it correct? yes If yes, we can just leave this test as it is, because if it says that GICv2 is supported by KVM_CREATE_DEVICE, then: 1. We use new API. No KVM_IRQCHIP_CREATE. 2. GICv3 may be supported. Therefore, if we do this check, and it succeeds, then we just proceed, and later actually try to create GICv3. If it fails for some reason, we will see error message anyway. So would it be OK just not to touch kvm_arch_irqchip_create() at all? No, because then if the kernel only supports GICv3 the code in kvm_arch_irqchip_create() (as it is currently written) will erroneously fall back to using the old API. Christoffer: the question was, why does kvm_arch_irqchip_create() not only check the KVM_CAP_DEVICE_CTRL but also try to see if it can KVM_CREATE_DEVICE the GICv2 in order to avoid falling back to the old pre-KVM_CREATE_DEVICE API ? Are there kernels which have the capability bit set but which can't actually use KVM_CREATE_DEVICE to create the irqchip? My thinking probably was that technically the KVM_CAP_DEVICE_CTRL is an orthogonal concept from how to create the vgic, and you could advertise this capability without also supporting the GICv2 device type. However, I don't believe such kernels exist the capability was advertised for arm/arm64 with GICv2 7330672 KVM: arm-vgic: Support KVM_CREATE_DEVICE for VGIC (1 year, 8 months ago) Christoffer Dall so effectively I don't think we have any arm kernel advertising the CAP without GICv3. and they cannot in the future as that would be because we would remove an actively supported API. My preference here would be for kvm_arch_irqchip_create() to just use 'is the KVM_CAP_DEVICE_CTRL capability set' for its can we use the new API test; that will then work whether we have a GICv2 or GICv3 in the host. (The actual GIC device creation later on might fail, of course, but that can be handled at that point. The only thing we need to do as early as kvm_arch_irqchip_create is determine whether we must use the old API.) another way was proposed in the past was consisting in calling first ret = kvm_create_device(s, KVM_DEV_TYPE_ARM_VGIC_V3, true); if this succeeds this means we have the new API (the only one used vy v3) and hence we have it as well for VGIC_V2, we can return ... if this fails we just can say VGICv3 KVM device hasn't registered, try KVM_DEV_TYPE_ARM_VGIC_V2 if we have it return else fall back to older API I think it worked as well? Besides I think Peter's suggestion is simpler. For info we also use KVM VFIO device now. Note that there's a difference between I called the create-device ioctl, and the creation of the device failed vs. I called the ioctl and I got an error because the ioctl is not supported, only in the latter case should you fall back to the older API. Not sure if the end result is the same with the suggested approach, based on the right error values etc. -Christoffer
Re: [Qemu-devel] [PATCH v8 3/5] Introduce irqchip type specification for KVM
On Wed, Aug 12, 2015 at 1:44 PM, Pavel Fedin p.fe...@samsung.com wrote: Hello! I still think this is the wrong approach -- see my remarks in the previous round of patch review. Christoffer did not reply anything to your question back then. So - what to do? Probe for all possible GICs? Remove the probe at all? I may have missed something here, what was the question? -Christoffer
Re: [Qemu-devel] [PATCH v8 3/5] Introduce irqchip type specification for KVM
On Wed, Aug 12, 2015 at 2:59 PM, Peter Maydell peter.mayd...@linaro.org wrote: On 12 August 2015 at 13:27, Pavel Fedin p.fe...@samsung.com wrote: Hello! I still think this is the wrong approach -- see my remarks in the previous round of patch review. You know... I thought a little bit... So far, test = true in KVM_CREATE_DEVICE means that we just want to know whether this type is supported. No actual actions is done by the kernel. Is it correct? If yes, we can just leave this test as it is, because if it says that GICv2 is supported by KVM_CREATE_DEVICE, then: 1. We use new API. No KVM_IRQCHIP_CREATE. 2. GICv3 may be supported. Therefore, if we do this check, and it succeeds, then we just proceed, and later actually try to create GICv3. If it fails for some reason, we will see error message anyway. So would it be OK just not to touch kvm_arch_irqchip_create() at all? No, because then if the kernel only supports GICv3 the code in kvm_arch_irqchip_create() (as it is currently written) will erroneously fall back to using the old API. Christoffer: the question was, why does kvm_arch_irqchip_create() not only check the KVM_CAP_DEVICE_CTRL but also try to see if it can KVM_CREATE_DEVICE the GICv2 in order to avoid falling back to the old pre-KVM_CREATE_DEVICE API ? Are there kernels which have the capability bit set but which can't actually use KVM_CREATE_DEVICE to create the irqchip? My thinking probably was that technically the KVM_CAP_DEVICE_CTRL is an orthogonal concept from how to create the vgic, and you could advertise this capability without also supporting the GICv2 device type. However, I don't believe such kernels exist and they cannot in the future as that would be because we would remove an actively supported API. My preference here would be for kvm_arch_irqchip_create() to just use 'is the KVM_CAP_DEVICE_CTRL capability set' for its can we use the new API test; that will then work whether we have a GICv2 or GICv3 in the host. (The actual GIC device creation later on might fail, of course, but that can be handled at that point. The only thing we need to do as early as kvm_arch_irqchip_create is determine whether we must use the old API.) I'm fine with this. -Christoffer
Re: [Qemu-devel] [PATCH v2] target-arm: kvm: Differentiate registers based on write-back levels
On Fri, Jul 17, 2015 at 03:29:56PM +0100, Peter Maydell wrote: On 16 July 2015 at 12:34, Christoffer Dall christoffer.d...@linaro.org wrote: Some registers like the CNTVCT register should only be written to the kernel as part of machine initialization or on vmload operations, but never during runtime, as this can potentially make time go backwards or create inconsistent time observations between VCPUs. Introduce a list of registers that should not be written back at runtime and check this list on syncing the register state to the KVM state. Thanks. I think this should go into QEMU 2.4, given that it fixes a bug with time misbehaving in guests. Are you happy that it's received enough testing? (I have given 32-bit KVM a spin but have no convenient 64-bit box to test with, and besides, I didn't notice the bug in the first place :-)) I tested this on both Juno and Mustang, with a simple loop kernel booting and doing hackbench test, but if we want to be on the extra careful side, perhaps Alex can run it through his migration test setup? I don't think that's necessary though. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- Changes since RFC: - Move cpreg_level to kvm_arm_cpreg_level and into kvm32.c and kvm64.c - Changed struct name and declare as static const I have a couple of minor comments on the comments below, and you forgot to update the stub version of write_list_to_kvmstate(). I can just fix these up as I put it into target-arm.next, though. Fixed up version at: https://git.linaro.org/people/peter.maydell/qemu-arm.git target-arm.next If interested parties could test that by end-of-Monday that would be nice (since rc2 is scheduled for Tuesday). dtc | 2 +- target-arm/kvm.c | 6 +- target-arm/kvm32.c | 30 +- target-arm/kvm64.c | 30 +- target-arm/kvm_arm.h | 12 +++- target-arm/machine.c | 2 +- 6 files changed, 76 insertions(+), 6 deletions(-) diff --git a/dtc b/dtc index 65cc4d2..bc895d6 16 --- a/dtc +++ b/dtc @@ -1 +1 @@ -Subproject commit 65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf +Subproject commit bc895d6d09695d05ceb8b52486ffe861d6cfbdde Stray submodule change :-) Damn, keeps happening to me. ok, I'll stop using git commit -a for qemu changes. diff --git a/target-arm/kvm32.c b/target-arm/kvm32.c index d7e7d68..6769815 100644 --- a/target-arm/kvm32.c +++ b/target-arm/kvm32.c @@ -153,6 +153,34 @@ bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx) } } +typedef struct CPRegStateLevel { +uint64_t regidx; +int level; +} CPRegStateLevel; + +/* All coprocessor registers not listed in the following table are assumed to + * be of the level KVM_PUT_RUNTIME_STATE, a register should be written less . If a register. + * often, you must add it to this table with a state of either + * KVM_PUT_RESET_STATE or KVM_PUT_FULL_STATE. + */ +static const CPRegStateLevel non_runtime_cpregs[] = { +{ KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE }, +}; diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c index ac34f51..d59f41c 100644 --- a/target-arm/kvm64.c +++ b/target-arm/kvm64.c @@ -139,6 +139,34 @@ bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx) } } +typedef struct CPRegStateLevel { +uint64_t regidx; +int level; +} CPRegStateLevel; + +/* All system not listed in the following table are assumed to be of the level system registers + * KVM_PUT_RUNTIME_STATE, a register should be written less often, you must . If a register + * add it to this table with a state of either KVM_PUT_RESET_STATE or + * KVM_PUT_FULL_STATE. + */ +static const CPRegStateLevel non_runtime_cpregs[] = { +{ KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE }, +}; --- a/target-arm/kvm_arm.h +++ b/target-arm/kvm_arm.h @@ -83,7 +93,7 @@ bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx); * Note that we do not stop early on failure -- we will attempt * writing all registers in the list. */ -bool write_list_to_kvmstate(ARMCPU *cpu); +bool write_list_to_kvmstate(ARMCPU *cpu, int level); You forgot to update the stub function in target-arm/kvm-stub.c, so this breaks compilation on non-ARM hosts. whoops, who cares about non-ARM hosts anyway. Thanks for fixing these up, the fixed up version looks good. -Christoffer
[Qemu-devel] [PATCH v2] target-arm: kvm: Differentiate registers based on write-back levels
Some registers like the CNTVCT register should only be written to the kernel as part of machine initialization or on vmload operations, but never during runtime, as this can potentially make time go backwards or create inconsistent time observations between VCPUs. Introduce a list of registers that should not be written back at runtime and check this list on syncing the register state to the KVM state. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- Changes since RFC: - Move cpreg_level to kvm_arm_cpreg_level and into kvm32.c and kvm64.c - Changed struct name and declare as static const dtc | 2 +- target-arm/kvm.c | 6 +- target-arm/kvm32.c | 30 +- target-arm/kvm64.c | 30 +- target-arm/kvm_arm.h | 12 +++- target-arm/machine.c | 2 +- 6 files changed, 76 insertions(+), 6 deletions(-) diff --git a/dtc b/dtc index 65cc4d2..bc895d6 16 --- a/dtc +++ b/dtc @@ -1 +1 @@ -Subproject commit 65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf +Subproject commit bc895d6d09695d05ceb8b52486ffe861d6cfbdde diff --git a/target-arm/kvm.c b/target-arm/kvm.c index 548bfd7..b278542 100644 --- a/target-arm/kvm.c +++ b/target-arm/kvm.c @@ -409,7 +409,7 @@ bool write_kvmstate_to_list(ARMCPU *cpu) return ok; } -bool write_list_to_kvmstate(ARMCPU *cpu) +bool write_list_to_kvmstate(ARMCPU *cpu, int level) { CPUState *cs = CPU(cpu); int i; @@ -421,6 +421,10 @@ bool write_list_to_kvmstate(ARMCPU *cpu) uint32_t v32; int ret; +if (kvm_arm_cpreg_level(regidx) level) { +continue; +} + r.id = regidx; switch (regidx KVM_REG_SIZE_MASK) { case KVM_REG_SIZE_U32: diff --git a/target-arm/kvm32.c b/target-arm/kvm32.c index d7e7d68..6769815 100644 --- a/target-arm/kvm32.c +++ b/target-arm/kvm32.c @@ -153,6 +153,34 @@ bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx) } } +typedef struct CPRegStateLevel { +uint64_t regidx; +int level; +} CPRegStateLevel; + +/* All coprocessor registers not listed in the following table are assumed to + * be of the level KVM_PUT_RUNTIME_STATE, a register should be written less + * often, you must add it to this table with a state of either + * KVM_PUT_RESET_STATE or KVM_PUT_FULL_STATE. + */ +static const CPRegStateLevel non_runtime_cpregs[] = { +{ KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE }, +}; + +int kvm_arm_cpreg_level(uint64_t regidx) +{ +int i; + +for (i = 0; i ARRAY_SIZE(non_runtime_cpregs); i++) { +const CPRegStateLevel *l = non_runtime_cpregs[i]; +if (l-regidx == regidx) { +return l-level; +} +} + +return KVM_PUT_RUNTIME_STATE; +} + #define ARM_MPIDR_HWID_BITMASK 0xFF #define ARM_CPU_ID_MPIDR 0, 0, 0, 5 @@ -367,7 +395,7 @@ int kvm_arch_put_registers(CPUState *cs, int level) * managed to update the CPUARMState with, and only allowing those * to be written back up into the kernel). */ -if (!write_list_to_kvmstate(cpu)) { +if (!write_list_to_kvmstate(cpu, level)) { return EINVAL; } diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c index ac34f51..d59f41c 100644 --- a/target-arm/kvm64.c +++ b/target-arm/kvm64.c @@ -139,6 +139,34 @@ bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx) } } +typedef struct CPRegStateLevel { +uint64_t regidx; +int level; +} CPRegStateLevel; + +/* All system not listed in the following table are assumed to be of the level + * KVM_PUT_RUNTIME_STATE, a register should be written less often, you must + * add it to this table with a state of either KVM_PUT_RESET_STATE or + * KVM_PUT_FULL_STATE. + */ +static const CPRegStateLevel non_runtime_cpregs[] = { +{ KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE }, +}; + +int kvm_arm_cpreg_level(uint64_t regidx) +{ +int i; + +for (i = 0; i ARRAY_SIZE(non_runtime_cpregs); i++) { +const CPRegStateLevel *l = non_runtime_cpregs[i]; +if (l-regidx == regidx) { +return l-level; +} +} + +return KVM_PUT_RUNTIME_STATE; +} + #define AARCH64_CORE_REG(x) (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \ KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x)) @@ -280,7 +308,7 @@ int kvm_arch_put_registers(CPUState *cs, int level) return ret; } -if (!write_list_to_kvmstate(cpu)) { +if (!write_list_to_kvmstate(cpu, level)) { return EINVAL; } diff --git a/target-arm/kvm_arm.h b/target-arm/kvm_arm.h index 5abd591..7912d74 100644 --- a/target-arm/kvm_arm.h +++ b/target-arm/kvm_arm.h @@ -69,8 +69,18 @@ int kvm_arm_init_cpreg_list(ARMCPU *cpu); bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx); /** + * kvm_arm_cpreg_level + * regidx: KVM register index + * + * Return the level of this coprocessor/system register. Return value is + * either KVM_PUT_RUNTIME_STATE, KVM_PUT_RESET_STATE, or KVM_PUT_FULL_STATE
Re: [Qemu-devel] [RFC PATCH] target-arm: kvm: Differentiate registers based on write-back levels
On Fri, Jul 10, 2015 at 12:22:31PM +0100, Peter Maydell wrote: On 10 July 2015 at 12:00, Christoffer Dall christoffer.d...@linaro.org wrote: Some registers like the CNTVCT register should only be written to the kernel as part of machine initialization or on vmload operations, but never during runtime, as this can potentially make time go backwards or create inconsistent time observations between VCPUs. Introduce a list of registers that should not be written back at runtime and check this list on syncing the register state to the KVM state. Thanks for picking this one up... Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- target-arm/kvm.c | 34 +- target-arm/kvm32.c | 2 +- target-arm/kvm64.c | 2 +- target-arm/kvm_arm.h | 3 ++- target-arm/machine.c | 2 +- 5 files changed, 38 insertions(+), 5 deletions(-) diff --git a/target-arm/kvm.c b/target-arm/kvm.c index 548bfd7..2e92699 100644 --- a/target-arm/kvm.c +++ b/target-arm/kvm.c @@ -409,7 +409,35 @@ bool write_kvmstate_to_list(ARMCPU *cpu) return ok; } -bool write_list_to_kvmstate(ARMCPU *cpu) +typedef struct cpreg_state_level { +uint64_t kvm_idx; +int level; +} cpreg_state_level; (QEMU's coding style prefers CPRegStateLevel for struct types.) ok + +/* All system registers not listed in the following table are assumed to be + * of the level KVM_PUT_RUNTIME_STATE, a register should be written less + * often, you must add it to this table with a state of either + * KVM_PUT_RESET_STATE or KVM_PUT_FULL_STATE. + */ +cpreg_state_level non_runtime_cpregs[] = { +{ KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE }, This should be KVM_PUT_RESET_STATE, right? should it? If you reset a real machine, you will not necessarily see a counter value of zero will you? I guess this depends on whether QEMU reset means power the system completely off and then on again, or some softer reset? +}; The other option here would be to keep the level information in the cpreg structs (which is where we put everything else we know about cpregs); we'd probably need to then initialise some other data structure if we wanted to avoid the hash table lookup for every register in write_list_to_kvmstate. I guess if we expect this list to remain a fairly small set of exceptional cases then this is OK (and vaguely comparable to the existing kvm_arm_reg_syncs_via-cpreg_list handling). I thought about this too, and sent this as an RFC for exactly this reason. I did it this way initially for two reasons: (1) I don't understand the hash-table register initialization flow for aarch64 and (2) I could really only identify this single register for now that needs to be marked as a non-runtime register, and then this is less invasive. Don't we need separate 32-bit and 64-bit versions of this list? Do we? I thought this file would compile separately for the 32-bit and 64-bit versions and the register index define is the same name for both architectures, did I get this wrong? Of course, for other registers with unique-to-32-bit-or-64-bit reg index defines, yes, we would need a separate table. Should they then be defined in the kvm32.c and kvm64.c and passed in as a pointer to write_kvmstate_to_list() ? Thanks, -Christoffer
[Qemu-devel] [RFC PATCH] target-arm: kvm: Differentiate registers based on write-back levels
Some registers like the CNTVCT register should only be written to the kernel as part of machine initialization or on vmload operations, but never during runtime, as this can potentially make time go backwards or create inconsistent time observations between VCPUs. Introduce a list of registers that should not be written back at runtime and check this list on syncing the register state to the KVM state. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- target-arm/kvm.c | 34 +- target-arm/kvm32.c | 2 +- target-arm/kvm64.c | 2 +- target-arm/kvm_arm.h | 3 ++- target-arm/machine.c | 2 +- 5 files changed, 38 insertions(+), 5 deletions(-) diff --git a/target-arm/kvm.c b/target-arm/kvm.c index 548bfd7..2e92699 100644 --- a/target-arm/kvm.c +++ b/target-arm/kvm.c @@ -409,7 +409,35 @@ bool write_kvmstate_to_list(ARMCPU *cpu) return ok; } -bool write_list_to_kvmstate(ARMCPU *cpu) +typedef struct cpreg_state_level { +uint64_t kvm_idx; +int level; +} cpreg_state_level; + +/* All system registers not listed in the following table are assumed to be + * of the level KVM_PUT_RUNTIME_STATE, a register should be written less + * often, you must add it to this table with a state of either + * KVM_PUT_RESET_STATE or KVM_PUT_FULL_STATE. + */ +cpreg_state_level non_runtime_cpregs[] = { +{ KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE }, +}; + +static int cpreg_level(uint64_t kvm_idx) +{ +int i; + +for (i = 0; i ARRAY_SIZE(non_runtime_cpregs); i++) { +cpreg_state_level *l = non_runtime_cpregs[i]; +if (l-kvm_idx == kvm_idx) { +return l-level; +} +} + +return KVM_PUT_RUNTIME_STATE; +} + +bool write_list_to_kvmstate(ARMCPU *cpu, int level) { CPUState *cs = CPU(cpu); int i; @@ -421,6 +449,10 @@ bool write_list_to_kvmstate(ARMCPU *cpu) uint32_t v32; int ret; +if (cpreg_level(regidx) level) { +continue; +} + r.id = regidx; switch (regidx KVM_REG_SIZE_MASK) { case KVM_REG_SIZE_U32: diff --git a/target-arm/kvm32.c b/target-arm/kvm32.c index d7e7d68..9fbd5fd 100644 --- a/target-arm/kvm32.c +++ b/target-arm/kvm32.c @@ -367,7 +367,7 @@ int kvm_arch_put_registers(CPUState *cs, int level) * managed to update the CPUARMState with, and only allowing those * to be written back up into the kernel). */ -if (!write_list_to_kvmstate(cpu)) { +if (!write_list_to_kvmstate(cpu, level)) { return EINVAL; } diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c index ac34f51..2911679 100644 --- a/target-arm/kvm64.c +++ b/target-arm/kvm64.c @@ -280,7 +280,7 @@ int kvm_arch_put_registers(CPUState *cs, int level) return ret; } -if (!write_list_to_kvmstate(cpu)) { +if (!write_list_to_kvmstate(cpu, level)) { return EINVAL; } diff --git a/target-arm/kvm_arm.h b/target-arm/kvm_arm.h index 5abd591..ce03e97 100644 --- a/target-arm/kvm_arm.h +++ b/target-arm/kvm_arm.h @@ -71,6 +71,7 @@ bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx); /** * write_list_to_kvmstate: * @cpu: ARMCPU + * @level: the state level to sync * * For each register listed in the ARMCPU cpreg_indexes list, write * its value from the cpreg_values list into the kernel (via ioctl). @@ -83,7 +84,7 @@ bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx); * Note that we do not stop early on failure -- we will attempt * writing all registers in the list. */ -bool write_list_to_kvmstate(ARMCPU *cpu); +bool write_list_to_kvmstate(ARMCPU *cpu, int level); /** * write_kvmstate_to_list: diff --git a/target-arm/machine.c b/target-arm/machine.c index 9eb51df..32adfe7 100644 --- a/target-arm/machine.c +++ b/target-arm/machine.c @@ -251,7 +251,7 @@ static int cpu_post_load(void *opaque, int version_id) } if (kvm_enabled()) { -if (!write_list_to_kvmstate(cpu)) { +if (!write_list_to_kvmstate(cpu, KVM_PUT_FULL_STATE)) { return -1; } /* Note that it's OK for the TCG side not to know about -- 2.1.2.330.g565301e.dirty
Re: [Qemu-devel] [PATCH v4 3/4] target-arm: Extend the gic node properties
On Thu, Jul 9, 2015 at 6:28 PM, Peter Maydell peter.mayd...@linaro.org wrote: On 9 July 2015 at 17:03, Christoffer Dall christoffer.d...@linaro.org wrote: [whoops, re-adding qemu-devel] On Thu, Jul 9, 2015 at 5:07 PM, Alexander Graf ag...@suse.de wrote: On 07/09/15 16:44, Christoffer Dall wrote: I'll be honest and say that I don't fully understand the details of the interrupt-map specification, and I cannot seem to find it online either (the working link I had before gives me a 404 these days). Try http://www.firmware.org/1275/practice/imap/imap0_9d.pdf that's the one, thanks for the working url. It explains it conceptually, yes, but it's hardly a spec. At least I can't understand from that page why the entries in the map have to be changed based on size-cells and address-cells in the interrupt controller... The interrupt map entries are: * child unit interrupt specifier [4 cells for PCI, determined by #address-cells + #interrupt-cells for the PCI controller node] * interrupt parent phandle * parent unit interrupt specifier [number of cells determined by #address-cells + #interrupt-cells for the interrupt controller] So the extra two zeroes aren't part of the phandle, they're the result of the parent unit-interrupt-specifier now being 5 cells because of #address-cells being defined in the GIC node. Thanks for clearing this up! I owe you a cup of your favorite poison then:) -Christoffer
Re: [Qemu-devel] [PATCH v4 3/4] target-arm: Extend the gic node properties
[whoops, re-adding qemu-devel] On Thu, Jul 9, 2015 at 5:07 PM, Alexander Graf ag...@suse.de wrote: On 07/09/15 16:44, Christoffer Dall wrote: On Thu, Jul 09, 2015 at 04:02:14PM +0200, Claudio Fontana wrote: Hello Christoffer, just one question: In preparation for adding the GICv2m which requires address specifiers and is a subnode of the gic, we extend the gic DT definition to specify the #address-cells and #size-cells properties and add an empty ranges property properties of the DT node, since this is required to add the v2m node as a child of the gic node. Note that we must also expand the irq-map to reference the gic with the right address-cells as a consequence of this change. Reviewed-by: Eric Auger address@hidden Suggested-by: Shanker Donthineni address@hidden Signed-off-by: Christoffer Dall address@hidden --- Changes since v3: - Rewrote patch and changed authorship and tags accordingly - Fixed spelling in commit message Changes since v2: - New separate patch factoring out changes to existing code for eased bisectability in case we broke something - The above fixes the issue with non-MSI compatible guests. hw/arm/virt.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index e5235ef..387dac8 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -317,6 +317,9 @@ static void fdt_add_gic_node(VirtBoardInfo *vbi) 2, vbi-memmap[VIRT_GIC_DIST].size, 2, vbi-memmap[VIRT_GIC_CPU].base, 2, vbi-memmap[VIRT_GIC_CPU].size); +qemu_fdt_setprop_cell(vbi-fdt, /intc, #address-cells, 0x2); +qemu_fdt_setprop_cell(vbi-fdt, /intc, #size-cells, 0x2); +qemu_fdt_setprop(vbi-fdt, /intc, ranges, NULL, 0); qemu_fdt_setprop_cell(vbi-fdt, /intc, phandle, vbi-gic_phandle); } @@ -585,7 +588,7 @@ static void create_pcie_irq_map(const VirtBoardInfo *vbi, uint32_t gic_phandle, int first_irq, const char *nodename) { int devfn, pin; -uint32_t full_irq_map[4 * 4 * 8] = { 0 }; +uint32_t full_irq_map[4 * 4 * 10] = { 0 }; uint32_t *irq_map = full_irq_map; for (devfn = 0; devfn = 0x18; devfn += 0x8) { @@ -598,13 +601,13 @@ static void create_pcie_irq_map(const VirtBoardInfo *vbi, uint32_t gic_phandle, uint32_t map[] = { devfn 8, 0, 0, /* devfn */ pin + 1,/* PCI pin */ -gic_phandle, irq_type, irq_nr, irq_level }; /* GIC irq */ +gic_phandle, 0, 0, irq_type, irq_nr, irq_level }; /* GIC irq */ can you help me understand how to decode this? We are adding two 32bit 0 values in the map. So now gic_phandle field occupies a total of 3 x 32bit values. Is this really what was intended? If the #address-cell and #size-cells of the intc are 2 and 2 respectively, shouldn't this be gic_phandle, 0, irq_type, irq_nr, irq_level ? Too much time since I read this code I guess.. I'll be honest and say that I don't fully understand the details of the interrupt-map specification, and I cannot seem to find it online either (the working link I had before gives me a 404 these days). So I followed the suggestion from Shanker and it worked with adding two zeroes, not with only adding one, so I figured it was the correct thing to do. Perhaps Alex Graf who wrote the original code to generate the interrupt map can help? The page works quite well for me: http://devicetree.org/Device_Tree_Usage#Advanced_Interrupt_Mapping It explains it conceptually, yes, but it's hardly a spec. At least I can't understand from that page why the entries in the map have to be changed based on size-cells and address-cells in the interrupt controller... I didn't think the bits we needed to add were related to the phandle; I always thought a phandle was just an internal to the DT 32-bit number to refer to a different node. When it comes to dt bits where I'm uncertain, I usually end up asking Rob though :). Rob, can you help us out here? Beers are on me at SFO15 ;) -Christoffer
Re: [Qemu-devel] [PATCH v4 4/9] Add virt-v3 machine that uses GIC-500
On Fri, Jul 3, 2015 at 8:52 AM, Pavel Fedin p.fe...@samsung.com wrote: Hi! I think you should leave out the software emulation part for this series and let's concentrate on getting this in first. Ok. Also, you should make sure you have an agreement with Shlomo about taking over his patches before doing so (maybe you are already?). Shlomo has posted his work here, doesn't this automatically mean that he is sharing it with the community? Additionally, i do not drop his authorship. Well, it can become a little messy for everyone if Shlomo is working on refactoring and respinning his patches, and meanwhile new versions of his patches go out embedded as part of another patch set. It's not exactly harmful, but usually doesn't happen without prior agreement amongst the involved parties, I would say. -Christoffer
Re: [Qemu-devel] [PATCH v4 4/9] Add virt-v3 machine that uses GIC-500
On Fri, Jul 3, 2015 at 10:47 AM, Pavel Fedin p.fe...@samsung.com wrote: Hi! Well, it can become a little messy for everyone if Shlomo is working on refactoring and respinning his patches, and meanwhile new versions of his patches go out embedded as part of another patch set Exactly for this reason i refactor his code as little as possible, so that he can merge things easily, if he is around. But looks like he become upset and gone, we haven't seen him around for several weeks. I don't think you should put words in other people's mouth. He may be on vacation or awaiting approval for submitting this stuff or something like that. That said, Shlomo, what is your status? Meanwhile i'll try to incorporate SW emulation too, because i also may need it. Just needs more time, and yes, ITS is a separate thing (by the way i even have the code, just without LPIs). So yes, i'll split it up a bit more. I *really* think we should try to *agree* on who does what here, so that we don't have more copies of the same work etc. on the list. -Christoffer
Re: [Qemu-devel] [PATCH v4 4/9] Add virt-v3 machine that uses GIC-500
On Thu, Jul 2, 2015 at 4:47 PM, Pavel Fedin p.fe...@samsung.com wrote: Hello! I already explained this earlier: http://lists.nongnu.org/archive/html/qemu-devel/2015-05/msg04842.html, and i tried to explain this in commit message. Current qemu architecture does not allow doing this in a clean way. I can simply change mc-max_cpus for 'virt' machine to 64 (always), and then check user-supplied value against GIC limitation by myself. And produce error. This will be code duplication. Do you think it is better? Anybody else (Peter, Cristoffer ?), please vote. We definitely need a single virt machine, not multiple machines. I don't know enough about the QEMU internals as to how to accomplish this in detail. -Christoffer
Re: [Qemu-devel] [PATCH v4 4/9] Add virt-v3 machine that uses GIC-500
On Thu, Jul 2, 2015 at 5:30 PM, Pavel Fedin p.fe...@samsung.com wrote: We definitely need a single virt machine, not multiple machines. I don't know enough about the QEMU internals as to how to accomplish this in detail. Two beat one... Okay, i surrender. :) I will respin this, hopefully tomorrow. I will change to machine option, and i will likely leave out software GIC emulation. Everyone okay with this? Peter? I think you should leave out the software emulation part for this series and let's concentrate on getting this in first. Also, you should make sure you have an agreement with Shlomo about taking over his patches before doing so (maybe you are already?). Thanks, -Christoffer
Re: [Qemu-devel] [PATCH RFC 0/4] vGICv3 support
On Fri, May 22, 2015 at 01:58:40PM +0300, Pavel Fedin wrote: This is my alternative to Ashok's vGICv3 patch (https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg03021.html), which i am currently working on. It addresses vGIC capability verification issue (kvm_irqchip_create() / kvm_arch_irqchip_create()), as well as offers better code structure (v3 code separated from v2). This patchset applies on top of this: https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg00943.html. Note that GIC type selection still relies on machine name (virt-v3 vs virt), and not on machine option. Since libvirt has recently introduced support for extra options, i have absolutely nothing against Ashok's approach. I just did not change this yet because it would affect my testing environment. The aim of this RFC is to focus on vGICv3 implementation and related changes. And yes, i agree that v2 and v3 now have some copypasted code, and this is TBD. This cover letter is not really helpful as it only describes the history and circumstances of how this patch came to be. It would be helpful if the beginning of this cover letter focuses on what the patch series does and which design decisions have been taken to shape the patches the way they are. I don't understand the whole background thing about libvirt and I don't at all understand the thing about copy-pasted code...?? A generally good approach to writing a cover letter is to follow this skeleton: --- This series implements...what We accomplish this by...design decisions [Optional] Patches 1-3 do something preliminary, patches 4-8 do something else... [Optional] Note something special, possibly historical Changes since vX: --- I think it would be good if you could re-spin this series based on Eric's comments on the code, my comments on the patch style, and Peter's advise on using machine properties for GICv3. Do you have cycles to continue working on this? -Christoffer Pavel Fedin (4): Add virt-v3 machine that uses GIC-500 Set kernel_irqchip_type for other ARM boards which use GIC First bits of vGICv3 support: Initial implementation of vGICv3. hw/arm/exynos4_boards.c | 1 + hw/arm/realview.c | 1 + hw/arm/vexpress.c | 1 + hw/arm/virt.c | 148 - hw/intc/Makefile.objs | 1 + hw/intc/arm_gicv3_kvm.c | 283 include/hw/boards.h | 1 + include/sysemu/kvm.h| 3 +- kvm-all.c | 2 +- stubs/kvm.c | 2 +- target-arm/kvm.c| 8 +- 11 files changed, 419 insertions(+), 32 deletions(-) create mode 100644 hw/intc/arm_gicv3_kvm.c -- 1.9.5.msysgit.0
Re: [Qemu-devel] [PATCH RFC 0/4] vGICv3 support
On Wed, Jul 01, 2015 at 02:14:55PM +0300, Pavel Fedin wrote: Hello! I think it would be good if you could re-spin this series based on Eric's comments on the code, my comments on the patch style, and Peter's advise on using machine properties for GICv3. There was a discussion about this, and i tried to add a machine property instead. This gave bad results: https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg04842.html Do you have cycles to continue working on this? Yes, i do. And i wish to work on this, since upstreaming this is a goal of my project. I know that descriptions were bad, and actually they were parts of ongoing discussions. Actually i am waiting for integration of Shlomo Pongratz' series: http://lists.nongnu.org/archive/html/qemu-devel/2015-06/msg01512.html. Posted as standalone, my series collide with Shlomo's series (it actually relies on some parts from there), so i prefer to respin on top of something clean. I thought the general sense here was that since emulating the full device is much more complicated than driving the KVM part, the integration with the virt board should go in via this series, and the emulation should build on top of that? Of course, if the full emulation series is almost ready to be merged that's a different story. It just felt like both patch series have stalled somehow, and I would like to see what we can do to get this stuff moving again. The first patch from the mentioned series is already in master; Peter told me that he doesn't want to add the complete GICv3 before qemu 2.4 is released: https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg04171.html So waiting... Fair enough. -Christoffer
Re: [Qemu-devel] [PATCH RFC 1/4] Add virt-v3 machine that uses GIC-500
On Fri, May 22, 2015 at 01:58:41PM +0300, Pavel Fedin wrote: This patch introduces kernel_irqchip_type member in Machine class. Currently it it used only by virt machine for its internal purposes, however in future it is to be passed to KVM in kvm_irqchip_create(). The variable is defined as int in order to be architecture agnostic, for potential future users. Can you use a decent editor/mail agent and break your lines at 72 chars for commit messages please? This commit message is very code-specific and doesn't explain the overall change/purpose; for example I don't understand from reading this commit if the patch expects a new machine virt-v3 or using machine properties as discussed before. I think it's the former and I think you should re-spin these patches using a property, since Peter already clearly expressed how this should be done and it's no use reviewing something which we already know is not the right approach: https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg02204.html Thanks, -Christoffer Signed-off-by: Pavel Fedin p.fe...@samsung.com --- hw/arm/virt.c | 148 +++- include/hw/boards.h | 1 + 2 files changed, 123 insertions(+), 26 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index a1186c5..15724b2 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -66,6 +66,10 @@ enum { VIRT_CPUPERIPHS, VIRT_GIC_DIST, VIRT_GIC_CPU, +VIRT_GIC_DIST_SPI = VIRT_GIC_CPU, +VIRT_ITS_CONTROL, +VIRT_ITS_TRANSLATION, +VIRT_LPI, VIRT_UART, VIRT_MMIO, VIRT_RTC, @@ -107,6 +111,8 @@ typedef struct { #define VIRT_MACHINE_CLASS(klass) \ OBJECT_CLASS_CHECK(VirtMachineClass, klass, TYPE_VIRT_MACHINE) +#define TYPE_VIRTV3_MACHINE virt-v3 + /* Addresses and sizes of our components. * 0..128MB is space for a flash device so we can run bootrom code such as UEFI. * 128MB..256MB is used for miscellaneous device I/O. @@ -121,25 +127,29 @@ typedef struct { */ static const MemMapEntry a15memmap[] = { /* Space up to 0x800 is reserved for a boot ROM */ -[VIRT_FLASH] = { 0, 0x0800 }, -[VIRT_CPUPERIPHS] = { 0x0800, 0x0002 }, +[VIRT_FLASH] = { 0, 0x0800 }, +[VIRT_CPUPERIPHS] = { 0x0800, 0x0002 }, /* GIC distributor and CPU interfaces sit inside the CPU peripheral space */ -[VIRT_GIC_DIST] = { 0x0800, 0x0001 }, -[VIRT_GIC_CPU] ={ 0x0801, 0x0001 }, -[VIRT_UART] = { 0x0900, 0x1000 }, -[VIRT_RTC] ={ 0x0901, 0x1000 }, -[VIRT_FW_CFG] = { 0x0902, 0x000a }, -[VIRT_MMIO] = { 0x0a00, 0x0200 }, +[VIRT_GIC_DIST] ={ 0x0800, 0x0001 }, +[VIRT_GIC_CPU] = { 0x0801, 0x0001 }, +/* On v3 VIRT_GIC_DIST_SPI takes place of VIRT_GIC_CPU */ +[VIRT_ITS_CONTROL] = { 0x0802, 0x0001 }, +[VIRT_ITS_TRANSLATION] = { 0x0803, 0x0001 }, +[VIRT_LPI] = { 0x0804, 0x0080 }, +[VIRT_UART] ={ 0x0900, 0x1000 }, +[VIRT_RTC] = { 0x0901, 0x1000 }, +[VIRT_FW_CFG] = { 0x0902, 0x000a }, +[VIRT_MMIO] ={ 0x0a00, 0x0200 }, /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */ /* * PCIE verbose map: * - * MMIO window { 0x1000, 0x2eff }, - * PIO window { 0x3eff, 0x0001 }, - * ECAM { 0x3f00, 0x0100 }, + * MMIO window { 0x1000, 0x2eff }, + * PIO window{ 0x3eff, 0x0001 }, + * ECAM { 0x3f00, 0x0100 }, */ -[VIRT_PCIE] = { 0x1000, 0x3000 }, -[VIRT_MEM] ={ 0x4000, 30ULL * 1024 * 1024 * 1024 }, +[VIRT_PCIE] ={ 0x1000, 0x3000 }, +[VIRT_MEM] = { 0x4000, 30ULL * 1024 * 1024 * 1024 }, }; static const int a15irqmap[] = { @@ -273,9 +283,11 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi) */ ARMCPU *armcpu; uint32_t irqflags = GIC_FDT_IRQ_FLAGS_EDGE_LO_HI; +/* Argument is 32 bit but 8 bits are reserved for flags */ +uint32_t max = (vbi-smp_cpus = 24) ? 24 : vbi-smp_cpus; irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START, - GIC_FDT_IRQ_PPI_CPU_WIDTH, (1 vbi-smp_cpus) - 1); + GIC_FDT_IRQ_PPI_CPU_WIDTH, (1 max) - 1); qemu_fdt_add_subnode(vbi-fdt, /timer); @@ -299,6 +311,18 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi) { int cpu; +/* + * From Documentation/devicetree/bindings/arm/cpus.txt + * On ARM v8 64-bit systems value should be set to 2, + * that corresponds to the
Re: [Qemu-devel] [PATCH RFC 2/4] Set kernel_irqchip_type for other ARM boards which use GIC
Missing commit text completely? On Fri, May 22, 2015 at 01:58:42PM +0300, Pavel Fedin wrote: Signed-off-by: Pavel Fedin p.fe...@samsung.com --- hw/arm/exynos4_boards.c | 1 + hw/arm/realview.c | 1 + hw/arm/vexpress.c | 1 + 3 files changed, 3 insertions(+) diff --git a/hw/arm/exynos4_boards.c b/hw/arm/exynos4_boards.c index d644db1..d4136bc 100644 --- a/hw/arm/exynos4_boards.c +++ b/hw/arm/exynos4_boards.c @@ -104,6 +104,7 @@ static Exynos4210State *exynos4_boards_init_common(MachineState *machine, exynos4_machines[board_type].max_cpus); } +machine-kernel_irqchip_type = KVM_DEV_TYPE_ARM_VGIC_V2; exynos4_board_binfo.ram_size = exynos4_board_ram_size[board_type]; exynos4_board_binfo.board_id = exynos4_board_id[board_type]; exynos4_board_binfo.smp_bootreg_addr = diff --git a/hw/arm/realview.c b/hw/arm/realview.c index ef2788d..f670d9f 100644 --- a/hw/arm/realview.c +++ b/hw/arm/realview.c @@ -74,6 +74,7 @@ static void realview_init(MachineState *machine, ram_addr_t ram_size = machine-ram_size; hwaddr periphbase = 0; +machine-kernel_irqchip_type = KVM_DEV_TYPE_ARM_VGIC_V2; switch (board_type) { case BOARD_EB: break; diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c index 8f1a5ea..b0a29f1 100644 --- a/hw/arm/vexpress.c +++ b/hw/arm/vexpress.c @@ -556,6 +556,7 @@ static void vexpress_common_init(MachineState *machine) const hwaddr *map = daughterboard-motherboard_map; int i; +machine-kernel_irqchip_type = KVM_DEV_TYPE_ARM_VGIC_V2; daughterboard-init(vms, machine-ram_size, machine-cpu_model, pic); /* -- 1.9.5.msysgit.0
Re: [Qemu-devel] [PATCH RFC 3/4] First bits of vGICv3 support:
Hi Pavel, Please get rid of the trailing colon in the subject message. On Fri, May 22, 2015 at 01:58:43PM +0300, Pavel Fedin wrote: - Make use of kernel_irqchip_type in kvm_arch_irqchip_create() - Instantiate kvm-arm-gicv3 class (not implemented yet) for GICv3 with KVM acceleration I feel like these bullets should be written as paragraphs with a proper explanation of what we are doing here from a high-level point of view. Thanks, -Christoffer Signed-off-by: Pavel Fedin p.fe...@samsung.com --- hw/arm/virt.c| 6 ++ include/sysemu/kvm.h | 3 ++- kvm-all.c| 2 +- stubs/kvm.c | 2 +- target-arm/kvm.c | 8 ++-- 5 files changed, 12 insertions(+), 9 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 15724b2..1e42e59 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -400,11 +400,9 @@ static uint32_t create_gic(const VirtBoardInfo *vbi, qemu_irq *pic, int type) int i; if (type == KVM_DEV_TYPE_ARM_VGIC_V3) { -gictype = arm_gicv3; -} else if (kvm_irqchip_in_kernel()) { -gictype = kvm-arm-gic; +gictype = kvm_irqchip_in_kernel() ? kvm-arm-gicv3 : arm_gicv3; } else { -gictype = arm_gic; +gictype = kvm_irqchip_in_kernel() ? kvm-arm-gic : arm_gic; } gicdev = qdev_create(NULL, gictype); diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index 4878959..5d90257 100644 --- a/include/sysemu/kvm.h +++ b/include/sysemu/kvm.h @@ -424,6 +424,7 @@ void kvm_init_irq_routing(KVMState *s); /** * kvm_arch_irqchip_create: * @KVMState: The KVMState pointer + * @type: irqchip type, architecture-specific * * Allow architectures to create an in-kernel irq chip themselves. * @@ -431,7 +432,7 @@ void kvm_init_irq_routing(KVMState *s); *0: irq chip was not created * 0: irq chip was created */ -int kvm_arch_irqchip_create(KVMState *s); +int kvm_arch_irqchip_create(KVMState *s, int type); /** * kvm_set_one_reg - set a register value in KVM via KVM_SET_ONE_REG ioctl diff --git a/kvm-all.c b/kvm-all.c index 17a3771..22e2621 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1393,7 +1393,7 @@ static int kvm_irqchip_create(MachineState *machine, KVMState *s) /* First probe and see if there's a arch-specific hook to create the * in-kernel irqchip for us */ -ret = kvm_arch_irqchip_create(s); +ret = kvm_arch_irqchip_create(s, machine-kernel_irqchip_type); if (ret 0) { return ret; } else if (ret == 0) { diff --git a/stubs/kvm.c b/stubs/kvm.c index e7c60b6..a8505ff 100644 --- a/stubs/kvm.c +++ b/stubs/kvm.c @@ -1,7 +1,7 @@ #include qemu-common.h #include sysemu/kvm.h -int kvm_arch_irqchip_create(KVMState *s) +int kvm_arch_irqchip_create(KVMState *s, int type) { return 0; } diff --git a/target-arm/kvm.c b/target-arm/kvm.c index 16abbf1..65794cf 100644 --- a/target-arm/kvm.c +++ b/target-arm/kvm.c @@ -579,7 +579,7 @@ void kvm_arch_init_irq_routing(KVMState *s) { } -int kvm_arch_irqchip_create(KVMState *s) +int kvm_arch_irqchip_create(KVMState *s, int type) { int ret; @@ -587,11 +587,15 @@ int kvm_arch_irqchip_create(KVMState *s) * let the device do this when it initializes itself, otherwise we * fall back to the old API */ -ret = kvm_create_device(s, KVM_DEV_TYPE_ARM_VGIC_V2, true); +ret = kvm_create_device(s, type, true); if (ret == 0) { return 1; } +/* Fallback will create VGIC v2 */ +if (type != KVM_DEV_TYPE_ARM_VGIC_V2) { +return ret; +} return 0; } -- 1.9.5.msysgit.0
Re: [Qemu-devel] [PATCH RFC 4/4] Initial implementation of vGICv3.
Hi Pavel, On Fri, May 22, 2015 at 07:57:13PM +0300, Pavel Fedin wrote: Hi! Looks GICv3 common class currently miss this security_extn field + parent_fiq so it does not compile without changes. Or did I miss something? Just throw this if(...) away. It's my fault. Actually i have rebased Shlomo's patches on yesterday's master, and during this i added parent_fiq[] to GICv3, because without filling these in the thing stopped working due to recent GICv2 code changes. I needed to fix the problem quickly, so i copied initializing parent_fiq together with security_extn field from v2 code. But, this makes no sense because security_extn is never initialized, it is not even assigned property name. So just remove this small fragment, it's not needed now. I have fixed this in my working tree 2 hours ago. parent_fiq is declared as: --- cut --- qemu_irq parent_fiq[GICV3_NCPU]; --- cut --- I think based on this comment that I can safely suggest a bit of advise regarding sending patches to the QEMU and Linux lists: Be as careful as you can, properly test your code, review your own commit messages, run spell check on them, and try to look at what you send out through the eyes of someone who has never seen this code before. Review resources are really scarse in these projects and almost all maintainers are overloaded, so we all have to work together and make sure we don't flood the lists unwarranted. It is good to re-spin quickly so people don't loose context, but if you are fixing things up in a matter of hours and sending out new revisions, you are moving too fast. Turn-around time is days/weeks in the best cases, so better to take a few extra days to thoroughly test a patch series. The exception is to illustrate a general idea or design, in which case clearly marking the series as an RFC and noting what people should look at and what people should ignore is key. Hope this makes sense, -Christoffer
Re: [Qemu-devel] should KVM or userspace be the one which decides what MIPIDR/affinity values to assign to vcpus?
On Thu, Jun 25, 2015 at 10:06:20AM +0100, Peter Maydell wrote: On 25 June 2015 at 09:00, Christoffer Dall christoffer.d...@linaro.org wrote: Of course, KVM can deny an unsupported configuration, but I am wondering if we really think anybody will care about the 'model such specific hardware' aspect with KVM, or if we should only consider the 'I want a VM with x VCPUs' scenario, in which case the second option below seems simpler to me. I agree it's not very likely anybody cares about the specific cluster topology. However if we don't want to support arbitrary topologies then QEMU is going to end up in the business of editing the user supplied device tree blob to make its cpu definitions match up with whatever the kernel provides, which could be pretty tedious. I see, then you can't easily contruct a machine and a DT in one go before talking to KVM. Oh well, I don't feel strongly one way or the other. -Christoffer
Re: [Qemu-devel] should KVM or userspace be the one which decides what MIPIDR/affinity values to assign to vcpus?
Hi, [sorry for reviving this thread late] On Tue, Jun 09, 2015 at 12:24:13PM +0100, Peter Maydell wrote: On 9 June 2015 at 11:52, Marc Zyngier marc.zyng...@arm.com wrote: On 08/06/15 11:52, Peter Maydell wrote: On 8 June 2015 at 11:32, Igor Mammedov imamm...@redhat.com wrote: On Thu, 4 Jun 2015 18:17:39 +0100 Peter Maydell peter.mayd...@linaro.org wrote: On 4 June 2015 at 17:40, Shlomo Pongratz shlomopongr...@gmail.com wrote: In order for it to work correctly we must use MPIDR values in the device tree which match the MPIDR values the kernel has picked for the vcpus, so ask KVM what those values are. Could we set QEMU's generated mpidr in kernel instead of pulling it from kernel, like we do with APIC ID in x86 and fix kernel not to reset it its own value (i.e. untie mpidr from vcpuid)? Then later we could move setting mpidr from kvm_arch_init_vcpu() into board code which should be setting it, since it knows/defines what topology it has. This is a question better asked on the kvmarm list (which I have cc'd), because that is where the kernel folks hang out... Care to provide some context? Why is this required? The device tree spec says we have to tell the guest kernel the affinity information for each CPU, in the reg field of the CPU node. This device tree blob is created by userspace; the MPIDR the guest actually reads is currently created by the kernel. The kernel's vgic v3 code may also make assumptions about vcpu_id = affinity mappings, I'm not sure. In current code: * the kernel has an opinion about vcpu_id = MPIDR mappings * QEMU does too, but it's not the same! (luckily for 8 or fewer CPUs they agree, but this will not do for GICv3 support) So either: * QEMU needs to tell the kernel the MPIDR for each vCPU Does this option work for all configurations? What if you're running on a hardware with two clusters [8, 8] and QEMU asks KVM for a [12, 2] configuration, for example. Will it not be impossible for KVM to maintain correct semantics for coherency guarantees across what seems to the guest to be the same cluster, but really is two different clusters? Of course, KVM can deny an unsupported configuration, but I am wondering if we really think anybody will care about the 'model such specific hardware' aspect with KVM, or if we should only consider the 'I want a VM with x VCPUs' scenario, in which case the second option below seems simpler to me. Are there any concerns to think about wrt. migration here as well? Thanks, -Christoffer * QEMU needs to ask the kernel the MPIDR for each vCPU Which is better? The latter is simpler and will work with existing kernels. The former would let us (for instance) use KVM when we're modelling (real world) boards which have particular cluster configurations (which might not match the kernel's current simplistic always 16 CPUs at Aff0 setup). thanks -- PMM ___ kvmarm mailing list kvm...@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[Qemu-devel] [PATCH v4 0/4] Add support for for GICv2m and MSIs to arm-virt
Now when we have a host generic PCIe controller in the virt board, it would be nice to be able to use MSIs so that we can eventually enable VHOST with KVM. With these patches you can use MSIs with TCG and with KVM, but you still need some fixes for the mapping of the IRQ index to the GSI number for IRQFD to work. A separate series that enables IRQFD and vhost is available: ARM adaptations for vhost irqfd setup https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg01054.html) Tested with KVM on XGene and with TCG by configuring a virtio-pci network adapter for the guest and verifying MSIs going through as expected. Rebased on target-arm.next, see the individual patches for detailed changelogs. Christoffer Dall (4): target-arm: Add GIC phandle to VirtBoardInfo arm_gicv2m: Add GICv2m widget to support MSIs target-arm: Extend the gic node properties target-arm: Add the GICv2m to the virt board hw/arm/virt.c | 73 ++- hw/intc/Makefile.objs | 1 + hw/intc/arm_gicv2m.c | 190 ++ include/hw/arm/virt.h | 2 + 4 files changed, 248 insertions(+), 18 deletions(-) create mode 100644 hw/intc/arm_gicv2m.c -- 2.1.2.330.g565301e.dirty
[Qemu-devel] [PATCH v4 3/4] target-arm: Extend the gic node properties
In preparation for adding the GICv2m which requires address specifiers and is a subnode of the gic, we extend the gic DT definition to specify the #address-cells and #size-cells properties and add an empty ranges property properties of the DT node, since this is required to add the v2m node as a child of the gic node. Note that we must also expand the irq-map to reference the gic with the right address-cells as a consequence of this change. Reviewed-by: Eric Auger eric.au...@linaro.org Suggested-by: Shanker Donthineni shank...@codeaurora.org Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- Changes since v3: - Rewrote patch and changed authorship and tags accordingly - Fixed spelling in commit message Changes since v2: - New separate patch factoring out changes to existing code for eased bisectability in case we broke something - The above fixes the issue with non-MSI compatible guests. hw/arm/virt.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index e5235ef..387dac8 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -317,6 +317,9 @@ static void fdt_add_gic_node(VirtBoardInfo *vbi) 2, vbi-memmap[VIRT_GIC_DIST].size, 2, vbi-memmap[VIRT_GIC_CPU].base, 2, vbi-memmap[VIRT_GIC_CPU].size); +qemu_fdt_setprop_cell(vbi-fdt, /intc, #address-cells, 0x2); +qemu_fdt_setprop_cell(vbi-fdt, /intc, #size-cells, 0x2); +qemu_fdt_setprop(vbi-fdt, /intc, ranges, NULL, 0); qemu_fdt_setprop_cell(vbi-fdt, /intc, phandle, vbi-gic_phandle); } @@ -585,7 +588,7 @@ static void create_pcie_irq_map(const VirtBoardInfo *vbi, uint32_t gic_phandle, int first_irq, const char *nodename) { int devfn, pin; -uint32_t full_irq_map[4 * 4 * 8] = { 0 }; +uint32_t full_irq_map[4 * 4 * 10] = { 0 }; uint32_t *irq_map = full_irq_map; for (devfn = 0; devfn = 0x18; devfn += 0x8) { @@ -598,13 +601,13 @@ static void create_pcie_irq_map(const VirtBoardInfo *vbi, uint32_t gic_phandle, uint32_t map[] = { devfn 8, 0, 0, /* devfn */ pin + 1,/* PCI pin */ -gic_phandle, irq_type, irq_nr, irq_level }; /* GIC irq */ +gic_phandle, 0, 0, irq_type, irq_nr, irq_level }; /* GIC irq */ /* Convert map to big endian */ -for (i = 0; i 8; i++) { +for (i = 0; i 10; i++) { irq_map[i] = cpu_to_be32(map[i]); } -irq_map += 8; +irq_map += 10; } } -- 2.1.2.330.g565301e.dirty
[Qemu-devel] [PATCH v4 1/4] target-arm: Add GIC phandle to VirtBoardInfo
Instead of passing the GIC phandle around between functions, add it to the VirtBoardInfo just like we do for the clock_phandle. We are about to add the v2m phandle as well, and it's easier not having to pass around a bunch of phandles, return multiple values from functions, etc. Reviewed-by: Eric Auger eric.au...@linaro.org Reviewed-by: Peter Maydell peter.mayd...@linaro.org Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- Changes since v3: - Added reviewed-by tag Changes since v2: - None Changes since v1: - Added reviewed-by tag hw/arm/virt.c | 26 +++--- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 05db8cb..e5235ef 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -69,6 +69,7 @@ typedef struct VirtBoardInfo { void *fdt; int fdt_size; uint32_t clock_phandle; +uint32_t gic_phandle; } VirtBoardInfo; typedef struct { @@ -299,12 +300,11 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi) } } -static uint32_t fdt_add_gic_node(const VirtBoardInfo *vbi) +static void fdt_add_gic_node(VirtBoardInfo *vbi) { -uint32_t gic_phandle; -gic_phandle = qemu_fdt_alloc_phandle(vbi-fdt); -qemu_fdt_setprop_cell(vbi-fdt, /, interrupt-parent, gic_phandle); +vbi-gic_phandle = qemu_fdt_alloc_phandle(vbi-fdt); +qemu_fdt_setprop_cell(vbi-fdt, /, interrupt-parent, vbi-gic_phandle); qemu_fdt_add_subnode(vbi-fdt, /intc); /* 'cortex-a15-gic' means 'GIC v2' */ @@ -317,12 +317,10 @@ static uint32_t fdt_add_gic_node(const VirtBoardInfo *vbi) 2, vbi-memmap[VIRT_GIC_DIST].size, 2, vbi-memmap[VIRT_GIC_CPU].base, 2, vbi-memmap[VIRT_GIC_CPU].size); -qemu_fdt_setprop_cell(vbi-fdt, /intc, phandle, gic_phandle); - -return gic_phandle; +qemu_fdt_setprop_cell(vbi-fdt, /intc, phandle, vbi-gic_phandle); } -static uint32_t create_gic(const VirtBoardInfo *vbi, qemu_irq *pic) +static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic) { /* We create a standalone GIC v2 */ DeviceState *gicdev; @@ -371,7 +369,7 @@ static uint32_t create_gic(const VirtBoardInfo *vbi, qemu_irq *pic) pic[i] = qdev_get_gpio_in(gicdev, i); } -return fdt_add_gic_node(vbi); +fdt_add_gic_node(vbi); } static void create_uart(const VirtBoardInfo *vbi, qemu_irq *pic) @@ -618,8 +616,7 @@ static void create_pcie_irq_map(const VirtBoardInfo *vbi, uint32_t gic_phandle, 0x7 /* PCI irq */); } -static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic, -uint32_t gic_phandle) +static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic) { hwaddr base_mmio = vbi-memmap[VIRT_PCIE_MMIO].base; hwaddr size_mmio = vbi-memmap[VIRT_PCIE_MMIO].size; @@ -685,7 +682,7 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic, 2, base_mmio, 2, size_mmio); qemu_fdt_setprop_cell(vbi-fdt, nodename, #interrupt-cells, 1); -create_pcie_irq_map(vbi, gic_phandle, irq, nodename); +create_pcie_irq_map(vbi, vbi-gic_phandle, irq, nodename); g_free(nodename); } @@ -717,7 +714,6 @@ static void machvirt_init(MachineState *machine) VirtBoardInfo *vbi; VirtGuestInfoState *guest_info_state = g_malloc0(sizeof *guest_info_state); VirtGuestInfo *guest_info = guest_info_state-info; -uint32_t gic_phandle; char **cpustr; if (!cpu_model) { @@ -794,13 +790,13 @@ static void machvirt_init(MachineState *machine) create_flash(vbi); -gic_phandle = create_gic(vbi, pic); +create_gic(vbi, pic); create_uart(vbi, pic); create_rtc(vbi, pic); -create_pcie(vbi, pic, gic_phandle); +create_pcie(vbi, pic); /* Create mmio transports, so the user can create virtio backends * (which will be automatically plugged in to the transports). If -- 2.1.2.330.g565301e.dirty
[Qemu-devel] [PATCH v4 4/4] target-arm: Add the GICv2m to the virt board
Add a GICv2m device to the virt board to enable MSIs on the generic PCI host controller. We allocate 64 SPIs in the IRQ space for now (this can be increased/decreased later) and map the GICv2m right after the GIC in the memory map. Reviewed-by: Eric Auger eric.au...@linaro.org Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- Changes since v3: - Rebased on target-arm.next, so moved some definitions to virt.h - Added reviewed-by tag Changes since v2: - Factored out changes to GIC DT node to previous patch. - Renamed QOM type name to arm-gicv2m Changes since v1: - Remove stray merge conflict line - Reworded commmit message. hw/arm/virt.c | 40 +++- include/hw/arm/virt.h | 2 ++ 2 files changed, 41 insertions(+), 1 deletion(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 387dac8..4bb7175 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -70,6 +70,7 @@ typedef struct VirtBoardInfo { int fdt_size; uint32_t clock_phandle; uint32_t gic_phandle; +uint32_t v2m_phandle; } VirtBoardInfo; typedef struct { @@ -109,6 +110,7 @@ static const MemMapEntry a15memmap[] = { /* GIC distributor and CPU interfaces sit inside the CPU peripheral space */ [VIRT_GIC_DIST] = { 0x0800, 0x0001 }, [VIRT_GIC_CPU] ={ 0x0801, 0x0001 }, +[VIRT_GIC_V2M] ={ 0x0802, 0x1000 }, [VIRT_UART] = { 0x0900, 0x1000 }, [VIRT_RTC] ={ 0x0901, 0x1000 }, [VIRT_FW_CFG] = { 0x0902, 0x000a }, @@ -125,6 +127,7 @@ static const int a15irqmap[] = { [VIRT_RTC] = 2, [VIRT_PCIE] = 3, /* ... to 6 */ [VIRT_MMIO] = 16, /* ...to 16 + NUM_VIRTIO_TRANSPORTS - 1 */ +[VIRT_GIC_V2M] = 48, /* ...to 48 + NUM_GICV2M_SPIS - 1 */ }; static VirtBoardInfo machines[] = { @@ -300,9 +303,21 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi) } } -static void fdt_add_gic_node(VirtBoardInfo *vbi) +static void fdt_add_v2m_gic_node(VirtBoardInfo *vbi) { +vbi-v2m_phandle = qemu_fdt_alloc_phandle(vbi-fdt); +qemu_fdt_add_subnode(vbi-fdt, /intc/v2m); +qemu_fdt_setprop_string(vbi-fdt, /intc/v2m, compatible, +arm,gic-v2m-frame); +qemu_fdt_setprop(vbi-fdt, /intc/v2m, msi-controller, NULL, 0); +qemu_fdt_setprop_sized_cells(vbi-fdt, /intc/v2m, reg, + 2, vbi-memmap[VIRT_GIC_V2M].base, + 2, vbi-memmap[VIRT_GIC_V2M].size); +qemu_fdt_setprop_cell(vbi-fdt, /intc/v2m, phandle, vbi-v2m_phandle); +} +static void fdt_add_gic_node(VirtBoardInfo *vbi) +{ vbi-gic_phandle = qemu_fdt_alloc_phandle(vbi-fdt); qemu_fdt_setprop_cell(vbi-fdt, /, interrupt-parent, vbi-gic_phandle); @@ -323,6 +338,25 @@ static void fdt_add_gic_node(VirtBoardInfo *vbi) qemu_fdt_setprop_cell(vbi-fdt, /intc, phandle, vbi-gic_phandle); } +static void create_v2m(VirtBoardInfo *vbi, qemu_irq *pic) +{ +int i; +int irq = vbi-irqmap[VIRT_GIC_V2M]; +DeviceState *dev; + +dev = qdev_create(NULL, arm-gicv2m); +sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, vbi-memmap[VIRT_GIC_V2M].base); +qdev_prop_set_uint32(dev, base-spi, irq); +qdev_prop_set_uint32(dev, num-spi, NUM_GICV2M_SPIS); +qdev_init_nofail(dev); + +for (i = 0; i NUM_GICV2M_SPIS; i++) { +sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, pic[irq + i]); +} + +fdt_add_v2m_gic_node(vbi); +} + static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic) { /* We create a standalone GIC v2 */ @@ -373,6 +407,8 @@ static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic) } fdt_add_gic_node(vbi); + +create_v2m(vbi, pic); } static void create_uart(const VirtBoardInfo *vbi, qemu_irq *pic) @@ -676,6 +712,8 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic) qemu_fdt_setprop_cells(vbi-fdt, nodename, bus-range, 0, nr_pcie_buses - 1); +qemu_fdt_setprop_cells(vbi-fdt, nodename, msi-parent, vbi-v2m_phandle); + qemu_fdt_setprop_sized_cells(vbi-fdt, nodename, reg, 2, base_ecam, 2, size_ecam); qemu_fdt_setprop_sized_cells(vbi-fdt, nodename, ranges, diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index ceec8b3..003ef29 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -32,6 +32,7 @@ #include qemu-common.h +#define NUM_GICV2M_SPIS 64 #define NUM_VIRTIO_TRANSPORTS 32 #define ARCH_TIMER_VIRT_IRQ 11 @@ -53,6 +54,7 @@ enum { VIRT_PCIE_MMIO, VIRT_PCIE_PIO, VIRT_PCIE_ECAM, +VIRT_GIC_V2M, }; typedef struct MemMapEntry { -- 2.1.2.330.g565301e.dirty
[Qemu-devel] [PATCH v4 2/4] arm_gicv2m: Add GICv2m widget to support MSIs
The ARM GICv2m widget is a little device that handles MSI interrupt writes to a trigger register and ties them to a range of interrupt lines wires to the GIC. It has a few status/id registers and the interrupt wires, and that's about it. A board instantiates the device by setting the base SPI number and number SPIs for the frame. The base-spi parameter is indexed in the SPI number space only, so base-spi == 0, means IRQ number 32. When a device (the PCI host controller) writes to the trigger register, the payload is the GIC IRQ number, so we have to subtract 32 from that and then index into our frame of SPIs. When instantiating a GICv2m device, tell PCI that we have instantiated something that can deal with MSIs. We rely on the board actually wiring up the GICv2m to the PCI host controller. Reviewed-by: Eric Auger eric.au...@linaro.org Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- Changes since v3: - Added reviewed-by tag Changes since v2: - Renamed QOM type to arm-gicv2m Changes since v1: - Check that writes to MSI_SETSPI are within the lower boundary as well - Move gicv2m to common-obj in Makefile - Separate switch case and comment for impdef regs - Clearly document what is emulated - Allow 16 bit lower accesses to MSI_SETSPI regs - Fix commit grammar error - Remove stray pixman commit hw/intc/Makefile.objs | 1 + hw/intc/arm_gicv2m.c | 190 ++ 2 files changed, 191 insertions(+) create mode 100644 hw/intc/arm_gicv2m.c diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs index 843864a..092d8a8 100644 --- a/hw/intc/Makefile.objs +++ b/hw/intc/Makefile.objs @@ -11,6 +11,7 @@ common-obj-$(CONFIG_SLAVIO) += slavio_intctl.o common-obj-$(CONFIG_IOAPIC) += ioapic_common.o common-obj-$(CONFIG_ARM_GIC) += arm_gic_common.o common-obj-$(CONFIG_ARM_GIC) += arm_gic.o +common-obj-$(CONFIG_ARM_GIC) += arm_gicv2m.o common-obj-$(CONFIG_OPENPIC) += openpic.o obj-$(CONFIG_APIC) += apic.o apic_common.o diff --git a/hw/intc/arm_gicv2m.c b/hw/intc/arm_gicv2m.c new file mode 100644 index 000..9f84f72 --- /dev/null +++ b/hw/intc/arm_gicv2m.c @@ -0,0 +1,190 @@ +/* + * GICv2m extension for MSI/MSI-x support with a GICv2-based system + * + * Copyright (C) 2015 Linaro, All rights reserved. + * + * Author: Christoffer Dall christoffer.d...@linaro.org + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see http://www.gnu.org/licenses/. + */ + +/* This file implements an emulated GICv2m widget as described in the ARM + * Server Base System Architecture (SBSA) specification Version 2.2 + * (ARM-DEN-0029 v2.2) pages 35-39 without any optional implementation defined + * identification registers and with a single non-secure MSI register frame. + */ + +#include hw/sysbus.h +#include hw/pci/msi.h + +#define TYPE_ARM_GICV2M arm-gicv2m +#define ARM_GICV2M(obj) OBJECT_CHECK(ARMGICv2mState, (obj), TYPE_ARM_GICV2M) + +#define GICV2M_NUM_SPI_MAX 128 + +#define V2M_MSI_TYPER 0x008 +#define V2M_MSI_SETSPI_NS 0x040 +#define V2M_MSI_IIDR0xFCC +#define V2M_IIDR0 0xFD0 +#define V2M_IIDR11 0xFFC + +#define PRODUCT_ID_QEMU 0x51 /* ASCII code Q */ + +typedef struct ARMGICv2mState { +SysBusDevice parent_obj; + +MemoryRegion iomem; +qemu_irq spi[GICV2M_NUM_SPI_MAX]; + +uint32_t base_spi; +uint32_t num_spi; +} ARMGICv2mState; + +static void gicv2m_set_irq(void *opaque, int irq) +{ +ARMGICv2mState *s = (ARMGICv2mState *)opaque; + +qemu_irq_pulse(s-spi[irq]); +} + +static uint64_t gicv2m_read(void *opaque, hwaddr offset, +unsigned size) +{ +ARMGICv2mState *s = (ARMGICv2mState *)opaque; +uint32_t val; + +if (size != 4) { +qemu_log_mask(LOG_GUEST_ERROR, gicv2m_read: bad size %u\n, size); +return 0; +} + +switch (offset) { +case V2M_MSI_TYPER: +val = (s-base_spi + 32) 16; +val |= s-num_spi; +return val; +case V2M_MSI_IIDR: +/* We don't have any valid implementor so we leave that field as zero + * and we return 0 in the arch revision as per the spec. + */ +return (PRODUCT_ID_QEMU 20); +case V2M_IIDR0 ... V2M_IIDR11: +/* We do not implement any optional identification registers and the + * mandatory MSI_PIDR2 register reads as 0x0, so we
Re: [Qemu-devel] [PATCH v3 4/4] target-arm: Add the GICv2m to the virt board
Hi Pavel, On Mon, May 25, 2015 at 04:09:58PM +0300, Pavel Fedin wrote: Hello! typedef struct MemMapEntry { @@ -88,6 +90,7 @@ typedef struct VirtBoardInfo { int fdt_size; uint32_t clock_phandle; uint32_t gic_phandle; +uint32_t v2m_phandle; } VirtBoardInfo; Could you rename v2m_phandle to something more neutral like msi_phandle ? It will also be used by GICv3 ITS implementation. That's sort of how to speculate about. Why can't those patches just rename the variable then? Right now, as the code stands, msi_phandle would be less clear IMHO. -Christoffer
[Qemu-devel] [PATCH v3 1/4] target-arm: Add GIC phandle to VirtBoardInfo
Instead of passing the GIC phandle around between functions, add it to the VirtBoardInfo just like we do for the clock_phandle. We are about to add the v2m phandle as well, and it's easier not having to pass around a bunch of phandles, return multiple values from functions, etc. Reviewed-by: Peter Maydell peter.mayd...@linaro.org Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- Changes since v2: - None Changes since v1: - Added reviewed-by tag hw/arm/virt.c | 26 +++--- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index a7f9a10..f9f7482 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -87,6 +87,7 @@ typedef struct VirtBoardInfo { void *fdt; int fdt_size; uint32_t clock_phandle; +uint32_t gic_phandle; } VirtBoardInfo; typedef struct { @@ -322,12 +323,11 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi) } } -static uint32_t fdt_add_gic_node(const VirtBoardInfo *vbi) +static void fdt_add_gic_node(VirtBoardInfo *vbi) { -uint32_t gic_phandle; -gic_phandle = qemu_fdt_alloc_phandle(vbi-fdt); -qemu_fdt_setprop_cell(vbi-fdt, /, interrupt-parent, gic_phandle); +vbi-gic_phandle = qemu_fdt_alloc_phandle(vbi-fdt); +qemu_fdt_setprop_cell(vbi-fdt, /, interrupt-parent, vbi-gic_phandle); qemu_fdt_add_subnode(vbi-fdt, /intc); /* 'cortex-a15-gic' means 'GIC v2' */ @@ -340,12 +340,10 @@ static uint32_t fdt_add_gic_node(const VirtBoardInfo *vbi) 2, vbi-memmap[VIRT_GIC_DIST].size, 2, vbi-memmap[VIRT_GIC_CPU].base, 2, vbi-memmap[VIRT_GIC_CPU].size); -qemu_fdt_setprop_cell(vbi-fdt, /intc, phandle, gic_phandle); - -return gic_phandle; +qemu_fdt_setprop_cell(vbi-fdt, /intc, phandle, vbi-gic_phandle); } -static uint32_t create_gic(const VirtBoardInfo *vbi, qemu_irq *pic) +static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic) { /* We create a standalone GIC v2 */ DeviceState *gicdev; @@ -394,7 +392,7 @@ static uint32_t create_gic(const VirtBoardInfo *vbi, qemu_irq *pic) pic[i] = qdev_get_gpio_in(gicdev, i); } -return fdt_add_gic_node(vbi); +fdt_add_gic_node(vbi); } static void create_uart(const VirtBoardInfo *vbi, qemu_irq *pic) @@ -641,8 +639,7 @@ static void create_pcie_irq_map(const VirtBoardInfo *vbi, uint32_t gic_phandle, 0x7 /* PCI irq */); } -static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic, -uint32_t gic_phandle) +static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic) { hwaddr base = vbi-memmap[VIRT_PCIE].base; hwaddr size = vbi-memmap[VIRT_PCIE].size; @@ -714,7 +711,7 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic, 2, base_mmio, 2, size_mmio); qemu_fdt_setprop_cell(vbi-fdt, nodename, #interrupt-cells, 1); -create_pcie_irq_map(vbi, gic_phandle, irq, nodename); +create_pcie_irq_map(vbi, vbi-gic_phandle, irq, nodename); g_free(nodename); } @@ -736,7 +733,6 @@ static void machvirt_init(MachineState *machine) MemoryRegion *ram = g_new(MemoryRegion, 1); const char *cpu_model = machine-cpu_model; VirtBoardInfo *vbi; -uint32_t gic_phandle; char **cpustr; if (!cpu_model) { @@ -813,13 +809,13 @@ static void machvirt_init(MachineState *machine) create_flash(vbi); -gic_phandle = create_gic(vbi, pic); +create_gic(vbi, pic); create_uart(vbi, pic); create_rtc(vbi, pic); -create_pcie(vbi, pic, gic_phandle); +create_pcie(vbi, pic); /* Create mmio transports, so the user can create virtio backends * (which will be automatically plugged in to the transports). If -- 2.1.2.330.g565301e.dirty
[Qemu-devel] [PATCH v3 0/4] Add support for for GICv2m and MSIs to arm-virt
Now when we have a host generic PCIe controller in the virt board, it would be nice to be able to use MSIs so that we can eventually enable VHOST with KVM. With these patches you can use MSIs with TCG and with KVM, but you still need some fixes for the mapping of the IRQ index to the GSI number for IRQFD to work. A separate series that enables IRQFD and vhost is available: ARM adaptations for vhost irqfd setup https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg01054.html) Tested with KVM on XGene and with TCG by configuring a virtio-pci network adapter for the guest and verifying MSIs going through as expected. See the individual patches for changelogs. Christoffer Dall (3): target-arm: Add GIC phandle to VirtBoardInfo arm_gicv2m: Add GICv2m widget to support MSIs target-arm: Add the GICv2m to the virt board Shanker Donthineni (1): target-arm: Extend the gic node properties hw/arm/virt.c | 81 +++-- hw/intc/Makefile.objs | 1 + hw/intc/arm_gicv2m.c | 190 ++ 3 files changed, 252 insertions(+), 20 deletions(-) create mode 100644 hw/intc/arm_gicv2m.c -- 2.1.2.330.g565301e.dirty
[Qemu-devel] [PATCH v3 3/4] target-arm: Extend the gic node properties
From: Shanker Donthineni shank...@codeaurora.org In preparation for adding the GICv2m which requires address specifiers and is a subnode of the gic, we extend the gic DT definition to specify the #address-cells and #size-cells properties and add an empty ranges property properties of the DT node, since this is required to add the v2m node as a child of the gic node. Note that we must also expand the irq-map to reference the gic with the right address-cells as a consequnce of this change. Signed-off-by: Shanker Donthineni shank...@codeaurora.org Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- Changes since v2: - New separate patch factoring out changes to existing code for eased bisectability in case we broke something - The above fixes the issue with non-MSI compatible guests. hw/arm/virt.c | 17 +++-- 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index f9f7482..6797c6f 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -340,7 +340,11 @@ static void fdt_add_gic_node(VirtBoardInfo *vbi) 2, vbi-memmap[VIRT_GIC_DIST].size, 2, vbi-memmap[VIRT_GIC_CPU].base, 2, vbi-memmap[VIRT_GIC_CPU].size); +qemu_fdt_setprop_cell(vbi-fdt, /intc, #address-cells, 0x2); +qemu_fdt_setprop_cell(vbi-fdt, /intc, #size-cells, 0x2); +qemu_fdt_setprop(vbi-fdt, /intc, ranges, NULL, 0); qemu_fdt_setprop_cell(vbi-fdt, /intc, phandle, vbi-gic_phandle); + } static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic) @@ -604,11 +608,12 @@ static void create_fw_cfg(const VirtBoardInfo *vbi) g_free(nodename); } +#define PCIE_IRQMAP_LEN 10 static void create_pcie_irq_map(const VirtBoardInfo *vbi, uint32_t gic_phandle, int first_irq, const char *nodename) { int devfn, pin; -uint32_t full_irq_map[4 * 4 * 8] = { 0 }; +uint32_t full_irq_map[4 * 4 * PCIE_IRQMAP_LEN] = { 0 }; uint32_t *irq_map = full_irq_map; for (devfn = 0; devfn = 0x18; devfn += 0x8) { @@ -619,15 +624,15 @@ static void create_pcie_irq_map(const VirtBoardInfo *vbi, uint32_t gic_phandle, int i; uint32_t map[] = { -devfn 8, 0, 0, /* devfn */ -pin + 1,/* PCI pin */ -gic_phandle, irq_type, irq_nr, irq_level }; /* GIC irq */ +devfn 8, 0, 0, /* devfn */ +pin + 1, /* PCI pin */ +gic_phandle, 0, 0, irq_type, irq_nr, irq_level }; /* GIC irq */ /* Convert map to big endian */ -for (i = 0; i 8; i++) { +for (i = 0; i PCIE_IRQMAP_LEN; i++) { irq_map[i] = cpu_to_be32(map[i]); } -irq_map += 8; +irq_map += PCIE_IRQMAP_LEN; } } -- 2.1.2.330.g565301e.dirty
[Qemu-devel] [PATCH v3 2/4] arm_gicv2m: Add GICv2m widget to support MSIs
The ARM GICv2m widget is a little device that handles MSI interrupt writes to a trigger register and ties them to a range of interrupt lines wires to the GIC. It has a few status/id registers and the interrupt wires, and that's about it. A board instantiates the device by setting the base SPI number and number SPIs for the frame. The base-spi parameter is indexed in the SPI number space only, so base-spi == 0, means IRQ number 32. When a device (the PCI host controller) writes to the trigger register, the payload is the GIC IRQ number, so we have to subtract 32 from that and then index into our frame of SPIs. When instantiating a GICv2m device, tell PCI that we have instantiated something that can deal with MSIs. We rely on the board actually wiring up the GICv2m to the PCI host controller. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- Changes since v2: - Renamed QOM type to arm-gicv2m Changes since v1: - Check that writes to MSI_SETSPI are within the lower boundary as well - Move gicv2m to common-obj in Makefile - Separate switch case and comment for impdef regs - Clearly document what is emulated - Allow 16 bit lower accesses to MSI_SETSPI regs - Fix commit grammar error - Remove stray pixman commit hw/intc/Makefile.objs | 1 + hw/intc/arm_gicv2m.c | 190 ++ 2 files changed, 191 insertions(+) create mode 100644 hw/intc/arm_gicv2m.c diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs index 843864a..092d8a8 100644 --- a/hw/intc/Makefile.objs +++ b/hw/intc/Makefile.objs @@ -11,6 +11,7 @@ common-obj-$(CONFIG_SLAVIO) += slavio_intctl.o common-obj-$(CONFIG_IOAPIC) += ioapic_common.o common-obj-$(CONFIG_ARM_GIC) += arm_gic_common.o common-obj-$(CONFIG_ARM_GIC) += arm_gic.o +common-obj-$(CONFIG_ARM_GIC) += arm_gicv2m.o common-obj-$(CONFIG_OPENPIC) += openpic.o obj-$(CONFIG_APIC) += apic.o apic_common.o diff --git a/hw/intc/arm_gicv2m.c b/hw/intc/arm_gicv2m.c new file mode 100644 index 000..9f84f72 --- /dev/null +++ b/hw/intc/arm_gicv2m.c @@ -0,0 +1,190 @@ +/* + * GICv2m extension for MSI/MSI-x support with a GICv2-based system + * + * Copyright (C) 2015 Linaro, All rights reserved. + * + * Author: Christoffer Dall christoffer.d...@linaro.org + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see http://www.gnu.org/licenses/. + */ + +/* This file implements an emulated GICv2m widget as described in the ARM + * Server Base System Architecture (SBSA) specification Version 2.2 + * (ARM-DEN-0029 v2.2) pages 35-39 without any optional implementation defined + * identification registers and with a single non-secure MSI register frame. + */ + +#include hw/sysbus.h +#include hw/pci/msi.h + +#define TYPE_ARM_GICV2M arm-gicv2m +#define ARM_GICV2M(obj) OBJECT_CHECK(ARMGICv2mState, (obj), TYPE_ARM_GICV2M) + +#define GICV2M_NUM_SPI_MAX 128 + +#define V2M_MSI_TYPER 0x008 +#define V2M_MSI_SETSPI_NS 0x040 +#define V2M_MSI_IIDR0xFCC +#define V2M_IIDR0 0xFD0 +#define V2M_IIDR11 0xFFC + +#define PRODUCT_ID_QEMU 0x51 /* ASCII code Q */ + +typedef struct ARMGICv2mState { +SysBusDevice parent_obj; + +MemoryRegion iomem; +qemu_irq spi[GICV2M_NUM_SPI_MAX]; + +uint32_t base_spi; +uint32_t num_spi; +} ARMGICv2mState; + +static void gicv2m_set_irq(void *opaque, int irq) +{ +ARMGICv2mState *s = (ARMGICv2mState *)opaque; + +qemu_irq_pulse(s-spi[irq]); +} + +static uint64_t gicv2m_read(void *opaque, hwaddr offset, +unsigned size) +{ +ARMGICv2mState *s = (ARMGICv2mState *)opaque; +uint32_t val; + +if (size != 4) { +qemu_log_mask(LOG_GUEST_ERROR, gicv2m_read: bad size %u\n, size); +return 0; +} + +switch (offset) { +case V2M_MSI_TYPER: +val = (s-base_spi + 32) 16; +val |= s-num_spi; +return val; +case V2M_MSI_IIDR: +/* We don't have any valid implementor so we leave that field as zero + * and we return 0 in the arch revision as per the spec. + */ +return (PRODUCT_ID_QEMU 20); +case V2M_IIDR0 ... V2M_IIDR11: +/* We do not implement any optional identification registers and the + * mandatory MSI_PIDR2 register reads as 0x0, so we capture all + * implementation defined registers here. + */ +return
[Qemu-devel] [PATCH v3 4/4] target-arm: Add the GICv2m to the virt board
Add a GICv2m device to the virt board to enable MSIs on the generic PCI host controller. We allocate 64 SPIs in the IRQ space for now (this can be increased/decreased later) and map the GICv2m right after the GIC in the memory map. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- Changes since v2: - Factored out changes to GIC DT node to previous patch. - Renamed QOM type name to arm-gicv2m Changes since v1: - Remove stray merge conflict line - Reworded commmit message. hw/arm/virt.c | 42 +- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 6797c6f..2972bb3 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -45,6 +45,7 @@ #include hw/pci-host/gpex.h #define NUM_VIRTIO_TRANSPORTS 32 +#define NUM_GICV2M_SPIS 64 /* Number of external interrupt lines to configure the GIC with */ #define NUM_IRQS 128 @@ -71,6 +72,7 @@ enum { VIRT_RTC, VIRT_FW_CFG, VIRT_PCIE, +VIRT_GIC_V2M, }; typedef struct MemMapEntry { @@ -88,6 +90,7 @@ typedef struct VirtBoardInfo { int fdt_size; uint32_t clock_phandle; uint32_t gic_phandle; +uint32_t v2m_phandle; } VirtBoardInfo; typedef struct { @@ -127,6 +130,7 @@ static const MemMapEntry a15memmap[] = { /* GIC distributor and CPU interfaces sit inside the CPU peripheral space */ [VIRT_GIC_DIST] = { 0x0800, 0x0001 }, [VIRT_GIC_CPU] ={ 0x0801, 0x0001 }, +[VIRT_GIC_V2M] ={ 0x0802, 0x1000 }, [VIRT_UART] = { 0x0900, 0x1000 }, [VIRT_RTC] ={ 0x0901, 0x1000 }, [VIRT_FW_CFG] = { 0x0902, 0x000a }, @@ -148,6 +152,7 @@ static const int a15irqmap[] = { [VIRT_RTC] = 2, [VIRT_PCIE] = 3, /* ... to 6 */ [VIRT_MMIO] = 16, /* ...to 16 + NUM_VIRTIO_TRANSPORTS - 1 */ +[VIRT_GIC_V2M] = 48, /* ...to 48 + NUM_GICV2M_SPIS - 1 */ }; static VirtBoardInfo machines[] = { @@ -323,9 +328,21 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi) } } -static void fdt_add_gic_node(VirtBoardInfo *vbi) +static void fdt_add_v2m_gic_node(VirtBoardInfo *vbi) { +vbi-v2m_phandle = qemu_fdt_alloc_phandle(vbi-fdt); +qemu_fdt_add_subnode(vbi-fdt, /intc/v2m); +qemu_fdt_setprop_string(vbi-fdt, /intc/v2m, compatible, +arm,gic-v2m-frame); +qemu_fdt_setprop(vbi-fdt, /intc/v2m, msi-controller, NULL, 0); +qemu_fdt_setprop_sized_cells(vbi-fdt, /intc/v2m, reg, + 2, vbi-memmap[VIRT_GIC_V2M].base, + 2, vbi-memmap[VIRT_GIC_V2M].size); +qemu_fdt_setprop_cell(vbi-fdt, /intc/v2m, phandle, vbi-v2m_phandle); +} +static void fdt_add_gic_node(VirtBoardInfo *vbi) +{ vbi-gic_phandle = qemu_fdt_alloc_phandle(vbi-fdt); qemu_fdt_setprop_cell(vbi-fdt, /, interrupt-parent, vbi-gic_phandle); @@ -347,6 +364,25 @@ static void fdt_add_gic_node(VirtBoardInfo *vbi) } +static void create_v2m(VirtBoardInfo *vbi, qemu_irq *pic) +{ +int i; +int irq = vbi-irqmap[VIRT_GIC_V2M]; +DeviceState *dev; + +dev = qdev_create(NULL, arm-gicv2m); +sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, vbi-memmap[VIRT_GIC_V2M].base); +qdev_prop_set_uint32(dev, base-spi, irq); +qdev_prop_set_uint32(dev, num-spi, NUM_GICV2M_SPIS); +qdev_init_nofail(dev); + +for (i = 0; i NUM_GICV2M_SPIS; i++) { +sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, pic[irq + i]); +} + +fdt_add_v2m_gic_node(vbi); +} + static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic) { /* We create a standalone GIC v2 */ @@ -397,6 +433,8 @@ static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic) } fdt_add_gic_node(vbi); + +create_v2m(vbi, pic); } static void create_uart(const VirtBoardInfo *vbi, qemu_irq *pic) @@ -707,6 +745,8 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic) qemu_fdt_setprop_cells(vbi-fdt, nodename, bus-range, 0, nr_pcie_buses - 1); +qemu_fdt_setprop_cells(vbi-fdt, nodename, msi-parent, vbi-v2m_phandle); + qemu_fdt_setprop_sized_cells(vbi-fdt, nodename, reg, 2, base_ecam, 2, size_ecam); qemu_fdt_setprop_sized_cells(vbi-fdt, nodename, ranges, -- 2.1.2.330.g565301e.dirty
Re: [Qemu-devel] [PATCH v2 0/3] Add support for for GICv2m and MSIs to arm-virt
On Wed, May 06, 2015 at 05:39:28PM +0100, Peter Maydell wrote: On 6 May 2015 at 17:33, Peter Maydell peter.mayd...@linaro.org wrote: On 27 April 2015 at 18:31, Christoffer Dall christoffer.d...@linaro.org wrote: Now when we have a host generic PCIe controller in the virt board, it would be nice to be able to use MSIs so that we can eventually enable VHOST with KVM. With these patches you can use MSIs with TCG and with KVM, but you still need some fixes for the mapping of the IRQ index to the GSI number for IRQFD to work. A separate series that enables IRQFD and vhost is available: ARM adaptations for vhost irqfd setup https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg01054.html) Tested with KVM on XGene and with TCG by configuring a virtio-pci network adapter for the guest and verifying MSIs going through as expected. You forgot to change the QOM device name to 'arm-gicv2m', but I'll fix that up as I apply this to target-arm.next. ...except this series breaks booting of a linux guest using PCI on the virt board with aarch32: PCI host bridge /pcie@1000 ranges: IO 0x3eff..0x3eff - 0x MEM 0x1000..0x3efe - 0x1000 pci-host-generic 3f00.pcie: PCI host bridge to bus :00 pci_bus :00: root bus resource [bus 00-0f] pci_bus :00: root bus resource [io 0x-0x] pci_bus :00: root bus resource [mem 0x1000-0x3efe] PCI: bus0: Fast back to back transfers disabled pci :00:01.0: of_irq_parse_pci() failed with rc=-22 pci :00:02.0: of_irq_parse_pci() failed with rc=-22 pci :00:02.0: BAR 6: assigned [mem 0x1000-0x1003 pref] pci :00:01.0: BAR 1: assigned [mem 0x1004-0x10040fff] pci :00:02.0: BAR 1: assigned [mem 0x10041000-0x10041fff] pci :00:01.0: BAR 0: assigned [io 0x1000-0x103f] pci :00:02.0: BAR 0: assigned [io 0x1040-0x105f] virtio-pci :00:01.0: enabling device (0100 - 0103) virtio-pci :00:02.0: enabling device (0100 - 0103) virtio_blk: probe of virtio32 failed with error -22 virtio_net: probe of virtio33 failed with error -22 (and without virtio-blk we don't mount our rootfs). Shanker figured out that this was due to me changing address-cells and size-cells in the gic node and breaking the irq-map in the DT and provided a fix. I will send a new series. Thanks, -Christoffer
Re: [Qemu-devel] [RFC/RFT PATCH v2 1/3] arm/arm64: pageattr: add set_memory_nc
On Tue, May 19, 2015 at 12:18:54PM +0100, Catalin Marinas wrote: On Tue, May 19, 2015 at 11:03:22AM +0100, Andrew Jones wrote: On Mon, May 18, 2015 at 04:53:03PM +0100, Catalin Marinas wrote: Another way would be to split the vma containing the non-cacheable memory so that you get a single vma with the vm_page_prot as Non-cacheable. This sounds interesting. Actually, it even crossed my mind once when I first saw that the vma would overwrite the attributes, but then, sigh, I let my brain take a stupidity bath. Yet another approach could be for KVM to mmap the necessary memory for Qemu via a file_operations.mmap call (but that's only for ranges outside the guest RAM). I guess I prefer the vma splitting, rather than this (the vma creating with mmap), as it keeps the KVM interface from changing (as you point out below). Well, unless there are other advantages to this that are worth considering? The advantage is that you don't need to deal with the mm internals in the KVM code. But you can probably add such code directly to mm/ and reuse some of the existing code in there already as part of change_protection(), mprotect_fixup(), sys_mprotect(). Actually, once you split the vma and set the new protection (something similar to mprotect_fixup), it looks to me like you can just call change_protection(vma-vm_page_prot). I didn't have time to follow these threads in details, but just to recap my understanding, we have two main use-cases: 1. Qemu handling guest I/O to device (e.g. PCIe BARs) 2. Qemu emulating device DMA For (1), I guess Qemu uses an anonymous mmap() and then tells KVM about this memory slot. The memory attributes in this case could be Device because that's how the guest would normally map it. The file_operations.mmap trick would work in this case but this means expanding the KVM ABI beyond just an ioctl(). For (2), since Qemu is writing to the guest RAM (e.g. video framebuffer allocated by the guest), I still think the simplest is to tell the guest (via DT) that such device is cache coherent rather than trying to remap the Qemu mapping as non-cacheable. If we need a solution for (1), then I'd prefer that it work and be applied to (2) as well. Anyway, I'm still not 100% sure we can count on all guest types (booloaders, different OSes) to listen to us. They may assume non-cacheable is typical and safe, and thus just do that always. We can certainly change some of those bootloaders and OSes, but probably not all of them. That's fine by me. Once you get the vma splitting and attributes changing done, I think you get the second one for free. Do we want to differentiate between Device and Normal Non-cacheable memory? Something like KVM_MEMSLOT_DEVICE? Nitpick: I'm not sure whether uncached is clear enough. In Linux, pgprot_noncached() returns Strongly Ordered memory. For Normal Non-cachable we used pgprot_writecombine (e.g. a video framebuffer). Maybe something like KVM_MEMSLOT_COHERENT meaning a request to KVM to ensure that guest and host access it coherently (which would mean writecombine for ARM). That's similar naming to functions like dma_alloc_coherent() that return cacheable or non-cacheable memory based on what the device supports. Anyway, I'm not to bothered with the naming. One thing to keep in mind for (2) is that QEMU is likely to do things like calling regular memcpy() on the memory region, so mapping it as device memory which would fault on unaligned accesses may be a problem, so ideally there is a memory type for the user space mapping which allows such behavior where we at the same time can guarantee the that the mapping is coherent with the guest mapping through the S2 attributes. -Christoffer
Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On Thu, May 14, 2015 at 03:36:37PM +0200, Andrew Jones wrote: On Thu, May 14, 2015 at 02:11:59PM +0100, Peter Maydell wrote: On 14 May 2015 at 14:03, Andrew Jones drjo...@redhat.com wrote: On Thu, May 14, 2015 at 11:37:46AM +0100, Peter Maydell wrote: On 14 May 2015 at 11:31, Andrew Jones drjo...@redhat.com wrote: Forgot to (4): switch from setting userspace's mapping to device memory to normal, non-cacheable. Using device memory caused a problem that Alex Graf found, and Peter Maydell suggested using normal, non-cacheable instead. Did you check that non-cacheable is definitely the correct kind of Normal memory attribute we want? (ie not write-through). I was concerned that write-through wouldn't be sufficient. If the guest writes to its non-cached memory, and QEMU needs to see what it wrote, then won't write-through fail to work? Unless we some how invalidate the cache first? Well, I meant more that the correct mapping for userspace is the same as the guest, whatever that is, and so somebody needs to look at what the guest actually does rather than merely hoping NormalNC is OK. (For instance, do we need to provide support for QEMU to map both NC and writethrough?) Ah, we assume the guest is mapping it as device memory, and in this version of the series, I ensure that it is at least NC with the S2 attributes. I don't think we can look at what some guests do with some devices to come up with anything beyond (poor?) heuristics. I prefer that we force both the guest and QEMU to NC (or guest chooses Device and QEMU is forced to NC) to make sure we get it right. But picking up on Peter's feedback I think it would be good if the series clearly states something like: 1) We assume that the guest may use device type memory for the accesses 2) we cannot use device memory for the userspace mapping because userspace may be doing unaligned accesses to it 3) normal non-cacheable bridges these worlds becauase of x, y, and z. I assume x, y, and z would include a fairly involved discussion of the interesting aspects of how you can configure memory accesses on ARM ... :) -Christoffer
Re: [Qemu-devel] [RFC] ARM/ARM64: KVM: Implement KVM_FLUSH_DCACHE_GPA ioctl
On Fri, May 15, 2015 at 01:43:57PM +0200, Laszlo Ersek wrote: On 05/07/15 19:01, Paolo Bonzini wrote: On 07/05/2015 18:56, Jérémy Fanguède wrote: USB devices fail with a timeout error, as if the communication between the kernel and the devices fail at a certain point: usb 1-1: device not accepting address 5, error -110 usb usb1-port1: unable to enumerate USB device This is consistent with what I saw in my earlier testing. e1000 fails when the userspace tries to use it, with these type of kernel messages: e1000 :00:02.0 eth0: Detected Tx Unit Hang Tx Queue 0 TDH d TDT d next_to_use d next_to_clean9 buffer_info[next_to_clean] time_stamp 9311 next_to_watcha jiffies 956a next_to_watch.status 0 Can you find out what memory attributes the guest is using for the memory---and if it's uncached, why? For USB, see drivers/usb/core/hcd-pci.c, function usb_hcd_pci_probe(): it uses ioremap_nocache(). On the why, that ioremap_nocache() call can be tracked to http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=a914dd8b (Feb 2002), which predates the kernel's move to git. I guess ioremap_nocache() is used simply because USB host controllers are supposed to programmed like that. And, from arch/arm64/include/asm/io.h: #define ioremap_nocache(addr, size) __ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRE)) So this just means that these devices should be mapped as device memory (like the VGA case before) right? And therefore should work with Drew's patches (assuming they are actually correct and you add the right QEMU annotations to set the memory regions and non-cacheable), correct? Thanks, -Christoffer
Re: [Qemu-devel] [RFC/RFT PATCH v2 3/3] arm/arm64: KVM: implement 'uncached' mem coherency
On Thu, May 14, 2015 at 03:32:13PM +0200, Andrew Jones wrote: On Thu, May 14, 2015 at 12:55:49PM +0200, Christoffer Dall wrote: On Wed, May 13, 2015 at 01:31:54PM +0200, Andrew Jones wrote: When S1 and S2 memory attributes combine wrt to caching policy, non-cacheable types take precedence. If a guest maps a region as device memory, which KVM userspace is using to emulate the device using normal, cacheable memory, then we lose coherency. With KVM_MEM_UNCACHED, KVM userspace can now hint to KVM which memory regions are likely to be problematic. With this patch, as pages of these types of regions are faulted into the guest, not only do we flush the page's dcache, but we also change userspace's mapping to NC in order to maintain coherency. What if the guest doesn't do what we expect? While we can't force a guest to use cacheable memory, we can take advantage of the non-cacheable precedence, and force it to use non-cacheable. So, this patch also introduces PAGE_S2_NORMAL_NC, and uses it on KVM_MEM_UNCACHED regions to force them to NC. We now have both guest and userspace on the same page (pun intended) I'd like to revisit the overall approach here. Is doing non-cached accesses in both the guest and host really the right thing to do here? I think so, but all ideas/approaches are still on the table. This is still an RFC. The semantics of the device becomes that it is cache coherent (because QEMU is), and I think Marc argued that Linux/UEFI should simply be adapted to handle whatever emulated devices we have as coherent. I also remember someone arguing that would be wrong (Peter?). I'm not really for quirking all devices in all guest types (AAVMF, Linux, other bootloaders, other OSes). Windows is unlikely to apply any quirks. Well my point was that if we're emulating a platform with coherent IO memory for PCI devices that is something that the guest should work with as such, but as Paolo explained it should always be safe for a guest to assume non-coherent, so that doesn't work. Finally, does this address all cache coherency issues with emulated devices? Some VOS guys had seen things still not working with this approach, unsure why... I'd like to avoid us merging this only to merge a more complete solution in a few weeks which reverts this solution... I'm not sure (this is still an RFT too :-) We definitely would need to scatter some more memory_region_set_uncached() calls around QEMU first. It would be good if you could sync with the VOS guys and make sure your patch set addresses their issues with the appropriate memory_region_set_uncached() added to QEMU, and if it does not, some vague idea why that falls outside of the scope of this patch set. After all, adding a USB controller to a VM is not that an esoteric use case, is it? More comments/questions below: Signed-off-by: Andrew Jones drjo...@redhat.com --- arch/arm/include/asm/kvm_mmu.h| 5 - arch/arm/include/asm/pgtable-3level.h | 1 + arch/arm/include/asm/pgtable.h| 1 + arch/arm/kvm/mmu.c| 37 +++ arch/arm64/include/asm/kvm_mmu.h | 5 - arch/arm64/include/asm/memory.h | 1 + arch/arm64/include/asm/pgtable.h | 1 + 7 files changed, 36 insertions(+), 15 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 405aa18833073..e8034a80b12e5 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -214,8 +214,11 @@ static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t pfn, while (size) { void *va = kmap_atomic_pfn(pfn); - if (need_flush) + if (need_flush) { kvm_flush_dcache_to_poc(va, PAGE_SIZE); + if (ipa_uncached) + set_memory_nc((unsigned long)va, 1); nit: consider moving this outside the need_flush + } if (icache_is_pipt()) __cpuc_coherent_user_range((unsigned long)va, diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h index a745a2a53853c..39b3f7a40e663 100644 --- a/arch/arm/include/asm/pgtable-3level.h +++ b/arch/arm/include/asm/pgtable-3level.h @@ -121,6 +121,7 @@ * 2nd stage PTE definitions for LPAE. */ #define L_PTE_S2_MT_UNCACHED (_AT(pteval_t, 0x0) 2) /* strongly ordered */ +#define L_PTE_S2_MT_NORMAL_NC(_AT(pteval_t, 0x5) 2) /* normal non-cacheable */ #define L_PTE_S2_MT_WRITETHROUGH (_AT(pteval_t, 0xa) 2) /* normal inner write-through */ #define L_PTE_S2_MT_WRITEBACK(_AT(pteval_t, 0xf) 2) /* normal inner write-back */ #define L_PTE_S2_MT_DEV_SHARED (_AT(pteval_t, 0x1
Re: [Qemu-devel] [RFC/RFT PATCH v2 1/3] arm/arm64: pageattr: add set_memory_nc
On Thu, May 14, 2015 at 03:46:44PM +0200, Andrew Jones wrote: On Thu, May 14, 2015 at 01:05:09PM +0200, Christoffer Dall wrote: On Wed, May 13, 2015 at 01:31:52PM +0200, Andrew Jones wrote: Provide a method to change normal, cacheable memory to non-cacheable. KVM will make use of this to keep emulated device memory regions coherent with the guest. Signed-off-by: Andrew Jones drjo...@redhat.com Reviewed-by: Christoffer Dall christoffer.d...@linaro.org But you obviously need Russell and Will/Catalin to ack/merge this. I guess this patch is going to go away in the next round. You've pointed out that I screwed stuff up royally with my over eagerness to reuse code. I need to reimplement change_memory_common, but a version that takes an mm, which is more or less what I did in the last version of this series, back when I was pinning pages. Yeah, I just read this one before looking at the others because it was a simple one... -Christoffer
Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On Thu, May 14, 2015 at 02:08:49PM +0200, Paolo Bonzini wrote: On 14/05/2015 14:00, Christoffer Dall wrote: So, getting back to my original question. Is the point then that UEFI must assume (from ACPI/DT) the cache-coherency properties of the PCI controller which exists in hardware on the system you're running on, even for the virtual PCI bus because that will be the semantics for assigned devices? And in that case, we have no way to distinguish between passthrough devices and virtual devices plugged into the virtual PCI bus? Well, we could use the subsystem id. But it's a hack, and may cause incompatibilities with some drivers. Michael, any ideas? What about the idea of having two virtual PCI buses on your system where one is always cache-coherent and uses for virtual devices, and the other is whatever the hardware is and used for passthrough devices? I think that was rejected before. Do you remember where? I just remember Catalin mentioning the idea to me verbally. Besides the slightly heavy added use of resources etc. it seems like it would address some of our issues in a good way. But I'm still not sure why UEFI/Linux currently sees our PCI bus as being non-coherent when in fact it is and we have no passthrough issues currently. Are all PCI controllers always non-coherent for some reason and therefore we model it as such too? -Christoffer
Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On Thu, May 14, 2015 at 01:38:38PM +0200, Paolo Bonzini wrote: On 14/05/2015 13:36, Christoffer Dall wrote: (It's probably worth looking at the documentation in the first hunk too, under the commit message.) Why is this a hack/unintuitive? Is the semantics of the QEMU PCI bus not simply that MMIO regions are coherent? Only until device assignment gets into the picture. Will UEFI have to deal with device assignment in any respect? Why not? For example you could do network boot from an assigned network card. In fact, anything that UEFI has to deal with, the OS has to deal with too. If you need a UEFI hack, chances are you need or will need a Linux hack too. Fair enough. I was thinking that UEFI needs to be built with knowledge of all the hardware present including any passthrough devices, but I guess this is plainly not true with PCI (and might not even be true with the level of DT parsing we do for the virtual platform). So, getting back to my original question. Is the point then that UEFI must assume (from ACPI/DT) the cache-coherency properties of the PCI controller which exists in hardware on the system you're running on, even for the virtual PCI bus because that will be the semantics for assigned devices? And in that case, we have no way to distinguish between passthrough devices and virtual devices plugged into the virtual PCI bus? What about the idea of having two virtual PCI buses on your system where one is always cache-coherent and uses for virtual devices, and the other is whatever the hardware is and used for passthrough devices? -Christoffer
Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On Thu, May 14, 2015 at 02:28:49PM +0200, Paolo Bonzini wrote: On 14/05/2015 14:24, Christoffer Dall wrote: On Thu, May 14, 2015 at 02:08:49PM +0200, Paolo Bonzini wrote: On 14/05/2015 14:00, Christoffer Dall wrote: So, getting back to my original question. Is the point then that UEFI must assume (from ACPI/DT) the cache-coherency properties of the PCI controller which exists in hardware on the system you're running on, even for the virtual PCI bus because that will be the semantics for assigned devices? And in that case, we have no way to distinguish between passthrough devices and virtual devices plugged into the virtual PCI bus? Well, we could use the subsystem id. But it's a hack, and may cause incompatibilities with some drivers. Michael, any ideas? What about the idea of having two virtual PCI buses on your system where one is always cache-coherent and uses for virtual devices, and the other is whatever the hardware is and used for passthrough devices? I think that was rejected before. Do you remember where? I just remember Catalin mentioning the idea to me verbally. In the last centithread on the subject. :) At least I and Peter disagreed. It's not about the heavy added use of resources, it's more about it being really easy to misconfigure. But I'm still not sure why UEFI/Linux currently sees our PCI bus as being non-coherent when in fact it is and we have no passthrough issues currently. Are all PCI controllers always non-coherent for some reason and therefore we model it as such too? Well, PCI BARs are generally MMIO resources, and hence should not be cached. As an optimization, OS drivers can mark them as cacheable or write-combining or something like that, but in general it's a safe default to leave them uncached---one would think. ok, I guess this series makes sense then, assuming it works, and assuming we don't kill performance by going to RAM all the time when we don't have to... Thanks, -Christoffer
Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On Thu, May 14, 2015 at 01:09:34PM +0200, Laszlo Ersek wrote: On 05/14/15 12:30, Christoffer Dall wrote: On Wed, May 13, 2015 at 01:31:51PM +0200, Andrew Jones wrote: Introduce a new memory region flag, KVM_MEM_UNCACHED, which is needed by ARM. This flag informs KVM that the given memory region is typically mapped by the guest as non-cacheable. KVM for ARM then ensures that that memory is indeed mapped non-cacheable by the guest, and also remaps that region as non-cacheable for userspace, allowing them both to maintain a coherent view. Changes since v1: 1) don't pin pages [Paolo] 2) ensure the guest maps the memory non-cacheable [me] 3) clean up memslot flag documentation [Christoffer] changes 1 and 2 effectively redesigned/rewrote v1. Find v1 here http://www.spinics.net/lists/kvm-arm/msg14022.html The QEMU series for v1 hasn't really changed. Only the linux header hack needed to bump KVM_CAP_UNCACHED_MEM from 107 to 116. Find the series here http://www.spinics.net/lists/kvm-arm/msg14026.html Testing: This series still needs lots of testing, but I thought I'd kick it to the list early, as there's been recent interest in solving this problem, and I'd like to get test results and opinions on this approach from others sooner than later. I've tested with AAVMF (UEFI for AArch64 mach-virt guests). AAVMF has a kludge in it to avoid the coherency problem. How does the 'kludge' work? https://github.com/tianocore/edk2/commit/f9a8be42 (It's probably worth looking at the documentation in the first hunk too, under the commit message.) Why is this a hack/unintuitive? Is the semantics of the QEMU PCI bus not simply that MMIO regions are coherent? -Christoffer
Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On Thu, May 14, 2015 at 01:31:03PM +0200, Paolo Bonzini wrote: On 14/05/2015 13:29, Christoffer Dall wrote: (It's probably worth looking at the documentation in the first hunk too, under the commit message.) Why is this a hack/unintuitive? Is the semantics of the QEMU PCI bus not simply that MMIO regions are coherent? Only until device assignment gets into the picture. Will UEFI have to deal with device assignment in any respect? -Christoffer
Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On Wed, May 13, 2015 at 01:31:51PM +0200, Andrew Jones wrote: Introduce a new memory region flag, KVM_MEM_UNCACHED, which is needed by ARM. This flag informs KVM that the given memory region is typically mapped by the guest as non-cacheable. KVM for ARM then ensures that that memory is indeed mapped non-cacheable by the guest, and also remaps that region as non-cacheable for userspace, allowing them both to maintain a coherent view. Changes since v1: 1) don't pin pages [Paolo] 2) ensure the guest maps the memory non-cacheable [me] 3) clean up memslot flag documentation [Christoffer] changes 1 and 2 effectively redesigned/rewrote v1. Find v1 here http://www.spinics.net/lists/kvm-arm/msg14022.html The QEMU series for v1 hasn't really changed. Only the linux header hack needed to bump KVM_CAP_UNCACHED_MEM from 107 to 116. Find the series here http://www.spinics.net/lists/kvm-arm/msg14026.html Testing: This series still needs lots of testing, but I thought I'd kick it to the list early, as there's been recent interest in solving this problem, and I'd like to get test results and opinions on this approach from others sooner than later. I've tested with AAVMF (UEFI for AArch64 mach-virt guests). AAVMF has a kludge in it to avoid the coherency problem. How does the 'kludge' work? I've tested both with and without that kludge active. Both worked for me (almost). Sometimes with the non-kludged version I was still able to see a bit of corruption in grub's output after edk2 loaded it - not much, and not always, but something. Remind me, this is a VGA framebuffer corruption with a PCI-plugged VGA card? Thanks, -Christoffer Anyway, it's quite frustrating, as I'm not sure what I'm missing... This series applies to Linus' 110bc76729d4, but I tested with a version backported to the current RHELSA kernel. Thanks for reviews and testing! drew Andrew Jones (3): arm/arm64: pageattr: add set_memory_nc KVM: promote KVM_MEMSLOT_INCOHERENT to uapi arm/arm64: KVM: implement 'uncached' mem coherency Documentation/virtual/kvm/api.txt | 20 -- arch/arm/include/asm/cacheflush.h | 1 + arch/arm/include/asm/kvm_mmu.h| 5 - arch/arm/include/asm/pgtable-3level.h | 1 + arch/arm/include/asm/pgtable.h| 1 + arch/arm/include/uapi/asm/kvm.h | 1 + arch/arm/kvm/arm.c| 1 + arch/arm/kvm/mmu.c| 39 ++- arch/arm/mm/pageattr.c| 7 +++ arch/arm64/include/asm/cacheflush.h | 1 + arch/arm64/include/asm/kvm_mmu.h | 5 - arch/arm64/include/asm/memory.h | 1 + arch/arm64/include/asm/pgtable.h | 1 + arch/arm64/include/uapi/asm/kvm.h | 1 + arch/arm64/mm/pageattr.c | 8 +++ include/linux/kvm_host.h | 1 - include/uapi/linux/kvm.h | 2 ++ virt/kvm/kvm_main.c | 7 ++- 18 files changed, 79 insertions(+), 24 deletions(-) -- 2.1.0
Re: [Qemu-devel] [RFC/RFT PATCH v2 2/3] KVM: promote KVM_MEMSLOT_INCOHERENT to uapi
On Wed, May 13, 2015 at 01:31:53PM +0200, Andrew Jones wrote: Commit 1050dcda30529 introduced KVM_MEMSLOT_INCOHERENT to flag memory regions that may have coherency issues due to mapping host system RAM as non-cacheable. This was introduced as a KVM internal flag, but now give KVM userspace access to it so that it may use it for hinting likely problematic regions. Also rename to KVM_MEM_UNCACHED. Signed-off-by: Andrew Jones drjo...@redhat.com Reviewed-by: Christoffer Dall christoffer.d...@linaro.org
Re: [Qemu-devel] [RFC/RFT PATCH v2 3/3] arm/arm64: KVM: implement 'uncached' mem coherency
On Wed, May 13, 2015 at 01:31:54PM +0200, Andrew Jones wrote: When S1 and S2 memory attributes combine wrt to caching policy, non-cacheable types take precedence. If a guest maps a region as device memory, which KVM userspace is using to emulate the device using normal, cacheable memory, then we lose coherency. With KVM_MEM_UNCACHED, KVM userspace can now hint to KVM which memory regions are likely to be problematic. With this patch, as pages of these types of regions are faulted into the guest, not only do we flush the page's dcache, but we also change userspace's mapping to NC in order to maintain coherency. What if the guest doesn't do what we expect? While we can't force a guest to use cacheable memory, we can take advantage of the non-cacheable precedence, and force it to use non-cacheable. So, this patch also introduces PAGE_S2_NORMAL_NC, and uses it on KVM_MEM_UNCACHED regions to force them to NC. We now have both guest and userspace on the same page (pun intended) I'd like to revisit the overall approach here. Is doing non-cached accesses in both the guest and host really the right thing to do here? The semantics of the device becomes that it is cache coherent (because QEMU is), and I think Marc argued that Linux/UEFI should simply be adapted to handle whatever emulated devices we have as coherent. I also remember someone arguing that would be wrong (Peter?). Finally, does this address all cache coherency issues with emulated devices? Some VOS guys had seen things still not working with this approach, unsure why... I'd like to avoid us merging this only to merge a more complete solution in a few weeks which reverts this solution... More comments/questions below: Signed-off-by: Andrew Jones drjo...@redhat.com --- arch/arm/include/asm/kvm_mmu.h| 5 - arch/arm/include/asm/pgtable-3level.h | 1 + arch/arm/include/asm/pgtable.h| 1 + arch/arm/kvm/mmu.c| 37 +++ arch/arm64/include/asm/kvm_mmu.h | 5 - arch/arm64/include/asm/memory.h | 1 + arch/arm64/include/asm/pgtable.h | 1 + 7 files changed, 36 insertions(+), 15 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 405aa18833073..e8034a80b12e5 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -214,8 +214,11 @@ static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t pfn, while (size) { void *va = kmap_atomic_pfn(pfn); - if (need_flush) + if (need_flush) { kvm_flush_dcache_to_poc(va, PAGE_SIZE); + if (ipa_uncached) + set_memory_nc((unsigned long)va, 1); nit: consider moving this outside the need_flush + } if (icache_is_pipt()) __cpuc_coherent_user_range((unsigned long)va, diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h index a745a2a53853c..39b3f7a40e663 100644 --- a/arch/arm/include/asm/pgtable-3level.h +++ b/arch/arm/include/asm/pgtable-3level.h @@ -121,6 +121,7 @@ * 2nd stage PTE definitions for LPAE. */ #define L_PTE_S2_MT_UNCACHED (_AT(pteval_t, 0x0) 2) /* strongly ordered */ +#define L_PTE_S2_MT_NORMAL_NC(_AT(pteval_t, 0x5) 2) /* normal non-cacheable */ #define L_PTE_S2_MT_WRITETHROUGH (_AT(pteval_t, 0xa) 2) /* normal inner write-through */ #define L_PTE_S2_MT_WRITEBACK(_AT(pteval_t, 0xf) 2) /* normal inner write-back */ #define L_PTE_S2_MT_DEV_SHARED (_AT(pteval_t, 0x1) 2) /* device */ diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h index f40354198bad4..ae13ca8b0a23d 100644 --- a/arch/arm/include/asm/pgtable.h +++ b/arch/arm/include/asm/pgtable.h @@ -100,6 +100,7 @@ extern pgprot_t pgprot_s2_device; #define PAGE_HYP _MOD_PROT(pgprot_kernel, L_PTE_HYP) #define PAGE_HYP_DEVICE _MOD_PROT(pgprot_hyp_device, L_PTE_HYP) #define PAGE_S2 _MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY) +#define PAGE_S2_NORMAL_NC__pgprot((pgprot_val(PAGE_S2) ~L_PTE_S2_MT_MASK) | L_PTE_S2_MT_NORMAL_NC) #define PAGE_S2_DEVICE _MOD_PROT(pgprot_s2_device, L_PTE_S2_RDONLY) #define __PAGE_NONE __pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN | L_PTE_NONE) diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index bc1665acd73e7..6b3bd8061bd2a 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -1220,7 +1220,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, struct vm_area_struct *vma; pfn_t pfn; pgprot_t mem_type = PAGE_S2; - bool fault_ipa_uncached; + bool fault_ipa_uncached = false; bool logging_active =
Re: [Qemu-devel] [RFC/RFT PATCH v2 1/3] arm/arm64: pageattr: add set_memory_nc
On Wed, May 13, 2015 at 01:31:52PM +0200, Andrew Jones wrote: Provide a method to change normal, cacheable memory to non-cacheable. KVM will make use of this to keep emulated device memory regions coherent with the guest. Signed-off-by: Andrew Jones drjo...@redhat.com Reviewed-by: Christoffer Dall christoffer.d...@linaro.org But you obviously need Russell and Will/Catalin to ack/merge this. -Christoffer