Re: [PATCH 0/9] arm64: Stolen time support
Hi Steven, On 2020/7/27 18:48, Steven Price wrote: > On 21/07/2020 04:26, zhukeqian wrote: >> Hi Steven, > > Hi Keqian, > >> On 2019/8/2 22:50, Steven Price wrote: >>> This series adds support for paravirtualized time for arm64 guests and >>> KVM hosts following the specification in Arm's document DEN 0057A: >>> >>> https://developer.arm.com/docs/den0057/a >>> >>> It implements support for stolen time, allowing the guest to >>> identify time when it is forcibly not executing. >>> >>> It doesn't implement support for Live Physical Time (LPT) as there are >>> some concerns about the overheads and approach in the above >> Do you plan to pick up LPT support? There is demand for cross-frequency >> migration >> (from older platforms to newer platforms). > > I don't have any plans to pick up the LPT support at the moment - feel free > to pick it up! ;) > >> I am not clear about the overheads and approach problem here, could you >> please >> give some detailed information? Maybe we can work together to solve these >> concerns. :-) > > Fundamentally the issue here is that LPT only solves one small part of > migration between different hosts. To successfully migrate between hosts with > different CPU implementations it is also necessary to be able to virtualise > various ID registers (e.g. MIDR_EL1, REVIDR_EL1, AIDR_EL1) which we have no > support for currently. > Yeah, currently we are trying to do both timer frequency virtualization and CPU feature virtualization. > The problem with just virtualising the registers is how you handle errata. > The guest will currently use those (and other) ID registers to decide whether > to enable specific errata workarounds. But what errata should be enabled for > a guest which might migrate to another host? > Thanks for pointing this out. I think the most important thing is to introduce a concept called a CPU baseline, which represents a standard platform. 
If we bring up a guest with a specific CPU baseline, then this guest can only run on a platform that is compatible with this CPU baseline. So "baseline" and "compatible" are the key points for guaranteeing successful cross-platform migration. > What we ideally need is a mechanism to communicate to the guest what > workarounds are required to successfully run on any of the hosts that the > guest may be migrated to. You may also have the situation where the > workarounds required for two hosts are mutually incompatible - something > needs to understand this and do the "right thing" (most likely just reject > this situation, i.e. prevent the migration). > > There are various options here: e.g. a para-virtualised interface to describe > the workarounds (but this is hard to do in an OS-agnostic way), or virtual-ID > registers describing an idealised environment where no workarounds are > required (and only hosts that have no errata affecting a guest would be able > to provide this). > My idea is similar to the "idealised environment", but errata workarounds still exist. We do not provide a para-virtualised interface, and migration is restricted to platforms that are compatible with the baseline. A baseline should have two aspects: CPU features and errata. Platforms that are compatible with a specific baseline must provide the corresponding CPU features and errata. > Given the above complexity and the fact that Armv8.6-A standardises the > frequency to 1GHz this didn't seem worth continuing with. So LPT was dropped > from the spec and patches to avoid holding up the stolen time support. > > However, if you have a use case which doesn't require such a generic > migration (e.g. perhaps old and new platforms are based on the same IP) then > it might be worth looking at bringing this back. 
But to make the problem > solvable it either needs to be restricted to platforms which are > substantially the same (so the errata list will be identical), or there's > work to be done in preparation to deal with migrating a guest successfully > between hosts with potentially different errata requirements. > > Can you share more details about the hosts that you are interested in > migrating between? Here we have a new platform with a 1GHz timer, while the old platform runs at 100MHz, so we want to solve cross-platform migration first. Thanks, Keqian > > Thanks, > > Steve > . > ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH kvm-unit-tests] arm64: Compile with -mno-outline-atomics
On 28/07/20 14:17, Andrew Jones wrote: > GCC 10.1.0 introduced the -moutline-atomics option which, when > enabled, uses LSE instructions when the processor provides them. > The option is enabled by default and unfortunately causes the > following error at compile time: > > aarch64-linux-gnu-ld: > /usr/lib/gcc/aarch64-linux-gnu/10.1.0/libgcc.a(lse-init.o): in function > `init_have_lse_atomics': > lse-init.c:(.text.startup+0xc): undefined reference to `__getauxval' > > This is happening because we are linking against our own libcflat which > doesn't implement the function __getauxval(). > > Disable the use of the out-of-line functions by compiling with > -mno-outline-atomics. > > Reported-by: Alexandru Elisei > Tested-by: Alexandru Elisei > Signed-off-by: Andrew Jones > --- > Makefile | 11 +-- > arm/Makefile.arm64 | 3 +++ > 2 files changed, 8 insertions(+), 6 deletions(-) > > diff --git a/Makefile b/Makefile > index 3ff2f91600f6..0e21a49096ba 100644 > --- a/Makefile > +++ b/Makefile > @@ -17,6 +17,11 @@ DESTDIR := $(PREFIX)/share/kvm-unit-tests/ > > .PHONY: arch_clean clean distclean cscope > > +# cc-option > +# Usage: OP_CFLAGS+=$(call cc-option, -falign-functions=0, > -malign-functions=0) > +cc-option = $(shell if $(CC) -Werror $(1) -S -o /dev/null -xc /dev/null \ > + > /dev/null 2>&1; then echo "$(1)"; else echo "$(2)"; fi ;) > + > #make sure env CFLAGS variable is not used > CFLAGS = > > @@ -43,12 +48,6 @@ OBJDIRS += $(LIBFDT_objdir) > #include architecture specific make rules > include $(SRCDIR)/$(TEST_DIR)/Makefile > > -# cc-option > -# Usage: OP_CFLAGS+=$(call cc-option, -falign-functions=0, > -malign-functions=0) > - > -cc-option = $(shell if $(CC) -Werror $(1) -S -o /dev/null -xc /dev/null \ > - > /dev/null 2>&1; then echo "$(1)"; else echo "$(2)"; fi ;) > - > COMMON_CFLAGS += -g $(autodepend-flags) -fno-strict-aliasing -fno-common > COMMON_CFLAGS += -Wall -Wwrite-strings -Wempty-body -Wuninitialized > COMMON_CFLAGS += -Wignored-qualifiers -Werror > diff 
--git a/arm/Makefile.arm64 b/arm/Makefile.arm64 > index dfd0c56fe8fb..dbc7524d3070 100644 > --- a/arm/Makefile.arm64 > +++ b/arm/Makefile.arm64 > @@ -9,6 +9,9 @@ ldarch = elf64-littleaarch64 > arch_LDFLAGS = -pie -n > CFLAGS += -mstrict-align > > +mno_outline_atomics := $(call cc-option, -mno-outline-atomics, "") > +CFLAGS += $(mno_outline_atomics) > + > define arch_elf_check = > $(if $(shell ! $(OBJDUMP) -R $(1) >&/dev/null && echo "nok"), > $(error $(shell $(OBJDUMP) -R $(1) 2>&1))) > Queued, thanks. Paolo
Re: [PATCH 1/5] KVM: arm64: pvtime: steal-time is only supported when configured
On Tue, Jul 28, 2020 at 02:13:54PM +0100, Marc Zyngier wrote: > On 2020-07-28 13:55, Andrew Jones wrote: > > On Mon, Jul 27, 2020 at 06:25:50PM +0100, Marc Zyngier wrote: > > > Hi Andrew, > > > > > > On 2020-07-11 11:04, Andrew Jones wrote: > > > > Don't confuse the guest by saying steal-time is supported when > > > > it hasn't been configured by userspace and won't work. > > > > > > > > Signed-off-by: Andrew Jones > > > > --- > > > > arch/arm64/kvm/pvtime.c | 5 - > > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/arch/arm64/kvm/pvtime.c b/arch/arm64/kvm/pvtime.c > > > > index f7b52ce1557e..2b22214909be 100644 > > > > --- a/arch/arm64/kvm/pvtime.c > > > > +++ b/arch/arm64/kvm/pvtime.c > > > > @@ -42,9 +42,12 @@ long kvm_hypercall_pv_features(struct kvm_vcpu *vcpu) > > > > > > > > switch (feature) { > > > > case ARM_SMCCC_HV_PV_TIME_FEATURES: > > > > - case ARM_SMCCC_HV_PV_TIME_ST: > > > > val = SMCCC_RET_SUCCESS; > > > > break; > > > > + case ARM_SMCCC_HV_PV_TIME_ST: > > > > + if (vcpu->arch.steal.base != GPA_INVALID) > > > > + val = SMCCC_RET_SUCCESS; > > > > + break; > > > > } > > > > > > > > return val; > > > > > > I'm not so sure about this. I have always considered the > > > discovery interface to be "do you know about this SMCCC > > > function". And if you look at the spec, it says (4.2, > > > PV_TIME_FEATURES): > > > > > > > > > If PV_call_id identifies PV_TIME_FEATURES, this call returns: > > > • NOT_SUPPORTED (-1) to indicate that all > > > paravirtualized time functions in this specification are not > > > supported. > > > • SUCCESS (0) to indicate that all the paravirtualized time > > > functions in this specification are supported. > > > > > > > > > So the way I understand it, you cannot return "supported" > > > for PV_TIME_FEATURES, and yet return NOT_SUPPORTED for > > > PV_TIME_ST. It applies to *all* features. > > > > > > Yes, this is very bizarre. But I don't think we can deviate > > > from it. 
> > > > Ah, I see your point. But I wonder if we should drop this patch > > or if we should change the return of ARM_SMCCC_HV_PV_TIME_FEATURES > > to be dependent on all the pv calls? > > > > Discovery would look like this > > > > IF (SMCCC_ARCH_FEATURES, PV_TIME_FEATURES) == 0; THEN > > IF (PV_TIME_FEATURES, PV_TIME_FEATURES) == 0; THEN > > PV_TIME_ST is supported, as well as all other PV calls > > ELIF (PV_TIME_FEATURES, PV_TIME_ST) == 0; THEN > > PV_TIME_ST is supported > > ELIF (PV_TIME_FEATURES, <other call>) == 0; THEN > > <other call> is supported > > ... > > ENDIF > > ELSE > > No PV calls are supported > > ENDIF > > > > I believe the above implements a reasonable interpretation of the > > specification, but the all feature (PV_TIME_FEATURES, PV_TIME_FEATURES) > > thing is indeed bizarre no matter how you look at it. > > It is indeed true to the spec. Thankfully we only support PV_TIME > as a feature for now, so we are (sort of) immune to the braindead > aspect of the discovery protocol. > > I think returning a failure when PV_TIME isn't set up is a valid thing > to do, as long as it applies to all functions (i.e. something like > the below patch). > > Thanks, > > M. > > diff --git a/arch/arm64/kvm/pvtime.c b/arch/arm64/kvm/pvtime.c > index f7b52ce1557e..c3ef4ebd6846 100644 > --- a/arch/arm64/kvm/pvtime.c > +++ b/arch/arm64/kvm/pvtime.c > @@ -43,7 +43,8 @@ long kvm_hypercall_pv_features(struct kvm_vcpu *vcpu) > switch (feature) { > case ARM_SMCCC_HV_PV_TIME_FEATURES: > case ARM_SMCCC_HV_PV_TIME_ST: > - val = SMCCC_RET_SUCCESS; > + if (vcpu->arch.steal.base != GPA_INVALID) > + val = SMCCC_RET_SUCCESS; > break; > } Looks good to me. I'll do that for v2. Thanks, drew
Re: [PATCH 1/5] KVM: arm64: pvtime: steal-time is only supported when configured
On 2020-07-28 13:55, Andrew Jones wrote: On Mon, Jul 27, 2020 at 06:25:50PM +0100, Marc Zyngier wrote: Hi Andrew, On 2020-07-11 11:04, Andrew Jones wrote: > Don't confuse the guest by saying steal-time is supported when > it hasn't been configured by userspace and won't work. > > Signed-off-by: Andrew Jones > --- > arch/arm64/kvm/pvtime.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/kvm/pvtime.c b/arch/arm64/kvm/pvtime.c > index f7b52ce1557e..2b22214909be 100644 > --- a/arch/arm64/kvm/pvtime.c > +++ b/arch/arm64/kvm/pvtime.c > @@ -42,9 +42,12 @@ long kvm_hypercall_pv_features(struct kvm_vcpu *vcpu) > >switch (feature) { >case ARM_SMCCC_HV_PV_TIME_FEATURES: > - case ARM_SMCCC_HV_PV_TIME_ST: >val = SMCCC_RET_SUCCESS; >break; > + case ARM_SMCCC_HV_PV_TIME_ST: > + if (vcpu->arch.steal.base != GPA_INVALID) > + val = SMCCC_RET_SUCCESS; > + break; >} > >return val; I'm not so sure about this. I have always considered the discovery interface to be "do you know about this SMCCC function". And if you look at the spec, it says (4.2, PV_TIME_FEATURES): If PV_call_id identifies PV_TIME_FEATURES, this call returns: • NOT_SUPPORTED (-1) to indicate that all paravirtualized time functions in this specification are not supported. • SUCCESS (0) to indicate that all the paravirtualized time functions in this specification are supported. So the way I understand it, you cannot return "supported" for PV_TIME_FEATURES, and yet return NOT_SUPPORTED for PV_TIME_ST. It applies to *all* features. Yes, this is very bizarre. But I don't think we can deviate from it. Ah, I see your point. But I wonder if we should drop this patch or if we should change the return of ARM_SMCCC_HV_PV_TIME_FEATURES to be dependent on all the pv calls? 
Discovery would look like this IF (SMCCC_ARCH_FEATURES, PV_TIME_FEATURES) == 0; THEN IF (PV_TIME_FEATURES, PV_TIME_FEATURES) == 0; THEN PV_TIME_ST is supported, as well as all other PV calls ELIF (PV_TIME_FEATURES, PV_TIME_ST) == 0; THEN PV_TIME_ST is supported ELIF (PV_TIME_FEATURES, <other call>) == 0; THEN <other call> is supported ... ENDIF ELSE No PV calls are supported ENDIF I believe the above implements a reasonable interpretation of the specification, but the all feature (PV_TIME_FEATURES, PV_TIME_FEATURES) thing is indeed bizarre no matter how you look at it. It is indeed true to the spec. Thankfully we only support PV_TIME as a feature for now, so we are (sort of) immune to the braindead aspect of the discovery protocol. I think returning a failure when PV_TIME isn't set up is a valid thing to do, as long as it applies to all functions (i.e. something like the below patch). Thanks, M. diff --git a/arch/arm64/kvm/pvtime.c b/arch/arm64/kvm/pvtime.c index f7b52ce1557e..c3ef4ebd6846 100644 --- a/arch/arm64/kvm/pvtime.c +++ b/arch/arm64/kvm/pvtime.c @@ -43,7 +43,8 @@ long kvm_hypercall_pv_features(struct kvm_vcpu *vcpu) switch (feature) { case ARM_SMCCC_HV_PV_TIME_FEATURES: case ARM_SMCCC_HV_PV_TIME_ST: - val = SMCCC_RET_SUCCESS; + if (vcpu->arch.steal.base != GPA_INVALID) + val = SMCCC_RET_SUCCESS; break; } -- Jazz is not dead. It just smells funny...
Re: [PATCH 0/5] KVM: arm64: pvtime: Fixes and a new cap
On Mon, Jul 27, 2020 at 07:01:04PM +0100, Marc Zyngier wrote: > On 2020-07-11 11:04, Andrew Jones wrote: > > The first three patches in the series are fixes that come from testing > > and reviewing pvtime code while writing the QEMU support (I'll reply > > to this mail with a link to the QEMU patches after posting - which I'll > > do shortly). The last patch is only a convenience for userspace, and I > > wouldn't be heartbroken if it wasn't deemed worth it. The QEMU patches > > I'll be posting are currently written without the cap. However, if the > > cap is accepted, then I'll change the QEMU code to use it. > > > > Thanks, > > drew > > > > Andrew Jones (5): > > KVM: arm64: pvtime: steal-time is only supported when configured > > KVM: arm64: pvtime: Fix potential loss of stolen time > > KVM: arm64: pvtime: Fix stolen time accounting across migration > > KVM: Documentation minor fixups > > arm64/x86: KVM: Introduce steal-time cap > > > > Documentation/virt/kvm/api.rst| 20 > > arch/arm64/include/asm/kvm_host.h | 2 +- > > arch/arm64/kvm/arm.c | 3 +++ > > arch/arm64/kvm/pvtime.c | 31 +++ > > arch/x86/kvm/x86.c| 3 +++ > > include/linux/kvm_host.h | 19 +++ > > include/uapi/linux/kvm.h | 1 + > > 7 files changed, 58 insertions(+), 21 deletions(-) > > Hi Andrew, > > Sorry about the time it took to get to this series. No problem. > Although I had a number of comments, they are all easy to > address, and you will hopefully be able to respin it quickly I'll address all the comments and get it respun right away. > (assuming we agree that patch #1 is unnecessary). I'm not sure yet. I've suggested yet another interpretation of the spec and will see what you say about that. Thanks, drew
Re: [PATCH 1/5] KVM: arm64: pvtime: steal-time is only supported when configured
On Mon, Jul 27, 2020 at 06:25:50PM +0100, Marc Zyngier wrote: > Hi Andrew, > > On 2020-07-11 11:04, Andrew Jones wrote: > > Don't confuse the guest by saying steal-time is supported when > > it hasn't been configured by userspace and won't work. > > > > Signed-off-by: Andrew Jones > > --- > > arch/arm64/kvm/pvtime.c | 5 - > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/arch/arm64/kvm/pvtime.c b/arch/arm64/kvm/pvtime.c > > index f7b52ce1557e..2b22214909be 100644 > > --- a/arch/arm64/kvm/pvtime.c > > +++ b/arch/arm64/kvm/pvtime.c > > @@ -42,9 +42,12 @@ long kvm_hypercall_pv_features(struct kvm_vcpu *vcpu) > > > > switch (feature) { > > case ARM_SMCCC_HV_PV_TIME_FEATURES: > > - case ARM_SMCCC_HV_PV_TIME_ST: > > val = SMCCC_RET_SUCCESS; > > break; > > + case ARM_SMCCC_HV_PV_TIME_ST: > > + if (vcpu->arch.steal.base != GPA_INVALID) > > + val = SMCCC_RET_SUCCESS; > > + break; > > } > > > > return val; > > I'm not so sure about this. I have always considered the > discovery interface to be "do you know about this SMCCC > function". And if you look at the spec, it says (4.2, > PV_TIME_FEATURES): > > > If PV_call_id identifies PV_TIME_FEATURES, this call returns: > • NOT_SUPPORTED (-1) to indicate that all > paravirtualized time functions in this specification are not > supported. > • SUCCESS (0) to indicate that all the paravirtualized time > functions in this specification are supported. > > > So the way I understand it, you cannot return "supported" > for PV_TIME_FEATURES, and yet return NOT_SUPPORTED for > PV_TIME_ST. It applies to *all* features. > > Yes, this is very bizarre. But I don't think we can deviate > from it. Ah, I see your point. But I wonder if we should drop this patch or if we should change the return of ARM_SMCCC_HV_PV_TIME_FEATURES to be dependent on all the pv calls? 
Discovery would look like this IF (SMCCC_ARCH_FEATURES, PV_TIME_FEATURES) == 0; THEN IF (PV_TIME_FEATURES, PV_TIME_FEATURES) == 0; THEN PV_TIME_ST is supported, as well as all other PV calls ELIF (PV_TIME_FEATURES, PV_TIME_ST) == 0; THEN PV_TIME_ST is supported ELIF (PV_TIME_FEATURES, <other call>) == 0; THEN <other call> is supported ... ENDIF ELSE No PV calls are supported ENDIF I believe the above implements a reasonable interpretation of the specification, but the all feature (PV_TIME_FEATURES, PV_TIME_FEATURES) thing is indeed bizarre no matter how you look at it. Thanks, drew
[PATCH kvm-unit-tests] arm64: Compile with -mno-outline-atomics
GCC 10.1.0 introduced the -moutline-atomics option which, when enabled, uses LSE instructions when the processor provides them. The option is enabled by default and unfortunately causes the following error at compile time: aarch64-linux-gnu-ld: /usr/lib/gcc/aarch64-linux-gnu/10.1.0/libgcc.a(lse-init.o): in function `init_have_lse_atomics': lse-init.c:(.text.startup+0xc): undefined reference to `__getauxval' This is happening because we are linking against our own libcflat which doesn't implement the function __getauxval(). Disable the use of the out-of-line functions by compiling with -mno-outline-atomics. Reported-by: Alexandru Elisei Tested-by: Alexandru Elisei Signed-off-by: Andrew Jones --- Makefile | 11 +-- arm/Makefile.arm64 | 3 +++ 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/Makefile b/Makefile index 3ff2f91600f6..0e21a49096ba 100644 --- a/Makefile +++ b/Makefile @@ -17,6 +17,11 @@ DESTDIR := $(PREFIX)/share/kvm-unit-tests/ .PHONY: arch_clean clean distclean cscope +# cc-option +# Usage: OP_CFLAGS+=$(call cc-option, -falign-functions=0, -malign-functions=0) +cc-option = $(shell if $(CC) -Werror $(1) -S -o /dev/null -xc /dev/null \ + > /dev/null 2>&1; then echo "$(1)"; else echo "$(2)"; fi ;) + #make sure env CFLAGS variable is not used CFLAGS = @@ -43,12 +48,6 @@ OBJDIRS += $(LIBFDT_objdir) #include architecture specific make rules include $(SRCDIR)/$(TEST_DIR)/Makefile -# cc-option -# Usage: OP_CFLAGS+=$(call cc-option, -falign-functions=0, -malign-functions=0) - -cc-option = $(shell if $(CC) -Werror $(1) -S -o /dev/null -xc /dev/null \ - > /dev/null 2>&1; then echo "$(1)"; else echo "$(2)"; fi ;) - COMMON_CFLAGS += -g $(autodepend-flags) -fno-strict-aliasing -fno-common COMMON_CFLAGS += -Wall -Wwrite-strings -Wempty-body -Wuninitialized COMMON_CFLAGS += -Wignored-qualifiers -Werror diff --git a/arm/Makefile.arm64 b/arm/Makefile.arm64 index dfd0c56fe8fb..dbc7524d3070 100644 --- a/arm/Makefile.arm64 +++ b/arm/Makefile.arm64 @@ -9,6 +9,9 @@ 
ldarch = elf64-littleaarch64 arch_LDFLAGS = -pie -n CFLAGS += -mstrict-align +mno_outline_atomics := $(call cc-option, -mno-outline-atomics, "") +CFLAGS += $(mno_outline_atomics) + define arch_elf_check = $(if $(shell ! $(OBJDUMP) -R $(1) >&/dev/null && echo "nok"), $(error $(shell $(OBJDUMP) -R $(1) 2>&1))) -- 2.25.4
Re: [RESEND PATCH] drivers: arm arch timer: Correct fault programming of CNTKCTL_EL1.EVNTI
Hi Marc, On 2020/7/28 18:16, Marc Zyngier wrote: > On 2020-07-17 10:21, Keqian Zhu wrote: >> ARM virtual counter supports event stream. It can only trigger an event >> when the trigger bit of CNTVCT_EL0 changes from 0 to 1 (or from 1 to 0), >> so the actual period of event stream is 2 ^ (cntkctl_evnti + 1). For >> example, when the trigger bit is 0, then it triggers an event for every >> two cycles. >> >> Signed-off-by: Keqian Zhu >> --- >> drivers/clocksource/arm_arch_timer.c | 17 ++--- >> 1 file changed, 14 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/clocksource/arm_arch_timer.c >> b/drivers/clocksource/arm_arch_timer.c >> index ecf7b7db2d05..06d99a4b1b9b 100644 >> --- a/drivers/clocksource/arm_arch_timer.c >> +++ b/drivers/clocksource/arm_arch_timer.c >> @@ -799,10 +799,20 @@ static void __arch_timer_setup(unsigned type, >> static void arch_timer_evtstrm_enable(int divider) >> { >> u32 cntkctl = arch_timer_get_cntkctl(); >> +int cntkctl_evnti; >> + >> +/* >> + * Note that it can only trigger an event when the trigger bit >> + * of CNTVCT_EL0 changes from 1 to 0 (or from 0 to 1), so the >> + * actual period of event stream is 2 ^ (cntkctl_evnti + 1). 
>> + */ >> +cntkctl_evnti = divider - 1; >> +cntkctl_evnti = min(cntkctl_evnti, 15); >> +cntkctl_evnti = max(cntkctl_evnti, 0); >> >> cntkctl &= ~ARCH_TIMER_EVT_TRIGGER_MASK; >> /* Set the divider and enable virtual event stream */ >> -cntkctl |= (divider << ARCH_TIMER_EVT_TRIGGER_SHIFT) >> +cntkctl |= (cntkctl_evnti << ARCH_TIMER_EVT_TRIGGER_SHIFT) >> | ARCH_TIMER_VIRT_EVT_EN; >> arch_timer_set_cntkctl(cntkctl); >> arch_timer_set_evtstrm_feature(); >> @@ -816,10 +826,11 @@ static void arch_timer_configure_evtstream(void) >> /* Find the closest power of two to the divisor */ >> evt_stream_div = arch_timer_rate / ARCH_TIMER_EVT_STREAM_FREQ; >> pos = fls(evt_stream_div); >> -if (pos > 1 && !(evt_stream_div & (1 << (pos - 2)))) >> +if ((pos == 1) || (pos > 1 && !(evt_stream_div & (1 << (pos - 2))))) >> pos--; >> + >> /* enable event stream */ >> -arch_timer_evtstrm_enable(min(pos, 15)); >> +arch_timer_evtstrm_enable(pos); >> } >> >> static void arch_counter_set_user_access(void) > > This looks like a very convoluted fix. If the problem you are > trying to fix is that the event frequency is at most half of > that of the counter, why isn't the patch below the most > obvious fix? > > Thanks, > > M. > > diff --git a/drivers/clocksource/arm_arch_timer.c > b/drivers/clocksource/arm_arch_timer.c > index 6c3e84180146..0a65414b781f 100644 > --- a/drivers/clocksource/arm_arch_timer.c > +++ b/drivers/clocksource/arm_arch_timer.c > @@ -824,8 +824,12 @@ static void arch_timer_configure_evtstream(void) > { > int evt_stream_div, pos; > > -/* Find the closest power of two to the divisor */ > -evt_stream_div = arch_timer_rate / ARCH_TIMER_EVT_STREAM_FREQ; > +/* > + * Find the closest power of two to the divisor. As the event > + * stream can at most be generated at half the frequency of the > + * counter, use half the frequency when computing the divider. 
> + */ > +evt_stream_div = arch_timer_rate / ARCH_TIMER_EVT_STREAM_FREQ / 2; > pos = fls(evt_stream_div); > if (pos > 1 && !(evt_stream_div & (1 << (pos - 2)))) I think this does not consider the case of pos == 1 (though it will not occur...) > pos--; > It looks good to me. Thanks, Keqian
Re: [PATCH 0/2] Unify non-VHE ASLR features behind CONFIG_RANDOMIZE_BASE
On Tue, 21 Jul 2020 10:44:43 +0100, David Brazdil wrote: > There is currently no way to disable nVHE ASLR, e.g. for debugging, so the > first patch in this series makes it conditional on RANDOMIZE_BASE, same as > KASLR. Note that the 'nokaslr' command line flag has no effect here. > > Second patch unifies the HARDEN_EL2_VECTORS errata for A57 and A72 behind > the same Kconfig for simplicity. Happy to make it just depend on > RANDOMIZE_BASE if having an option to keep randomization on but hardening > off is preferred. > > [...] Applied to kvm-arm64/misc-5.9, thanks! [1/2] KVM: arm64: Make nVHE ASLR conditional on RANDOMIZE_BASE commit: 24f69c0fa4e252f706884114b7d6353aa07678b5 [2/2] KVM: arm64: Substitute RANDOMIZE_BASE for HARDEN_EL2_VECTORS commit: a59a2edbbba7397fede86e40a3da17e5beebf98b Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [RESEND PATCH] drivers: arm arch timer: Correct fault programming of CNTKCTL_EL1.EVNTI
On 2020-07-17 10:21, Keqian Zhu wrote: ARM virtual counter supports event stream. It can only trigger an event when the trigger bit of CNTVCT_EL0 changes from 0 to 1 (or from 1 to 0), so the actual period of event stream is 2 ^ (cntkctl_evnti + 1). For example, when the trigger bit is 0, then it triggers an event for every two cycles. Signed-off-by: Keqian Zhu --- drivers/clocksource/arm_arch_timer.c | 17 ++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c index ecf7b7db2d05..06d99a4b1b9b 100644 --- a/drivers/clocksource/arm_arch_timer.c +++ b/drivers/clocksource/arm_arch_timer.c @@ -799,10 +799,20 @@ static void __arch_timer_setup(unsigned type, static void arch_timer_evtstrm_enable(int divider) { u32 cntkctl = arch_timer_get_cntkctl(); + int cntkctl_evnti; + + /* +* Note that it can only trigger an event when the trigger bit +* of CNTVCT_EL0 changes from 1 to 0 (or from 0 to 1), so the +* actual period of event stream is 2 ^ (cntkctl_evnti + 1). 
+*/ + cntkctl_evnti = divider - 1; + cntkctl_evnti = min(cntkctl_evnti, 15); + cntkctl_evnti = max(cntkctl_evnti, 0); cntkctl &= ~ARCH_TIMER_EVT_TRIGGER_MASK; /* Set the divider and enable virtual event stream */ - cntkctl |= (divider << ARCH_TIMER_EVT_TRIGGER_SHIFT) + cntkctl |= (cntkctl_evnti << ARCH_TIMER_EVT_TRIGGER_SHIFT) | ARCH_TIMER_VIRT_EVT_EN; arch_timer_set_cntkctl(cntkctl); arch_timer_set_evtstrm_feature(); @@ -816,10 +826,11 @@ static void arch_timer_configure_evtstream(void) /* Find the closest power of two to the divisor */ evt_stream_div = arch_timer_rate / ARCH_TIMER_EVT_STREAM_FREQ; pos = fls(evt_stream_div); - if (pos > 1 && !(evt_stream_div & (1 << (pos - 2)))) + if ((pos == 1) || (pos > 1 && !(evt_stream_div & (1 << (pos - 2))))) pos--; + /* enable event stream */ - arch_timer_evtstrm_enable(min(pos, 15)); + arch_timer_evtstrm_enable(pos); } static void arch_counter_set_user_access(void) This looks like a very convoluted fix. If the problem you are trying to fix is that the event frequency is at most half of that of the counter, why isn't the patch below the most obvious fix? Thanks, M. diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c index 6c3e84180146..0a65414b781f 100644 --- a/drivers/clocksource/arm_arch_timer.c +++ b/drivers/clocksource/arm_arch_timer.c @@ -824,8 +824,12 @@ static void arch_timer_configure_evtstream(void) { int evt_stream_div, pos; - /* Find the closest power of two to the divisor */ - evt_stream_div = arch_timer_rate / ARCH_TIMER_EVT_STREAM_FREQ; + /* +* Find the closest power of two to the divisor. As the event +* stream can at most be generated at half the frequency of the +* counter, use half the frequency when computing the divider. +*/ + evt_stream_div = arch_timer_rate / ARCH_TIMER_EVT_STREAM_FREQ / 2; pos = fls(evt_stream_div); if (pos > 1 && !(evt_stream_div & (1 << (pos - 2)))) pos--; -- Jazz is not dead. It just smells funny... 
Re: [PATCH v2] KVM: arm64: Don't inherit exec permission across page-table levels
On Thu, 23 Jul 2020 11:17:14 +0100, Will Deacon wrote: > If a stage-2 page-table contains an executable, read-only mapping at the > pte level (e.g. due to dirty logging being enabled), a subsequent write > fault to the same page which tries to install a larger block mapping > (e.g. due to dirty logging having been disabled) will erroneously inherit > the exec permission and consequently skip I-cache invalidation for the > rest of the block. > > [...] Applied to kvm-arm64/fixes-5.8-3, thanks! [1/1] KVM: arm64: Don't inherit exec permission across page-table levels commit: b757b47a2fcba584d4a32fd7ee68faca510ab96f Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v2 0/8] KVM: arm64: Support HW dirty log based on DBM
Hi Marc, On 2020/7/28 15:52, Marc Zyngier wrote: > On 2020-07-28 03:11, zhukeqian wrote: >> Hi Marc, > > [...] > >>> But you are still reading the leaf entries of the PTs, hence defeating >>> any sort of prefetch that the CPU could do for you. And my claim is >>> that reading the bitmap is much faster than parsing the PTs. Are you >>> saying that this isn't the case? >> I am confused here. MMU DBM just updates the S2AP[1] of PTs, so dirty >> information >> is not continuous. The smallest granularity of read instruction is one >> byte, we must >> read one byte of each PTE to determine whether it is dirty. So I think >> the smallest >> reading amount is 512 bytes per 2MB. > > Which is why using DBM as a way to implement dirty-logging doesn't work. > Forcing the vcpu to take faults in order to update the dirty bitmap > has the benefit of (a) telling you exactly what page has been written to, > (b) *slowing the vcpu down*. > > See? no additional read, better convergence ratio because you are not > trying to catch up with a vcpu running wild. You are in control of the > dirtying rate, not the vcpu, and the information you get requires no > extra work (just set the corresponding bit in the dirty bitmap). OK, in fact I have considered some of these things before. You are right, DBM dirty logging is not suitable for guests with a high dirty rate, which can even cause QEMU throttling. It only reduces the side-effects of migration on guests with a low dirty rate. I do not mean to push this flawed patch now; instead, I am trying to find a software approach that works around the hardware drawback. However, currently we do not have a perfect approach. > > Honestly, I think you are looking at the problem the wrong way. > DBM at S2 is not a solution to anything, because the information is > way too sparse, and it doesn't solve the real problem, which is > the tracking of dirty pages caused by devices. 
> > As I said twice before, > I will not consider these patches without > a solution to the DMA problem, and I don't think DBM is part of > that solution. The ARM SMMU HTTU does not have a PML-like feature, so the behaviour of dirty log sync will be very similar to that of the MMU DBM. My original idea was to support MMU DBM first and then support SMMU HTTU based on MMU DBM. Sure, we can leave this patch here and hope we can pick it up in the future if the hardware is ready :-) . Many thanks for your review ;-) . Thanks, Keqian
[GIT PULL] KVM/arm64 fixes for 5.8, take #4
Hi Paolo,

This is the last batch of fixes for 5.8. One fixes a long-standing MMU
issue, while the other addresses a more recent breakage with out-of-line
helpers in the nVHE code.

Please pull,

	M.

The following changes since commit b9e10d4a6c9f5cbe6369ce2c17ebc67d2e5a4be5:

  KVM: arm64: Stop clobbering x0 for HVC_SOFT_RESTART (2020-07-06 11:47:02 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git tags/kvmarm-fixes-5.8-4

for you to fetch changes up to b757b47a2fcba584d4a32fd7ee68faca510ab96f:

  KVM: arm64: Don't inherit exec permission across page-table levels (2020-07-28 09:03:57 +0100)

KVM/arm64 fixes for Linux 5.8, take #4

- Fix a corner case of a new mapping inheriting exec permission while
  bypassing invalidation of the I-cache
- Make sure PtrAuth predicates only generate inline code for the
  non-VHE hypervisor code

Marc Zyngier (1):
      KVM: arm64: Prevent vcpu_has_ptrauth from generating OOL functions

Will Deacon (1):
      KVM: arm64: Don't inherit exec permission across page-table levels

 arch/arm64/include/asm/kvm_host.h | 11 ++++++++---
 arch/arm64/kvm/mmu.c              | 11 ++++++-----
 2 files changed, 14 insertions(+), 8 deletions(-)
[PATCH 2/2] KVM: arm64: Don't inherit exec permission across page-table levels
From: Will Deacon

If a stage-2 page-table contains an executable, read-only mapping at the
pte level (e.g. due to dirty logging being enabled), a subsequent write
fault to the same page which tries to install a larger block mapping
(e.g. due to dirty logging having been disabled) will erroneously inherit
the exec permission and consequently skip I-cache invalidation for the
rest of the block.

Ensure that exec permission is only inherited by write faults when the
new mapping is of the same size as the existing one. A subsequent
instruction abort will result in I-cache invalidation for the entire
block mapping.

Signed-off-by: Will Deacon
Signed-off-by: Marc Zyngier
Tested-by: Quentin Perret
Reviewed-by: Quentin Perret
Cc: Marc Zyngier
Cc:
Link: https://lore.kernel.org/r/20200723101714.15873-1-w...@kernel.org
---
 arch/arm64/kvm/mmu.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8c0035cab6b6..31058e6e7c2a 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1326,7 +1326,7 @@ static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
 	return true;
 }
 
-static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
+static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr, unsigned long sz)
 {
 	pud_t *pudp;
 	pmd_t *pmdp;
	pte_t *ptep;
@@ -1338,11 +1338,11 @@ static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
 		return false;
 
 	if (pudp)
-		return kvm_s2pud_exec(pudp);
+		return sz <= PUD_SIZE && kvm_s2pud_exec(pudp);
 	else if (pmdp)
-		return kvm_s2pmd_exec(pmdp);
+		return sz <= PMD_SIZE && kvm_s2pmd_exec(pmdp);
 	else
-		return kvm_s2pte_exec(ptep);
+		return sz == PAGE_SIZE && kvm_s2pte_exec(ptep);
 }
 
 static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
@@ -1958,7 +1958,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * execute permissions, and we preserve whatever we have.
 	 */
 	needs_exec = exec_fault ||
-		(fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));
+		(fault_status == FSC_PERM &&
+		 stage2_is_exec(kvm, fault_ipa, vma_pagesize));
 
 	if (vma_pagesize == PUD_SIZE) {
 		pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
-- 
2.27.0
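The rule the patch enforces can be modelled in a few lines of standalone C (illustrative names, not kernel code): an exec permission recorded at one page-table level must not be inherited by a strictly larger new mapping, because the old mapping's I-cache invalidation only covered part of the new block.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE (1UL << 12)
#define PMD_SIZE  (1UL << 21)
#define PUD_SIZE  (1UL << 30)

/* Level at which the existing (exec) leaf entry was found. */
enum level { AT_PUD, AT_PMD, AT_PTE };

/*
 * Model of the fixed stage2_is_exec() check: reuse exec permission only
 * when the new mapping of 'sz' bytes does not outgrow the level at which
 * the old exec permission was recorded.
 */
static bool may_inherit_exec(enum level found, unsigned long sz, bool exec)
{
    switch (found) {
    case AT_PUD: return sz <= PUD_SIZE && exec;
    case AT_PMD: return sz <= PMD_SIZE && exec;
    default:     return sz == PAGE_SIZE && exec;  /* pte level */
    }
}
```

The bug scenario from the commit message is the `AT_PTE` case with `sz == PMD_SIZE`: before the fix, the exec bit of the pte would be reused for the whole 2MB block.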
[PATCH 1/2] KVM: arm64: Prevent vcpu_has_ptrauth from generating OOL functions
So far, vcpu_has_ptrauth() is implemented in terms of
system_supports_*_auth() calls, which are declared "inline". In some
specific conditions (clang and SCS), the "inline" very much turns into an
"out of line", which leads to fireworks when this predicate is evaluated
on a non-VHE system (right at the beginning of __hyp_handle_ptrauth).

Instead, make sure vcpu_has_ptrauth gets expanded inline by directly
using the cpus_have_final_cap() helpers, which are __always_inline,
generate much better code, and are the only thing that makes sense when
running at EL2 on an nVHE system.

Fixes: 29eb5a3c57f7 ("KVM: arm64: Handle PtrAuth traps early")
Reported-by: Nathan Chancellor
Reported-by: Nick Desaulniers
Signed-off-by: Marc Zyngier
Tested-by: Nathan Chancellor
Reviewed-by: Nathan Chancellor
Link: https://lore.kernel.org/r/20200722162231.3689767-1-...@kernel.org
---
 arch/arm64/include/asm/kvm_host.h | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index c3e6fcc664b1..e21d4a01372f 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -380,9 +380,14 @@ struct kvm_vcpu_arch {
 #define vcpu_has_sve(vcpu) (system_supports_sve() && \
 			    ((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE))
 
-#define vcpu_has_ptrauth(vcpu)	((system_supports_address_auth() || \
-				  system_supports_generic_auth()) && \
-				 ((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_PTRAUTH))
+#ifdef CONFIG_ARM64_PTR_AUTH
+#define vcpu_has_ptrauth(vcpu)					\
+	((cpus_have_final_cap(ARM64_HAS_ADDRESS_AUTH) ||	\
+	  cpus_have_final_cap(ARM64_HAS_GENERIC_AUTH)) &&	\
+	 (vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_PTRAUTH)
+#else
+#define vcpu_has_ptrauth(vcpu)	false
+#endif
 
 #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
-- 
2.27.0
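The distinction the patch relies on can be shown in plain C (illustrative functions only): `inline` is merely a hint and the compiler is free to emit an out-of-line copy anyway, whereas the `always_inline` attribute behind `__always_inline` forces expansion at every call site, so no out-of-line symbol can be reached from nVHE hyp code.

```c
#include <assert.h>

/* "inline" is a request; under clang+SCS it may still become a real
 * function call, which is fatal when evaluated from hyp code at EL2. */
static inline int hinted_double(int x)
{
    return x * 2;
}

/* The attribute forces inlining: the body is expanded at the call site
 * and no out-of-line copy needs to exist. */
static inline __attribute__((always_inline)) int forced_double(int x)
{
    return x * 2;
}
```

Both behave identically in ordinary code; the difference only matters in environments (like the nVHE hypervisor) where calling an out-of-line kernel symbol is not possible.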
Re: [PATCH v2 0/8] KVM: arm64: Support HW dirty log based on DBM
On 2020-07-28 03:11, zhukeqian wrote:
> Hi Marc,

[...]

>> But you are still reading the leaf entries of the PTs, hence defeating
>> any sort of prefetch that the CPU could do for you. And my claim is
>> that reading the bitmap is much faster than parsing the PTs. Are you
>> saying that this isn't the case?
> I am confused here. MMU DBM just updates the S2AP[1] of PTs, so dirty
> information is not continuous. The smallest granularity of read
> instruction is one byte, we must read one byte of each PTE to determine
> whether it is dirty. So I think the smallest reading amount is 512
> bytes per 2MB.

Which is why using DBM as a way to implement dirty-logging doesn't work.
Forcing the vcpu to take faults in order to update the dirty bitmap
has the benefit of (a) telling you exactly what page has been written to,
(b) *slowing the vcpu down*.

See? No additional read, better convergence ratio because you are not
trying to catch up with a vcpu running wild. You are in control of the
dirtying rate, not the vcpu, and the information you get requires no
extra work (just set the corresponding bit in the dirty bitmap).

Honestly, I think you are looking at the problem the wrong way.
DBM at S2 is not a solution to anything, because the information is
way too sparse, and it doesn't solve the real problem, which is
the tracking of dirty pages caused by devices.

As I said twice before, I will not consider these patches without
a solution to the DMA problem, and I don't think DBM is part of
that solution.

Thanks,

        M.
--
Jazz is not dead. It just smells funny...
Re: [RESEND PATCH] drivers: arm arch timer: Correct fault programming of CNTKCTL_EL1.EVNTI
Friendly ping. Is this an effective bugfix?

On 2020/7/17 17:21, Keqian Zhu wrote:
> The ARM virtual counter supports an event stream. It can only trigger
> an event when the trigger bit of CNTVCT_EL0 changes from 0 to 1 (or
> from 1 to 0), so the actual period of the event stream is
> 2 ^ (cntkctl_evnti + 1). For example, when the trigger bit is 0, it
> triggers an event every two cycles.
>
> Signed-off-by: Keqian Zhu
> ---
>  drivers/clocksource/arm_arch_timer.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index ecf7b7db2d05..06d99a4b1b9b 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -799,10 +799,20 @@ static void __arch_timer_setup(unsigned type,
>  static void arch_timer_evtstrm_enable(int divider)
>  {
>  	u32 cntkctl = arch_timer_get_cntkctl();
> +	int cntkctl_evnti;
> +
> +	/*
> +	 * Note that it can only trigger an event when the trigger bit
> +	 * of CNTVCT_EL0 changes from 1 to 0 (or from 0 to 1), so the
> +	 * actual period of event stream is 2 ^ (cntkctl_evnti + 1).
> +	 */
> +	cntkctl_evnti = divider - 1;
> +	cntkctl_evnti = min(cntkctl_evnti, 15);
> +	cntkctl_evnti = max(cntkctl_evnti, 0);
> 
>  	cntkctl &= ~ARCH_TIMER_EVT_TRIGGER_MASK;
>  	/* Set the divider and enable virtual event stream */
> -	cntkctl |= (divider << ARCH_TIMER_EVT_TRIGGER_SHIFT)
> +	cntkctl |= (cntkctl_evnti << ARCH_TIMER_EVT_TRIGGER_SHIFT)
>  			| ARCH_TIMER_VIRT_EVT_EN;
>  	arch_timer_set_cntkctl(cntkctl);
>  	arch_timer_set_evtstrm_feature();
> @@ -816,10 +826,11 @@ static void arch_timer_configure_evtstream(void)
>  	/* Find the closest power of two to the divisor */
>  	evt_stream_div = arch_timer_rate / ARCH_TIMER_EVT_STREAM_FREQ;
>  	pos = fls(evt_stream_div);
> -	if (pos > 1 && !(evt_stream_div & (1 << (pos - 2))))
> +	if ((pos == 1) || (pos > 1 && !(evt_stream_div & (1 << (pos - 2)))))
>  		pos--;
> +
>  	/* enable event stream */
> -	arch_timer_evtstrm_enable(min(pos, 15));
> +	arch_timer_evtstrm_enable(pos);
>  }
> 
>  static void arch_counter_set_user_access(void)
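The arithmetic the commit message describes can be checked in isolation with a small sketch (illustrative helpers, not the driver's code): the event stream fires on a transition of the trigger bit selected by EVNTI, so the period is 2^(evnti + 1) counter cycles, which is why the patch programs `divider - 1` instead of `divider`.

```c
#include <assert.h>

/* Period of the event stream in counter cycles for a given EVNTI value:
 * one event per 0->1 (or 1->0) transition of bit 'evnti' of CNTVCT_EL0. */
static unsigned long evtstrm_period(unsigned int evnti)
{
    return 1UL << (evnti + 1);
}

/* Mirror of the patch's clamping: EVNTI is a 4-bit field (0..15), and a
 * requested period of 2^divider cycles needs EVNTI = divider - 1. */
static int divider_to_evnti(int divider)
{
    int evnti = divider - 1;
    if (evnti > 15)
        evnti = 15;
    if (evnti < 0)
        evnti = 0;
    return evnti;
}
```

In particular `evtstrm_period(0) == 2` matches the commit message's example: with the trigger bit set to 0, an event fires every two cycles, so programming the raw divider without the `- 1` would double the intended period.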