Hi Drew, On 6/27/19 11:40 AM, Andrew Jones wrote: > On Wed, Jun 26, 2019 at 04:58:05PM +0200, Auger Eric wrote: >> Hi Drew, >> >> On 6/21/19 6:34 PM, Andrew Jones wrote: >>> Introduce cpu properties to give fine control over SVE vector lengths. >>> We introduce a property for each valid length up to the current >>> maximum supported, which is 2048-bits. The properties are named, e.g. >>> sve128, sve256, sve512, ..., where the number is the number of bits. >> sve384 then in the natural order, otherwise it gives the impression you >> can only specify * 128bit pow of 2 sizes at this stage of the reading. > > OK > >> >>> >>> It's now possible to do something like -cpu max,sve-max-vq=4,sve384=off >> Document the fact sve512 cannot be turned to off, which sounds fully >> sensible (by reading the code). By the way, I think an actual >> documentation should be provided in qemu. Maybe as spec to agree on. > > Actually, maybe I could just allow it to be disabled. It would be > a strange command line to do '-cpu max,sve-max-vq=4,sve512=off', but if > that's what the user wants, then there's not really any good reason to > block it. As I say below, mixing these types of properties on the command > line isn't really a good idea, but it's not completely blocked because we > want a user that prefers launching their (most likely TCG) guest with, > e.g. '-cpu max,sve-max-vq=4', to also have a functional QMP CPU model > expansion, but the CPU model expansion for SVE vector lengths depends on > the sve<vl-bits> properties, thus mixing sve-max-vq with sve<vl-bits>, > where sve-max-vq comes first. They're just not mixed on the command line. OK > >>> to provide a guest vector lengths 128, 256, and 512 bits. The resulting >>> set must conform to the architectural constraint of having all power-of-2 >>> lengths smaller than the maximum length present. It's also possible to >>> only provide sve<vl-bits> properties, e.g. -cpu max,sve512=on> That example >>> provides the machine with 128, 256, and 512 bit vector >> lengths. >>> It doesn't hurt to explicitly ask for all expected vector lengths, >>> which is what, for example, libvirt should do.> >>> Note1, it is not possible to use sve<vl-bits> properties before >>> sve-max-vq, e.g. -cpu max,sve384=off,sve-max-vq=4, as supporting >>> that overly complicates the user input validation. >>> >>> Note2, while one might expect -cpu max,sve-max-vq=4,sve512=on to be the >>> same as -cpu max,sve512=on, they are not. >> yep it is a bit weird >> >> Didn't you consider -cpu max,sve-max-vq=4,req_only=true removing non >> power of 2 values and sve<vl-bits> setting a single VLbit? > > Nope. I mostly considered making sure sve-max-vq was supported to the > level necessary to work with the new properties, and to not change its > current semantics, but I don't want to extend it in any way, nor to > require it to be used with the new properties at all. sve-max-vq isn't > even going to be supported by '-cpu host', so we can't rely on it being > part of the SVE vector length selection for normal KVM guests. OK, as long as it is documented somewhere. > > Keep in mind that the current semantics of sve-max-vq are to enable all > vector lengths up to and including that specified maximum. That's not > a safe way to select vector lengths on KVM, as it may include vector > lengths that the KVM host does not support, thus it's not something KVM > users should use. It's already there for the 'max' cpu type though, so > we work with it in this series the best we can for 'max', but not at all > for 'host'. Do you expect hosts to expose only a few of the permitted values? I imagined it would generally expose none or all of them actually. If you discourage people to use sve-max-vq and sve<>=on only sets 2^n values, userspace will need to set manually all intermediate !2^n values > > Re: documentation for this. I suppose I could write something up that > clarifies the properties and also suggests best practices regarding > sve-max-vq.
To me this is really helpful to have a global understanding > >>> The former enables all vector lengths 512 bits and smaller >> ( required and permitted) >>> while the latter only sets the 512-bit >>> length and its smaller power-of-2 lengths. It's probably best not to use >>> sve-max-vq with sve<vl-bits> properties, but it can't be completely >>> forbidden as we want qmp_query_cpu_model_expansion to work with guests >>> launched with e.g. -cpu max,sve-max-vq=8 on their command line. >> what does happen if you specify -cpu max,sve384=on? (seems to be allowed >> in the code?) > > That's perfectly valid with tcg, you'll get 128,256,384. It's also valid > with KVM if you're host supports it. OK so it exposes the maximum configurable vector length + all the corresponding required values. > >> >>> >>> Signed-off-by: Andrew Jones <drjo...@redhat.com> >>> --- >>> target/arm/cpu.c | 6 + >>> target/arm/cpu.h | 14 ++ >>> target/arm/cpu64.c | 360 ++++++++++++++++++++++++++++++++++++++- >>> target/arm/helper.c | 11 +- >>> target/arm/monitor.c | 16 ++ >>> tests/arm-cpu-features.c | 217 +++++++++++++++++++++++ >>> 6 files changed, 620 insertions(+), 4 deletions(-) >>> >>> diff --git a/target/arm/cpu.c b/target/arm/cpu.c >>> index f08e178fc84b..e060a0d9df0e 100644 >>> --- a/target/arm/cpu.c >>> +++ b/target/arm/cpu.c >>> @@ -1019,6 +1019,12 @@ static void arm_cpu_realizefn(DeviceState *dev, >>> Error **errp) >>> return; >>> } >>> >>> + arm_cpu_sve_finalize(cpu, &local_err); >>> + if (local_err) { >>> + error_propagate(errp, local_err); >>> + return; >>> + } >>> + >>> if (arm_feature(env, ARM_FEATURE_AARCH64) && >>> cpu->has_vfp != cpu->has_neon) { >>> /* >>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h >>> index f9da672be575..cbb155cf72a5 100644 >>> --- a/target/arm/cpu.h >>> +++ b/target/arm/cpu.h >>> @@ -184,8 +184,13 @@ typedef struct { >>> >>> #ifdef TARGET_AARCH64 >>> # define ARM_MAX_VQ 16 >>> +void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp); >>> +uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq); >>> #else >>> # define ARM_MAX_VQ 1 >>> +static inline void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp) { } >>> +static inline uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t >>> vq) >>> +{ return 0; } >>> #endif >>> >>> typedef struct ARMVectorReg { >>> @@ -915,6 +920,15 @@ struct ARMCPU { >>> >>> /* Used to set the maximum vector length the cpu will support. */ >>> uint32_t sve_max_vq; >>> + >>> + /* >>> + * In sve_vq_map each set bit is a supported vector length of >>> + * (bit-number + 1) * 16 bytes, i.e. each bit number + 1 is the >>> vector> + * length in quadwords. We need a map size twice the maximum >>> + * quadword length though because we use two bits for each vector >>> + * length in order to track three states: uninitialized, off, and on. >>> + */ >>> + DECLARE_BITMAP(sve_vq_map, ARM_MAX_VQ * 2); >>> }; >>> >>> void arm_cpu_post_init(Object *obj); >>> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c >>> index 02ada65f240c..5def82111dee 100644 >>> --- a/target/arm/cpu64.c >>> +++ b/target/arm/cpu64.c >>> @@ -257,6 +257,149 @@ static void aarch64_a72_initfn(Object *obj) >>> define_arm_cp_regs(cpu, cortex_a72_a57_a53_cp_reginfo); >>> } >>> >>> +/* >>> + * While we eventually use cpu->sve_vq_map as a typical bitmap, where each >>> vq >>> + * has only two states (off/on), until we've finalized the map at realize >>> time >>> + * we use an extra bit, at the vq - 1 + ARM_MAX_VQ bit number, to also >>> allow >>> + * tracking of the uninitialized state. The arm_vq_state typedef and >>> following >>> + * functions allow us to more easily work with the bitmap. Also, while the >>> map >>> + * is still initializing, sve-max-vq has an additional three states, >>> bringing >>> + * the number of its states to five, which are the following: >>> + * >>> + * sve-max-vq: >>> + * 0: SVE is disabled. The default value for a vq in the map is 'OFF'. >>> + * -1: SVE is enabled, but neither sve-max-vq nor sve<vl-bits> >>> properties >>> + * have yet been specified by the user. The default value for a vq >>> in >>> + * the map is 'ON'. >>> + * -2: SVE is enabled and one or more sve<vl-bits> properties have been >>> + * set to 'OFF' by the user, but no sve<vl-bits> properties have >>> yet >>> + * been set to 'ON'. The user is now blocked from setting >>> sve-max-vq >>> + * and the default value for a vq in the map is 'ON'. >>> + * -3: SVE is enabled and one or more sve<vl-bits> properties have been >>> + * set to 'ON' by the user. The user is blocked from setting >>> sve-max-vq >>> + * and the default value for a vq in the map is 'OFF'. sve-max-vq >>> never >>> + * transitions back to -2, even if later inputs disable the vector >>> + * lengths that initially transitioned sve-max-vq to this state. >>> This >>> + * avoids the default values from flip-flopping. >>> + * [1-ARM_MAX_VQ]: SVE is enabled and the user has specified a valid >>> + * sve-max-vq. The sve-max-vq specified vq and all smaller >>> + * vq's will be initially enabled. All larger vq's will >>> have >>> + * a default of 'OFF'. >>> + */ >>> +#define ARM_SVE_INIT -1 >>> +#define ARM_VQ_DEFAULT_ON -2 >>> +#define ARM_VQ_DEFAULT_OFF -3 >>> + >>> +#define arm_sve_have_max_vq(cpu) ((int32_t)(cpu)->sve_max_vq > 0) >>> + >>> +typedef enum arm_vq_state { >>> + ARM_VQ_OFF, >>> + ARM_VQ_ON, >>> + ARM_VQ_UNINITIALIZED, >>> +} arm_vq_state; >>> + >>> +static arm_vq_state arm_cpu_vq_map_get(ARMCPU *cpu, int vq)> +{ >>> + assert(vq <= ARM_MAX_VQ); >>> + >>> + return test_bit(vq - 1, cpu->sve_vq_map) | >>> + test_bit(vq - 1 + ARM_MAX_VQ, cpu->sve_vq_map) << 1; >>> +}> + >>> +static void arm_cpu_vq_map_set(ARMCPU *cpu, int vq, arm_vq_state state) >>> +{ >>> + assert(state == ARM_VQ_OFF || state == ARM_VQ_ON); >>> + assert(vq <= ARM_MAX_VQ); >>> + >>> + clear_bit(vq - 1 + ARM_MAX_VQ, cpu->sve_vq_map); >>> + >>> + if (state == ARM_VQ_ON) { >>> + set_bit(vq - 1, cpu->sve_vq_map); >>> + } else { >>> + clear_bit(vq - 1, cpu->sve_vq_map); >>> + } >>> +} >>> + >>> +static void arm_cpu_vq_map_init(ARMCPU *cpu) >>> +{ >>> + bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ * 2); >> nit: bitmap_clear(map, 0, ARM_MAX_VQ); > > bitmap_clear will only clear the specified bits, leaving the upper bits > uninitialized, which could cause problems. It's not a problem here > because we're using a zero initialized cpu state member, but if it was > a bitmap like below, declared on the stack, then we need the zeroing at > least once. I'd prefer to use zero here too, to keep it consistent. Sorry I don't get it. I would have expected the above bitmap_clear would 0 initialize 0 - ARM_MAX_VQ-1 bits and the bitmap_set below would initialize ARM_MAX_VQ - 2 * ARM_MAX_VQ -1 with 1's? > >> /* all VLs OFF */ >>> + bitmap_set(cpu->sve_vq_map, ARM_MAX_VQ, ARM_MAX_VQ); >> /* all VLs uninitialized */ > > I can add one comment that says > > "Set all VQ's to ARM_VQ_UNINITIALIZED" above both lines. > >>> +} >>> + >>> +static bool arm_cpu_vq_map_is_finalized(ARMCPU *cpu) >>> +{ >>> + DECLARE_BITMAP(map, ARM_MAX_VQ * 2); >>> + >>> + bitmap_zero(map, ARM_MAX_VQ * 2); >> same > > Here me must use bitmap_zero. > >>> + bitmap_set(map, ARM_MAX_VQ, ARM_MAX_VQ); >>> + bitmap_and(map, map, cpu->sve_vq_map, ARM_MAX_VQ * 2); >>> + >>> + return bitmap_empty(map, ARM_MAX_VQ * 2); >>> +} >>> + >>> +static void arm_cpu_vq_map_finalize(ARMCPU *cpu) >>> +{ >>> + Error *err = NULL; >>> + char name[8]; >>> + uint32_t vq; >>> + bool value; >>> + >>> + /* >>> + * We use the property get accessor because it knows what default >>> + * values to return for uninitialized vector lengths. >>> + */ >>> + for (vq = 1; vq <= ARM_MAX_VQ; ++vq) { >>> + sprintf(name, "sve%d", vq * 128); >>> + value = object_property_get_bool(OBJECT(cpu), name, &err); >>> + assert(!err); >>> + if (value) { >>> + arm_cpu_vq_map_set(cpu, vq, ARM_VQ_ON); >>> + } else { >>> + arm_cpu_vq_map_set(cpu, vq, ARM_VQ_OFF); >>> + } >>> + } >>> + >>> + assert(arm_cpu_vq_map_is_finalized(cpu)); >> this can be removed > > Yes, it can be today, as the code works fine. The idea was that if > somebody came in here and modified this in someway that broke the > finalized state, then we'd catch it here before going off and doing > weird things. This isn't a hot path, so I'd prefer we keep it, but > if QEMU maintainers prefer to limit defensive code, then I can > certainly remove it. Well I understand this was useful was developing and to check some tricky states but some of the asserts correspond to some checks that look rather obvious, like this one. But well I let others give their opinions and it is not a big deal either. Then we can wonder when one needs a trace mesg before asserting ... > >>> +} >>> + >>> +void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp) >>> +{ >>> + Error *err = NULL; >>> + >>> + if (!cpu->sve_max_vq) { >>> + bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ * 2); >>> + return; >>> + } >>> + >>> + if (cpu->sve_max_vq == ARM_SVE_INIT) { >>> + object_property_set_uint(OBJECT(cpu), ARM_MAX_VQ, "sve-max-vq", >>> &err); >>> + if (err) { >>> + error_propagate(errp, err); >>> + return; >>> + } >>> + assert(cpu->sve_max_vq == ARM_MAX_VQ); >> I guess those asserts can be removed now? > > I'd prefer to keep it. It's critical that we get this right, so if > somebody changes something somewhere in the property set/get code > that would break this, then it's best we know right away. Again, > though, if there's some reason to limit defensive code, even on the > init path, then I can. Here I would expect that if the set did not fail, sve-max-vq effectively if set to the expected value. But maybe I don't remember the cases where sve_max-vq can be set to some other value without reprting an error. > >>> + arm_cpu_vq_map_finalize(cpu); >> move the arm_cpu_vq_map_finalize out of the if, at the end. > > That wouldn't work. In this branch arm_cpu_vq_map_finalize() comes > after setting cpu->sve_max_vq. In the else branch it must come first. That's right sorry > >>> + } else { >>> + arm_cpu_vq_map_finalize(cpu); >>> + if (!arm_sve_have_max_vq(cpu)) { >>> + cpu->sve_max_vq = arm_cpu_vq_map_next_smaller(cpu, ARM_MAX_VQ >>> + 1); >>> + } >>> + } >>> + >>> + assert(cpu->sve_max_vq == arm_cpu_vq_map_next_smaller(cpu, >>> ARM_MAX_VQ));> > What happened to my '+ 1' here? I see it's in the patch, but somehow got > removed in your quote of the patch? For a second there I thought something > was really wrong, since that assert should have fired for every full map. Yep sorry for the cut :-( > >> same here > > Anyway my same argument that we don't want to leave arm_cpu_sve_finalize() > without knowing we got this right applies. > >>> +} >>> + >>> +uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq) >>> +{ >>> + uint32_t bitnum; >>> + >>> + assert(vq <= ARM_MAX_VQ + 1); >>> + assert(arm_cpu_vq_map_is_finalized(cpu)); >>> + >>> + bitnum = find_last_bit(cpu->sve_vq_map, vq - 1); >> why do you pass ARM_MAX_VQ + 1 and then do vq -1? doesn't >> find_last_bit() take the size which is ARM_MAX_VQ in this case? > > The size is ARM_MAX_VQ + 1, when you want to also check the bitnum > corresponding to ARM_MAX_VQ. The bitnum for a VQ is always VQ - 1. > Recall VQ=1, which is 128-bits. It takes the bit position zero. > VQ=ARM_MAX_VQ=16 takes the bit position 15. As for the 'vq - 1' here > in the find_last_bit() call, that's required because that's what the > function says it does. It finds the next smaller VQ. So if you pass > in, for example, vq=2, it shouldn't search anything but bit position > zero: > > vq=2 => max next smaller is vq=1, bitnum = 0 > => search the bitmap of size 1 for the last set bit > > Or, if you want to search the whole map, including the maximum > possibly VQ, as is done above, then you need to pass ARM_MAX_VQ + 1, > since the max next smaller VQ is ARM_MAX_VQ. OK I get it now. Maybe renaming the function into something like arm_cpu_vq_map_last_bit() would have avoided that. the first assert looks strange typically. > >>> + return bitnum == vq - 1 ? 0 : bitnum + 1; >>> +} >>> + >>> static void cpu_max_get_sve_max_vq(Object *obj, Visitor *v, const char >>> *name, >>> void *opaque, Error **errp) >>> { >>> @@ -283,12 +426,203 @@ static void cpu_max_set_sve_max_vq(Object *obj, >>> Visitor *v, const char *name, >>> return; >>> } >>> >>> + /* >>> + * It gets complicated trying to support both sve-max-vq and >>> + * sve<vl-bits> properties together, so we mostly don't. We >>> + * do allow both if sve-max-vq is specified first and only once >>> + * though. >>> + */ >>> + if (cpu->sve_max_vq != ARM_SVE_INIT) { >>> + error_setg(errp, "sve<vl-bits> in use or sve-max-vq already " >>> + "specified"); >>> + error_append_hint(errp, "sve-max-vq must come before all " >>> + "sve<vl-bits> properties and it must only " >>> + "be specified once.\n"); >>> + return; >>> + } >>> + >>> cpu->sve_max_vq = value; >>> >>> if (cpu->sve_max_vq == 0 || cpu->sve_max_vq > ARM_MAX_VQ) { >>> error_setg(errp, "unsupported SVE vector length"); >>> error_append_hint(errp, "Valid sve-max-vq in range [1-%d]\n", >>> ARM_MAX_VQ); >>> + } else { >>> + uint32_t vq; >>> + >>> + for (vq = 1; vq <= cpu->sve_max_vq; ++vq) { >>> + char name[8]; >>> + sprintf(name, "sve%d", vq * 128); >>> + object_property_set_bool(obj, true, name, &err); >>> + if (err) { >>> + error_propagate(errp, err); >>> + return; >>> + } >>> + } >>> + } >>> +} >>> + >>> +static void cpu_arm_get_sve_vq(Object *obj, Visitor *v, const char *name, >>> + void *opaque, Error **errp) >>> +{ >>> + ARMCPU *cpu = ARM_CPU(obj); >>> + int vq = atoi(&name[3]) / 128; >>> + arm_vq_state vq_state; >>> + bool value; >>> + >>> + vq_state = arm_cpu_vq_map_get(cpu, vq); >>> + >>> + if (!cpu->sve_max_vq) { >>> + /* All vector lengths are disabled when SVE is off. */ >>> + value = false; >>> + } else if (vq_state == ARM_VQ_ON) { >>> + value = true; >>> + } else if (vq_state == ARM_VQ_OFF) { >>> + value = false; >>> + } else { >>> + /* >>> + * vq is uninitialized. We pick a default here based on the >>> + * the state of sve-max-vq and other sve<vl-bits> properties. >>> + */ >>> + if (arm_sve_have_max_vq(cpu)) { >>> + /* >>> + * If we have sve-max-vq, then all remaining uninitialized >>> + * vq's are 'OFF'. >>> + */ >>> + value = false; >>> + } else { >>> + switch (cpu->sve_max_vq) { >>> + case ARM_SVE_INIT: >>> + case ARM_VQ_DEFAULT_ON: >>> + value = true; >>> + break; >>> + case ARM_VQ_DEFAULT_OFF: >>> + value = false; >>> + break; >>> + } >>> + } >>> + } >>> + >>> + visit_type_bool(v, name, &value, errp); >>> +} >>> + >>> +static void cpu_arm_set_sve_vq(Object *obj, Visitor *v, const char *name, >>> + void *opaque, Error **errp) >>> +{ >>> + ARMCPU *cpu = ARM_CPU(obj); >>> + int vq = atoi(&name[3]) / 128; >>> + arm_vq_state vq_state; >>> + Error *err = NULL; >>> + uint32_t max_vq = 0; >>> + bool value; >>> + >>> + visit_type_bool(v, name, &value, &err); >>> + if (err) { >>> + error_propagate(errp, err); >>> + return; >>> + } >>> + >>> + if (value && !cpu->sve_max_vq) { >>> + error_setg(errp, "cannot enable %s", name); >>> + error_append_hint(errp, "SVE has been disabled with sve=off\n"); >>> + return; >>> + } else if (!cpu->sve_max_vq) { >>> + /* >>> + * We don't complain about disabling vector lengths when SVE >>> + * is off, but we don't do anything either. >>> + */ >>> + return; >>> + } >>> + >>> + if (arm_sve_have_max_vq(cpu)) { >>> + max_vq = cpu->sve_max_vq; >>> + } else { >>> + if (value) { >>> + cpu->sve_max_vq = ARM_VQ_DEFAULT_OFF; >>> + } else if (cpu->sve_max_vq != ARM_VQ_DEFAULT_OFF) { >>> + cpu->sve_max_vq = ARM_VQ_DEFAULT_ON; >>> + } >>> + } >>> + >>> + /* >>> + * We need to know the maximum vector length, which may just currently >>> + * be the maximum length, in order to validate the enabling/disabling >>> + * of this vector length. We use the property get accessor in order to >>> + * get the appropriate default value for any uninitialized lengths. >>> + */ >>> + if (!max_vq) { >>> + char tmp[8]; >>> + bool s; >>> + >>> + for (max_vq = ARM_MAX_VQ; max_vq >= 1; --max_vq) { >>> + sprintf(tmp, "sve%d", max_vq * 128); >>> + s = object_property_get_bool(OBJECT(cpu), tmp, &err); >>> + assert(!err); >>> + if (s) { >>> + break; >>> + } >>> + } >>> + } >>> + >>> + if (arm_sve_have_max_vq(cpu) && value && vq > cpu->sve_max_vq) { >>> + error_setg(errp, "cannot enable %s", name); >>> + error_append_hint(errp, "vq=%d (%d bits) is larger than the " >>> + "maximum vector length, sve-max-vq=%d " >>> + "(%d bits)\n", vq, vq * 128, >>> + cpu->sve_max_vq, cpu->sve_max_vq * 128); >>> + } else if (arm_sve_have_max_vq(cpu) && !value && vq == >>> cpu->sve_max_vq) { >>> + error_setg(errp, "cannot disable %s", name); >>> + error_append_hint(errp, "The maximum vector length must be " >>> + "enabled, sve-max-vq=%d (%d bits)\n", >>> + cpu->sve_max_vq, cpu->sve_max_vq * 128); >>> + } else if (arm_sve_have_max_vq(cpu) && !value && vq < cpu->sve_max_vq >>> && >>> + is_power_of_2(vq)) { >>> + error_setg(errp, "cannot disable %s", name); >>> + error_append_hint(errp, "vq=%d (%d bits) is required as it is a " >>> + "power-of-2 length smaller than the maximum, " >>> + "sve-max-vq=%d (%d bits)\n", vq, vq * 128, >>> + cpu->sve_max_vq, cpu->sve_max_vq * 128); >>> + } else if (!value && vq < max_vq && is_power_of_2(vq)) { >>> + error_setg(errp, "cannot disable %s", name); >>> + error_append_hint(errp, "Vector length %d-bits is required as it " >>> + "is a power-of-2 length smaller than another " >>> + "enabled vector length. Disable all larger >>> vector " >>> + "lengths first.\n", vq * 128); >>> + } else { >> adding return in each if/elsif would allow to avoid this indent. > > Yeah, but I didn't really mind the indent :-) > >>> + if (value) { >>> + bool fail = false; >>> + uint32_t s; >>> + >>> + /* >>> + * Enabling a vector length automatically enables all >>> + * uninitialized power-of-2 lengths smaller than it, as >>> + * per the architecture. >>> + */ >> Test we are not attempting to enable a !is_power_of_2 > > I'm not sure what this means. In this context 'vq' can be a !power-of-2 > if it wants to be, and there's no way for 's' to be a !power-of-2 because > we filter the vq's with is_power_of_2(s). Yep got it now > >>> + for (s = 1; s < vq; ++s) { >>> + if (is_power_of_2(s)) { >>> + vq_state = arm_cpu_vq_map_get(cpu, s); >>> + if (vq_state == ARM_VQ_UNINITIALIZED) { >>> + arm_cpu_vq_map_set(cpu, s, ARM_VQ_ON); >>> + } else if (vq_state == ARM_VQ_OFF) { >>> + fail = true; >>> + break; >>> + } >>> + } >>> + } >>> + >>> + if (fail) { >>> + error_setg(errp, "cannot enable %s", name); >>> + error_append_hint(errp, "Vector length %d-bits is disabled >>> " >>> + "and is a power-of-2 length smaller than >>> " >>> + "%s. All power-of-2 vector lengths >>> smaller " >>> + "than the maximum length are >>> required.\n", >>> + s * 128, name); >>> + } else { >>> + arm_cpu_vq_map_set(cpu, vq, ARM_VQ_ON); >>> + } >>> + } else { >>> + arm_cpu_vq_map_set(cpu, vq, ARM_VQ_OFF); >>> + } >>> } >>> } >>> >>> @@ -318,10 +652,11 @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, >>> const char *name, >>> /* >>> * We handle the -cpu <cpu>,sve=off,sve=on case by reinitializing, >>> * but otherwise we don't do anything as an sve=on could come after >>> - * a sve-max-vq setting. >>> + * a sve-max-vq or sve<vl-bits> setting. >>> */ >>> if (!cpu->sve_max_vq) { >>> - cpu->sve_max_vq = ARM_MAX_VQ; >>> + cpu->sve_max_vq = ARM_SVE_INIT; >>> + arm_cpu_vq_map_init(cpu); >>> } >>> } else { >>> cpu->sve_max_vq = 0; >>> @@ -336,6 +671,7 @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, >>> const char *name, >>> static void aarch64_max_initfn(Object *obj) >>> { >>> ARMCPU *cpu = ARM_CPU(obj); >>> + uint32_t vq; >>> >>> if (kvm_enabled()) { >>> kvm_arm_set_cpu_features_from_host(cpu); >>> @@ -420,11 +756,29 @@ static void aarch64_max_initfn(Object *obj) >>> cpu->dcz_blocksize = 7; /* 512 bytes */ >>> #endif >>> >>> - cpu->sve_max_vq = ARM_MAX_VQ; >>> + /* >>> + * sve_max_vq is initially unspecified, but must be initialized to >>> a >>> + * non-zero value (ARM_SVE_INIT) to indicate that this cpu type has >>> + * SVE. It will be finalized in arm_cpu_realizefn(). >>> + */ >>> + cpu->sve_max_vq = ARM_SVE_INIT; >>> object_property_add(obj, "sve-max-vq", "uint32", >>> cpu_max_get_sve_max_vq, >>> cpu_max_set_sve_max_vq, NULL, NULL, >>> &error_fatal); >>> object_property_add(obj, "sve", "bool", cpu_arm_get_sve, >>> cpu_arm_set_sve, NULL, NULL, &error_fatal); >>> + >>> + /* >>> + * sve_vq_map uses a special state while setting properties, so >>> + * we initialize it here with its init function and finalize it >>> + * in arm_cpu_realizefn(). >>> + */ >>> + arm_cpu_vq_map_init(cpu); >>> + for (vq = 1; vq <= ARM_MAX_VQ; ++vq) { >>> + char name[8]; >>> + sprintf(name, "sve%d", vq * 128); >>> + object_property_add(obj, name, "bool", cpu_arm_get_sve_vq, >>> + cpu_arm_set_sve_vq, NULL, NULL, >>> &error_fatal); >>> + } >>> } >>> } >>> >>> diff --git a/target/arm/helper.c b/target/arm/helper.c >>> index f500ccb6d31b..b7b719dba57f 100644 >>> --- a/target/arm/helper.c >>> +++ b/target/arm/helper.c >>> @@ -5324,7 +5324,16 @@ static void zcr_write(CPUARMState *env, const >>> ARMCPRegInfo *ri, >>> >>> /* Bits other than [3:0] are RAZ/WI. */ >>> QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16); >>> - raw_write(env, ri, value & 0xf); >>> + value &= 0xf; >>> + >>> + if (value) {> + /* get next vq that is smaller than or equal to >>> value's vq */ >>> + uint32_t vq = value + 1; >> ditto > > What does this 'ditto' map to? Oh, probably the discussion about vq's > getting +1's and -1's. In this context 'value' is the ZCR_ELx > representation of a VQ, which is VQ - 1, just like in the bitmap. To > translate that to a VQ we need to do a '+ 1'. Until here I follow. By the way in which doc did you find the description of ZCR_ELx? As I wrote in the comment > here, we want to find the next smaller or equal VQ here, because this is > the input VQ from the guest which may itself be a valid VQ, but if it's > not, we need to step down to the next valid one. So we ask > arm_cpu_vq_map_next_smaller() for 'vq + 1' in order to ensure 'vq' will > be in the searched range. After we get a valid vq from > arm_cpu_vq_map_next_smaller(), which must be at least '1', since that's > required by the architecture and thus will always be present, we need to > translate it back to the ZCR_ELx representation with the subtraction. > Phew... We programmers do love adding and subtracting by one, right :-) Got it now. but well I scratched my head > >>> + vq = arm_cpu_vq_map_next_smaller(cpu, vq + 1); >>> + value = vq - 1; >>> + } >>> + >>> + raw_write(env, ri, value); >>> >>> /* >>> * Because we arrived here, we know both FP and SVE are enabled; >>> diff --git a/target/arm/monitor.c b/target/arm/monitor.c >>> index 157c487a1551..1e213906fd8f 100644 >>> --- a/target/arm/monitor.c >>> +++ b/target/arm/monitor.c >>> @@ -89,8 +89,24 @@ GICCapabilityList *qmp_query_gic_capabilities(Error >>> **errp) >>> return head; >>> } >>> >>> +QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16); >>> + >>> +/* >>> + * These are cpu model features we want to advertise. The order here >>> + * matters as this is the order in which qmp_query_cpu_model_expansion >>> + * will attempt to set them. If there are dependencies between features, >>> + * as there are with the sve<vl-bits> features, then the order that >>> + * considers those dependencies must be used. >>> + * >>> + * The sve<vl-bits> features need to be in reverse order in order to >>> + * enable/disable the largest vector lengths first, ensuring all >>> + * power-of-2 vector lengths smaller can also be enabled/disabled. >>> + */ >>> static const char *cpu_model_advertised_features[] = { >>> "aarch64", "pmu", "sve", >>> + "sve2048", "sve1920", "sve1792", "sve1664", "sve1536", "sve1408", >>> + "sve1280", "sve1152", "sve1024", "sve896", "sve768", "sve640", >>> + "sve512", "sve384", "sve256", "sve128", >>> NULL >>> }; >>> >>> diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c >> Please move all the tests in a separate patch. > > I'd prefer not to. I like keeping the test cases that test the new code in > the same patch. It self documents what the test cases map to what code and > allows immediate testing of the patch when bisecting. Also I'm not really > sure how it makes review worse, as another patch would look the same, just > in a different email. A reviewier might not be familiar with both standard code and test and feel reluctant to invest reading boths, hence reducing the chances to collect R-b's. But that's my humble opinion. > >> Each day has enough trouble of its own ;-) sweat... > > I really appreciate the review! I realize it's generating sweat, > especially with the European heat wave! You don't have to finish > a patch before sending comments. I'm fine with multiple replies to > the same patch if you want to break your review up into smaller > bites. Thanks. Yes breathing times are needed due to the heat and digestion of the code ;-) Thanks Eric > > Thanks, > drew > >> >> Thanks >> >> Eric >>> index 509e458e9c2f..a4bf6aec00df 100644 >>> --- a/tests/arm-cpu-features.c >>> +++ b/tests/arm-cpu-features.c >>> @@ -13,6 +13,18 @@ >>> #include "qapi/qmp/qdict.h" >>> #include "qapi/qmp/qjson.h" >>> >>> +#if __SIZEOF_LONG__ == 8 >>> +#define BIT(n) (1UL << (n)) >>> +#else >>> +#define BIT(n) (1ULL << (n)) >>> +#endif >>> + >>> +/* >>> + * We expect the SVE max-vq to be 16. Also it must be <= 64 >>> + * for our test code, otherwise 'vls' can't just be a uint64_t. >>> + */ >>> +#define SVE_MAX_VQ 16 >>> + >>> #define MACHINE "-machine virt,gic-version=max " >>> #define QUERY_HEAD "{ 'execute': 'query-cpu-model-expansion', " \ >>> "'arguments': { 'type': 'full', " >>> @@ -137,6 +149,201 @@ static void assert_bad_props(QTestState *qts, const >>> char *cpu_type) >>> qobject_unref(resp); >>> } >>> >>> +static void resp_get_sve_vls(QDict *resp, uint64_t *vls, uint32_t *max_vq) >>> +{ >>> + const QDictEntry *e; >>> + QDict *qdict; >>> + int n = 0; >>> + >>> + *vls = 0; >>> + >>> + qdict = resp_get_props(resp); >>> + >>> + for (e = qdict_first(qdict); e; e = qdict_next(qdict, e)) { >>> + if (strlen(e->key) > 3 && !strncmp(e->key, "sve", 3) && >>> + g_ascii_isdigit(e->key[3])) { >>> + char *endptr; >>> + int bits; >>> + >>> + bits = g_ascii_strtoll(&e->key[3], &endptr, 10); >>> + if (!bits || *endptr != '\0') { >>> + continue; >>> + } >>> + >>> + if (qdict_get_bool(qdict, e->key)) { >>> + *vls |= BIT((bits / 128) - 1); >>> + } >>> + ++n; >>> + } >>> + } >>> + >>> + g_assert(n == SVE_MAX_VQ); >>> + >>> + *max_vq = !*vls ? 0 : 64 - __builtin_clzll(*vls); >>> +} >>> + >>> +static uint64_t sve_get_vls(QTestState *qts, const char *cpu_type, >>> + const char *fmt, ...) >>> +{ >>> + QDict *resp; >>> + uint64_t vls; >>> + uint32_t max_vq; >>> + >>> + if (fmt) { >>> + QDict *args; >>> + va_list ap; >>> + >>> + va_start(ap, fmt); >>> + args = qdict_from_vjsonf_nofail(fmt, ap); >>> + va_end(ap); >>> + >>> + resp = qtest_qmp(qts, QUERY_HEAD "'model': { 'name': %s, " >>> + "'props': %p }" >>> + QUERY_TAIL, cpu_type, args); >>> + } else { >>> + resp = do_query_no_props(qts, cpu_type); >>> + } >>> + >>> + g_assert(resp); >>> + resp_get_sve_vls(resp, &vls, &max_vq); >>> + qobject_unref(resp); >>> + >>> + return vls; >>> +} >>> + >>> +#define assert_sve_vls(qts, cpu_type, expected_vls, fmt, ...) \ >>> + g_assert(sve_get_vls(qts, cpu_type, fmt, ##__VA_ARGS__) == >>> expected_vls) >>> + >>> +static void sve_tests_default(QTestState *qts, const char *cpu_type) >>> +{ >>> + /* >>> + * With no sve-max-vq or sve<vl-bits> properties on the command line >>> + * the default is to have all vector lengths enabled. >>> + */ >>> + assert_sve_vls(qts, cpu_type, BIT(SVE_MAX_VQ) - 1, NULL); >>> + >>> + /* >>> + * ------------------------------------------------------------------- >>> + * power-of-2(vq) all-power- can can >>> + * of-2(< vq) enable disable >>> + * ------------------------------------------------------------------- >>> + * vq < max_vq no MUST* yes yes >>> + * vq < max_vq yes MUST* yes no >>> + * ------------------------------------------------------------------- >>> + * vq == max_vq n/a MUST* yes** yes** >>> + * ------------------------------------------------------------------- >>> + * vq > max_vq n/a no no yes >>> + * vq > max_vq n/a yes yes yes >>> + * ------------------------------------------------------------------- >>> + * >>> + * [*] "MUST" means this requirement must already be satisfied, >>> + * otherwise 'max_vq' couldn't itself be enabled. >>> + * >>> + * [**] Not testable with the QMP interface, only with the command >>> line. >>> + */ >>> + >>> + /* max_vq := 8 */ >>> + assert_sve_vls(qts, cpu_type, 0x8b, "{ 'sve1024': true }"); >>> + >>> + /* max_vq := 8, vq < max_vq, !power-of-2(vq) */ >>> + assert_sve_vls(qts, cpu_type, 0x8f, >>> + "{ 'sve1024': true, 'sve384': true }"); >>> + assert_sve_vls(qts, cpu_type, 0x8b, >>> + "{ 'sve1024': true, 'sve384': false }"); >>> + >>> + /* max_vq := 8, vq < max_vq, power-of-2(vq) */ >>> + assert_sve_vls(qts, cpu_type, 0x8b, >>> + "{ 'sve1024': true, 'sve256': true }"); >>> + assert_error(qts, cpu_type, "cannot disable sve256", >>> + "{ 'sve1024': true, 'sve256': false }"); >>> + >>> + /* >>> + * max_vq := 3, vq > max_vq, !all-power-of-2(< vq) >>> + * >>> + * If given sve384=on,sve512=off,sve640=on the command line error >>> would be >>> + * "cannot enable sve640", but QMP visits the vector lengths in reverse >>> + * order, so we get "cannot disable sve512" instead. The command line >>> would >>> + * also give that error if given sve384=on,sve640=on,sve512=off, so >>> this is >>> + * all fine. The important thing is that we get an error. >>> + */ >>> + assert_error(qts, cpu_type, "cannot disable sve512", >>> + "{ 'sve384': true, 'sve512': false, 'sve640': true }"); >>> + >>> + /* >>> + * We can disable power-of-2 vector lengths when all larger lengths >>> + * are also disabled. The shorter, sve384=on,sve512=off,sve640=off >>> + * works on the command line, but QMP doesn't know that all the >>> + * vector lengths larger than 384-bits will be disabled until it >>> + * sees the enabling of sve384, which comes near the end since it >>> + * visits the lengths in reverse order. So we just have to explicitly >>> + * disable them all. >>> + */ >>> + assert_sve_vls(qts, cpu_type, 0x7, >>> + "{ 'sve384': true, 'sve512': false, 'sve640': false, " >>> + " 'sve768': false, 'sve896': false, 'sve1024': false, " >>> + " 'sve1152': false, 'sve1280': false, 'sve1408': >>> false, " >>> + " 'sve1536': false, 'sve1664': false, 'sve1792': >>> false, " >>> + " 'sve1920': false, 'sve2048': false }"); >>> + >>> + /* max_vq := 3, vq > max_vq, all-power-of-2(< vq) */ >>> + assert_sve_vls(qts, cpu_type, 0x1f, >>> + "{ 'sve384': true, 'sve512': true, 'sve640': true }"); >>> + assert_sve_vls(qts, cpu_type, 0xf, >>> + "{ 'sve384': true, 'sve512': true, 'sve640': false }"); >>> +} >>> + >>> +static void sve_tests_sve_max_vq_8(const void *data) >>> +{ >>> + QTestState *qts; >>> + >>> + qts = qtest_init(MACHINE "-cpu max,sve-max-vq=8"); >>> + >>> + assert_sve_vls(qts, "max", BIT(8) - 1, NULL); >>> + >>> + /* >>> + * Disabling the max-vq set by sve-max-vq is not allowed, but >>> + * of course enabling it is OK. >>> + */ >>> + assert_error(qts, "max", "cannot disable sve1024", "{ 'sve1024': false >>> }"); >>> + assert_sve_vls(qts, "max", 0xff, "{ 'sve1024': true }"); >>> + >>> + /* >>> + * Enabling anything larger than max-vq set by sve-max-vq is not >>> + * allowed, but of course disabling everything larger is OK. >>> + */ >>> + assert_error(qts, "max", "cannot enable sve1152", "{ 'sve1152': true >>> }"); >>> + assert_sve_vls(qts, "max", 0xff, "{ 'sve1152': false }"); >>> + >>> + /* >>> + * We can disable non power-of-2 lengths smaller than the max-vq >>> + * set by sve-max-vq, but not power-of-2 lengths. >>> + */ >>> + assert_sve_vls(qts, "max", 0xfb, "{ 'sve384': false }"); >>> + assert_error(qts, "max", "cannot disable sve256", "{ 'sve256': false >>> }"); >>> + >>> + qtest_quit(qts); >>> +} >>> + >>> +static void sve_tests_sve_off(const void *data) >>> +{ >>> + QTestState *qts; >>> + >>> + qts = qtest_init(MACHINE "-cpu max,sve=off"); >>> + >>> + /* >>> + * SVE is off, so the map should be empty. >>> + */ >>> + assert_sve_vls(qts, "max", 0, NULL); >>> + >>> + /* >>> + * We can't turn anything on, but off is OK. >>> + */ >>> + assert_error(qts, "max", "cannot enable sve128", "{ 'sve128': true }"); >>> + assert_sve_vls(qts, "max", 0, "{ 'sve128': false }"); >>> + >>> + qtest_quit(qts); >>> +} >>> + >>> static void test_query_cpu_model_expansion(const void *data) >>> { >>> QTestState *qts; >>> @@ -159,9 +366,12 @@ static void test_query_cpu_model_expansion(const void >>> *data) >>> if (g_str_equal(qtest_get_arch(), "aarch64")) { >>> assert_has_feature(qts, "max", "aarch64"); >>> assert_has_feature(qts, "max", "sve"); >>> + assert_has_feature(qts, "max", "sve128"); >>> assert_has_feature(qts, "cortex-a57", "pmu"); >>> assert_has_feature(qts, "cortex-a57", "aarch64"); >>> >>> + sve_tests_default(qts, "max"); >>> + >>> /* Test that features that depend on KVM generate errors without. >>> */ >>> assert_error(qts, "max", >>> "'aarch64' feature cannot be disabled " >>> @@ -213,6 +423,13 @@ int main(int argc, char **argv) >>> qtest_add_data_func("/arm/query-cpu-model-expansion", >>> NULL, test_query_cpu_model_expansion); >>> >>> + if (g_str_equal(qtest_get_arch(), "aarch64")) { >>> + >>> qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-max-vq-8", >>> + NULL, sve_tests_sve_max_vq_8); >>> + qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-off", >>> + NULL, sve_tests_sve_off); >>> + } >>> + >>> if (kvm_available) { >>> qtest_add_data_func("/arm/kvm/query-cpu-model-expansion", >>> NULL, test_query_cpu_model_expansion_kvm); >>> >>