Re: [RFC 0/5] accel/kvm: Support KVM PMU filter
Hi Zhao,

On 8/2/24 17:37, Zhao Liu wrote:
Hello Shaoqin,

On Fri, Aug 02, 2024 at 05:01:47PM +0800, Shaoqin Huang wrote:
Hi Zhao,

On 7/10/24 12:51, Zhao Liu wrote:
Hi QEMU maintainers, arm and PMU folks,

I picked up Shaoqin's previous work [1] on the KVM PMU filter for arm, and am now trying to support this feature for x86 with a JSON-compatible API.

While arm and x86 use different KVM ioctls to configure the PMU filter, they take similar inputs (PMU event + action), so it is still possible to abstract a generic, cross-architecture kvm-pmu-filter object and provide users with a sufficiently generic, or near-consistent, QAPI interface.

That's what I did in this series: a new kvm-pmu-filter object, with an API like:

-object '{"qom-type":"kvm-pmu-filter","id":"f0","events":[{"action":"allow","format":"raw","code":"0xc4"}]}'

For i386, this object is inserted into the kvm accelerator and is extended to support fixed counters and more formats ("x86-default" and "x86-masked-entry"):

-accel kvm,pmu-filter=f0 \
-object '{"qom-type":"kvm-pmu-filter","id":"f0","x86-fixed-counter":{"action":"allow","bitmap":"0x0"},"events":[{"action":"allow","format":"x86-masked-entry","select":"0xc4","mask":"0xff","match":"0","exclude":true},{"action":"allow","format":"x86-masked-entry","select":"0xc5","mask":"0xff","match":"0","exclude":true}]}'

What if I want to create the PMU Filter on ARM to deny the event range [0x5,0x10], and allow deny event 0x13, how should I write the json?

Currently this doesn't support event ranges (since x86 raw event codes cannot be said to be contiguous).
So with the basic support, we need to configure events one by one:

-object '{"qom-type":"kvm-pmu-filter","id":"f0","events":[{"action":"allow","format":"raw","code":"0x5"},{"action":"allow","format":"raw","code":"0x6"},{"action":"allow","format":"raw","code":"0x7"},{"action":"allow","format":"raw","code":"0x8"},{"action":"allow","format":"raw","code":"0x9"},{"action":"allow","format":"raw","code":"0x10"},{"action":"deny","format":"raw","code":"0x13"}]}'

This one looks a lot more complicated, but in the future arm could support event ranges (maybe implemented via a mask). I think this could be an arch-specific format, since not all architectures' events are contiguous.

Additionally, I'm a bit confused by your example, and I hope you can help me understand it: when 0x5~0x10 are configured as allowed, aren't all other events denied by default, which would make denying 0x13 redundant? What is the default action for all events other than 0x5~0x10 and 0x13?

If we specify allow as the action for 0x5~0x10 and deny the rest by default, then there is no need to set an action for each event, only a global one (as suggested by Dapeng), so the above command line can be simplified as:

-object '{"qom-type":"kvm-pmu-filter","id":"f0","action":"allow","events":[{"format":"raw","code":"0x5"},{"format":"raw","code":"0x6"},{"format":"raw","code":"0x7"},{"format":"raw","code":"0x8"},{"format":"raw","code":"0x9"},{"format":"raw","code":"0x10"}]}'

Yes, you are right. On Arm, when you first set the PMU Filter, if the first filter is allow, then all other events are denied by default. The reverse also holds: if the first filter is deny, then all other events are allowed by default.

On ARM the PMU Filter is much simpler than on x86, I think. We only need to care about the specific events with an allow or deny action. If we don't support event range filters, I think that's fine. This can be added in the future.
Thanks, Shaoqin Thanks, Zhao -- Shaoqin
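To make the ARM semantics described above concrete (the action of the first filter fixes the default for every event that is not listed), here is a small user-space simulation in C. It is a sketch under assumptions, not KVM or QEMU code: the 1024-entry event space, the type names and the helpers apply_range()/event_allowed() are all hypothetical, and it additionally assumes, as the cpu-features.rst example "A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30" suggests, that later entries override earlier ones for the events they cover.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size of the event space; the real size depends on the host PMU. */
#define NR_EVENTS 1024

enum filter_action { FILTER_ALLOW, FILTER_DENY };

/* One filter entry: a contiguous range of event numbers plus an action. */
struct event_range {
    uint16_t base_event;
    uint16_t nevents;
    enum filter_action action;
};

/* Per-event allow/deny state, plus whether any filter was installed yet. */
struct filter_state {
    bool initialized;
    bool allowed[NR_EVENTS];
};

/* Install one range. The first range also sets the default for every other
 * event: a leading ALLOW denies everything else, a leading DENY allows it. */
static void apply_range(struct filter_state *s, const struct event_range *r)
{
    if (!s->initialized) {
        bool default_allowed = (r->action == FILTER_DENY);
        for (int i = 0; i < NR_EVENTS; i++) {
            s->allowed[i] = default_allowed;
        }
        s->initialized = true;
    }
    /* Later ranges simply overwrite the state of the events they cover. */
    for (uint32_t e = r->base_event;
         e < (uint32_t)r->base_event + r->nevents && e < NR_EVENTS; e++) {
        s->allowed[e] = (r->action == FILTER_ALLOW);
    }
}

/* Without any filter installed, every event is exposed to the guest. */
static bool event_allowed(const struct filter_state *s, uint16_t event)
{
    if (!s->initialized) {
        return true;
    }
    return event < NR_EVENTS && s->allowed[event];
}
```

With a single leading allow range for 0x5..0x10, event 0x13 already ends up denied, which is why an explicit deny for 0x13 adds nothing in that case.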
Re: [RFC 0/5] accel/kvm: Support KVM PMU filter
Hi Zhao,

On 7/10/24 12:51, Zhao Liu wrote:
Hi QEMU maintainers, arm and PMU folks,

I picked up Shaoqin's previous work [1] on the KVM PMU filter for arm, and am now trying to support this feature for x86 with a JSON-compatible API.

While arm and x86 use different KVM ioctls to configure the PMU filter, they take similar inputs (PMU event + action), so it is still possible to abstract a generic, cross-architecture kvm-pmu-filter object and provide users with a sufficiently generic, or near-consistent, QAPI interface.

That's what I did in this series: a new kvm-pmu-filter object, with an API like:

-object '{"qom-type":"kvm-pmu-filter","id":"f0","events":[{"action":"allow","format":"raw","code":"0xc4"}]}'

For i386, this object is inserted into the kvm accelerator and is extended to support fixed counters and more formats ("x86-default" and "x86-masked-entry"):

-accel kvm,pmu-filter=f0 \
-object '{"qom-type":"kvm-pmu-filter","id":"f0","x86-fixed-counter":{"action":"allow","bitmap":"0x0"},"events":[{"action":"allow","format":"x86-masked-entry","select":"0xc4","mask":"0xff","match":"0","exclude":true},{"action":"allow","format":"x86-masked-entry","select":"0xc5","mask":"0xff","match":"0","exclude":true}]}'

What if I want to create the PMU Filter on ARM to deny the event range [0x5,0x10], and allow deny event 0x13, how should I write the json?

Thanks,
Shaoqin

This object can still be added as a property of the arch CPU if it is desired as a per-CPU feature (as Shaoqin did for arm before).

Your feedback and comments are welcome!

Introduction

Formats supported in kvm-pmu-filter
---

This series supports 3 formats:

* raw format (general format).

  This format indicates a code that has already been encoded to index the PMU events, and which can be delivered directly to the KVM ioctl.
  For arm, this means the event code, and for i386, this means the raw event with a layout like:

      select high bits | umask | select low bits

* x86-default format (i386 specific)

  x86 commonly uses select&umask to identify PMU events, and this format is used to support select&umask. QEMU then encodes select and umask into a raw format code.

* x86-masked-entry (i386 specific)

  This is a special format that x86's KVM_SET_PMU_EVENT_FILTER supports.

Hexadecimal value string
---

In practice, the values associated with PMU events (code for arm, select&umask for x86) are often expressed in hexadecimal. Further, judging from Linux perf's event data (tools/perf/pmu-events/arch/*/*/*.json), x86/arm64/riscv/nds32/powerpc all prefer hexadecimal numbers and only s390 uses decimal values.

Therefore, supporting hexadecimal is necessary to honor PMU conventions.

However, standard JSON (RFC 8259) unfortunately does not support hexadecimal numbers. So I can only use numeric strings in the QAPI and then parse them into numbers.

To achieve this, I defined two versions of the PMU-related structures in kvm.json:
* a native version that accepts numeric values, used for QEMU's internal code processing,
* and a variant version that accepts numeric strings, used to receive user input.

The kvm-pmu-filter object takes care of converting the string version of the event/counter information into the numeric version.

The related implementation can be found in patch 1.

CPU property vs. KVM property
--

In Shaoqin's previous implementation [1], the KVM PMU filter was made an arm CPU property. This is because arm uses a per-CPU ioctl (KVM_SET_DEVICE_ATTR) to configure the KVM PMU filter.

However, for x86, the corresponding ioctl (KVM_SET_PMU_EVENT_FILTER) is per VM. Meanwhile, for hybrid architectures there may in the future be a new per-vCPU ioctl, or a practice of filtering fixed counters by configuring CPUIDs.
Based on the above thoughts, for x86 it is not appropriate to make the current per-VM ioctl-based PMU filter a CPU property. Instead, I make it a kvm property and configure it via "-accel kvm,pmu-filter=obj_id".

So in summary, the KVM PMU filter can be used as either a CPU or a KVM property, depending on whether it is used as a CPU feature or a VM feature.

The kvm-pmu-filter object, as an abstraction, is general enough to support filter configurations for different scopes (per-CPU or per-VM).

[1]: https://lore.kernel.org/qemu-devel/20240409024940.180107-1-shahu...@redhat.com/

Thanks and Best Regards,
Zhao
---
Zhao Liu (5):
  qapi/qom: Introduce kvm-pmu-filter object
  i386/kvm: Support initial KVM PMU filter
  i386/kvm: Support event with select&umask format in KVM PMU filter
  i386/kvm: Support event with masked entry format in KVM PMU filter
  i386/kvm: Support fixed counter in KVM PMU filter

 MAINTAINERS| 1 +
 accel/kvm/kvm-pmu.c| 367 +++
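The cover letter says user input arrives as numeric strings (because JSON has no hex literals) and that x86-default events are folded into a raw code laid out as "select high bits | umask | select low bits". Patch 1 is not quoted in this thread, so the following C sketch only illustrates the idea under stated assumptions: parse_num_str() and x86_encode_raw() are hypothetical names, strtoull() stands in for whatever parser QEMU actually uses, and placing select[11:8] at bits 32-35 is an assumption based on the x86 (AMD) event-select layout.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: convert a user-supplied numeric string ("0xc4" or
 * "196") into a number no larger than max. Base 0 lets strtoull() accept a
 * 0x prefix for hex; note that it also treats a leading 0 as octal. */
static bool parse_num_str(const char *s, uint64_t max, uint64_t *out)
{
    char *end;
    unsigned long long v;

    /* Reject empty strings, signs and leading whitespace up front. */
    if (s == NULL || !isdigit((unsigned char)s[0])) {
        return false;
    }
    errno = 0;
    v = strtoull(s, &end, 0);
    /* The whole string must be consumed and the value must fit. */
    if (errno != 0 || *end != '\0' || v > max) {
        return false;
    }
    *out = (uint64_t)v;
    return true;
}

/* Hypothetical helper: pack an x86 event select (up to 12 bits) and umask:
 * select[7:0] -> bits 0-7, umask -> bits 8-15, select[11:8] -> bits 32-35. */
static uint64_t x86_encode_raw(uint16_t select, uint8_t umask)
{
    return (uint64_t)(select & 0xff) |
           ((uint64_t)umask << 8) |
           ((uint64_t)((select >> 8) & 0xf) << 32);
}
```

Used together, "0x2e"/"0x4f" would be parsed into 0x2e and 0x4f and then packed into the raw code 0x4f2e. Keeping the string-to-number step separate from the encoding mirrors the split between the "variant" (string) and "native" (numeric) structures described above.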
Re: [PATCH v9] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
Hi Zhao,

Thanks for your proposed idea. If you are willing to take over the PMU Filter enabling work, please go ahead. I won't update this series anymore due to the QAPI restriction. I would really appreciate it if you could implement it.

Thanks,
Shaoqin

On 5/13/24 14:52, Zhao Liu wrote:
Hi Daniel,

Please describe it in terms of a QAPI definition, as that's what we're striving for with all QEMU public interfaces. Once the QAPI design is agreed, then the -object mapping is trivial, as -object's JSON format supports arbitrary QAPI structures.

Thank you for your guidance! I rethought and modified my previous proposal. Let me show the command examples first:

* Add a single event:

(x86) -object kvm-pmu-event,id=e0,action=allow,format=x86-default,\
      select=0x3c,umask=0x00

(arm or general) -object kvm-pmu-event,id=e1,action=deny,\
      format=raw,code=0x01

* Add a counter bitmap:

(x86) -object kvm-pmu-counter,id=cnt,action=allow,type=x86-fixed,\
      bitmap=0x

* Add an event list (must use JSON syntax):

(x86) -object '{"qom-type":"kvm-pmu-event-list","id":"filter0","action":"allow","format":"x86-default","events":[{"select":0x3c,"umask":0x00},{"select":0x2e,"umask":0x4f}]}'

(arm) -object '{"qom-type":"kvm-pmu-event-list","id":"filter1","action":"allow","format":"raw","events":[{"code":0x01},{"code":0x02}]}'

The specific JSON definitions are as follows (IIUC, this is "in terms of a QAPI definition", right? ;-)):

* Define PMU event and counter bitmap with JSON format:

- basic filter action:

{ 'enum': 'KVMPMUFilterAction',
  'prefix': 'KVM_PMU_FILTER_ACTION',
  'data': ['deny', 'allow' ] }

- PMU counter:

{ 'enum': 'KVMPMUCounterType',
  'prefix': 'KVM_PMU_COUNTER_TYPE',
  'data': [ 'x86-fixed' ] }

{ 'struct': 'KVMPMUX86FixedCounter',
  'data': { 'bitmap': 'uint32' } }

- PMU events (3 formats in total):

# 3 encoding formats: "raw" is compatible with Shaoqin's ARM format as
# well as the x86 raw format, and could support other architectures in
# the future.
{ 'enum': 'KVMPMUEventEncodeFmt',
  'prefix': 'KVM_PMU_EVENT_ENCODE_FMT',
  'data': ['raw', 'x86-default', 'x86-masked-entry' ] }

# A general format.
{ 'struct': 'KVMPMURawEvent',
  'data': { 'code': 'uint64' } }

# x86-specific
{ 'struct': 'KVMPMUX86DefaultEvent',
  'data': { 'select': 'uint16',
            'umask': 'uint16' } }

# another x86 specific
{ 'struct': 'KVMPMUX86MaskedEntry',
  'data': { 'select': 'uint16',
            'match': 'uint8',
            'mask': 'uint8',
            'exclude': 'bool' } }

# And their list wrappers:
{ 'struct': 'KVMPMURawEventList',
  'data': { 'events': ['KVMPMURawEvent'] } }

{ 'struct': 'KVMPMUX86DefaultEventList',
  'data': { 'events': ['KVMPMUX86DefaultEvent'] } }

{ 'struct': 'KVMPMUX86MaskedEntryList',
  'data': { 'events': ['KVMPMUX86MaskedEntry'] } }

Based on the above basic structs, we could provide 3 new qom-types:

- 'kvm-pmu-counter': 'KVMPMUFilterCounter'

# This is a single object option to configure PMU counter
# bitmap filter.
{ 'union': 'KVMPMUFilterCounter',
  'base': { 'action': 'KVMPMUFilterAction',
            'type': 'KVMPMUCounterType' },
  'discriminator': 'type',
  'data': { 'x86-fixed': 'KVMPMUX86FixedCounter' } }

- 'kvm-pmu-event': 'KVMPMUFilterEvent'

# This option is used to configure a single PMU event for
# PMU filter.
{ 'union': 'KVMPMUFilterEvent',
  'base': { 'action': 'KVMPMUFilterAction',
            'format': 'KVMPMUEventEncodeFmt' },
  'discriminator': 'format',
  'data': { 'raw': 'KVMPMURawEvent',
            'x86-default': 'KVMPMUX86DefaultEvent',
            'x86-masked-entry': 'KVMPMUX86MaskedEntry' } }

- 'kvm-pmu-event-list': 'KVMPMUFilterEventList'

# Used to configure multiple events.
{ 'union': 'KVMPMUFilterEventList',
  'base': { 'action': 'KVMPMUFilterAction',
            'format': 'KVMPMUEventEncodeFmt' },
  'discriminator': 'format',
  'data': { 'raw': 'KVMPMURawEventList',
            'x86-default': 'KVMPMUX86DefaultEventList',
            'x86-masked-entry': 'KVMPMUX86MaskedEntryList' } }

Compared to Shaoqin's original format, kvm-pmu-event-list cannot express a contiguous range of events (like 0x00-0x30 before); users must now enumerate events one by one.

What do you think about the above 3 new commands?

Thanks and Best Regards,
Zhao
-- 
Shaoqin
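The proposal models each event as a QAPI union discriminated by `format`. The following C sketch shows what consuming such a union might look like once parsed. The type and function names are hypothetical hand-written stand-ins, not QAPI-generated code, and the masked-entry bit layout is reproduced from the kernel's KVM_PMU_ENCODE_MASKED_ENTRY() macro as understood here, so it should be checked against the x86 KVM uapi header before relying on it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hand-written stand-ins for the proposed QAPI types (hypothetical). */
typedef enum {
    FMT_RAW,
    FMT_X86_DEFAULT,
    FMT_X86_MASKED_ENTRY,
} EventEncodeFmt;

typedef struct {
    EventEncodeFmt format;  /* the union discriminator */
    union {
        struct { uint64_t code; } raw;
        struct { uint16_t select; uint16_t umask; } x86_default;
        struct {
            uint16_t select;
            uint8_t match;
            uint8_t mask;
            bool exclude;
        } x86_masked_entry;
    } u;
} FilterEvent;

/* Assumed to match KVM_PMU_ENCODE_MASKED_ENTRY(): select[7:0] -> 0-7,
 * match -> 8-15, select[11:8] -> 32-35, exclude -> 55, mask -> 56-63. */
static uint64_t encode_masked_entry(uint16_t select, uint8_t match,
                                    uint8_t mask, bool exclude)
{
    return (uint64_t)(select & 0xff) |
           ((uint64_t)(select & 0xf00) << 24) |
           ((uint64_t)match << 8) |
           ((uint64_t)(exclude ? 1 : 0) << 55) |
           ((uint64_t)mask << 56);
}

/* Dispatch on the discriminator to produce the 64-bit value for KVM. */
static uint64_t encode_event(const FilterEvent *ev)
{
    switch (ev->format) {
    case FMT_RAW:
        return ev->u.raw.code;
    case FMT_X86_DEFAULT:
        /* select[7:0] | umask << 8 | select[11:8] << 32 */
        return (uint64_t)(ev->u.x86_default.select & 0xff) |
               ((uint64_t)(ev->u.x86_default.umask & 0xff) << 8) |
               ((uint64_t)(ev->u.x86_default.select & 0xf00) << 24);
    case FMT_X86_MASKED_ENTRY:
        return encode_masked_entry(ev->u.x86_masked_entry.select,
                                   ev->u.x86_masked_entry.match,
                                   ev->u.x86_masked_entry.mask,
                                   ev->u.x86_masked_entry.exclude);
    }
    return 0;  /* not reached for valid formats */
}
```

One caveat, stated as an assumption about the x86 ioctl: KVM_SET_PMU_EVENT_FILTER treats the whole event list as masked entries only when the KVM_PMU_EVENT_FLAG_MASKED_EVENTS flag is set, so a real implementation would presumably need to keep masked and unmasked events apart; this sketch ignores that.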
Re: [PATCH v9] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
Hi Daniel,

On 4/16/24 01:29, Daniel P. Berrangé wrote:
On Mon, Apr 08, 2024 at 10:49:40PM -0400, Shaoqin Huang wrote:
The KVM_ARM_VCPU_PMU_V3_FILTER provides the ability to let the VMM decide which PMU events are provided to the guest. Add a new option `kvm-pmu-filter` as a -cpu sub-option to set the PMU Event Filtering. Without the filter, all PMU events are exposed from host to guest by default. The usage of the new sub-option can be found in the updated document (docs/system/arm/cpu-features.rst).

Here is an example which shows how to use the PMU Event Filtering: when launching a guest with KVM, add a command line such as:

# qemu-system-aarch64 \
        -accel kvm \
        -cpu host,kvm-pmu-filter="D:0x11-0x11"

I'm still against implementing this one-off custom parsed syntax for kvm-pmu-filter values. Once this syntax exists, we're locked into back-compatibility for multiple releases, and it will make a conversion to QAPI/JSON harder.

Thanks for taking the time to review my patch. Spending more time on the QAPI work is outside my original scope and would deviate from supporting the PMU Filter itself. So I've decided not to update this patch for now, and to wait until I have time to look into the QAPI, or until the -cpu option has been converted to the QAPI format.

Thanks,
Shaoqin

With regards,
Daniel
-- 
Shaoqin
Re: [PATCH v9] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
Hi Thomas,

On 4/9/24 13:33, Thomas Huth wrote:
+    assert_has_feature(qts, "host", "kvm-pmu-filter");

So you assert here that the feature is available ...

     assert_has_feature(qts, "host", "kvm-steal-time");
     assert_has_feature(qts, "host", "sve");

     resp = do_query_no_props(qts, "host");
+    kvm_supports_pmu_filter = resp_get_feature_str(resp, "kvm-pmu-filter");
     kvm_supports_steal_time = resp_get_feature(resp, "kvm-steal-time");
     kvm_supports_sve = resp_get_feature(resp, "sve");
     vls = resp_get_sve_vls(resp);
     qobject_unref(resp);

+    if (kvm_supports_pmu_filter) {

... why do you then need to check for its availability here again? I either don't understand this part of the code, or you could drop the kvm_supports_pmu_filter variable and simply always execute the code below.

Thanks for reviewing. I did so because all the other features, such as "kvm-steal-time", check their availability again. I don't know the original reason for that; I just followed the existing pattern. Do you think we should delete all of those checks?

Thanks,
Shaoqin

Thomas
-- 
Shaoqin
[PATCH v9] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
The KVM_ARM_VCPU_PMU_V3_FILTER provides the ability to let the VMM decide which PMU events are provided to the guest. Add a new option `kvm-pmu-filter` as a -cpu sub-option to set the PMU Event Filtering. Without the filter, all PMU events are exposed from host to guest by default. The usage of the new sub-option can be found in the updated document (docs/system/arm/cpu-features.rst).

Here is an example which shows how to use the PMU Event Filtering: when launching a guest with KVM, add a command line such as:

# qemu-system-aarch64 \
        -accel kvm \
        -cpu host,kvm-pmu-filter="D:0x11-0x11"

Since the first action is deny, we have a global allow policy. This filters out the cycle counter (event 0x11 being CPU_CYCLES).

Then, in the guest, use perf to count the cycles:

# perf stat sleep 1

 Performance counter stats for 'sleep 1':

              1.22 msec task-clock                #    0.001 CPUs utilized
                 1      context-switches          #  820.695 /sec
                 0      cpu-migrations            #    0.000 /sec
                55      page-faults               #   45.138 K/sec
                        cycles
           1128954      instructions
            227031      branches                  #  186.323 M/sec
              8686      branch-misses             #    3.83% of all branches

       1.002492480 seconds time elapsed

       0.001752000 seconds user
       0.0 seconds sys

As we can see, the cycle counter has been disabled in the guest, but other PMU events still work.

Signed-off-by: Shaoqin Huang
---
v8->v9:
  - Replace warn_report() with error_setg() in some places.
  - Merge the check conditions to make the code cleaner.
  - Tried to use the QAPI format for the PMU Filter property, but couldn't, since the -cpu option doesn't support JSON format yet.

v7->v8:
  - Add qtest for kvm-pmu-filter.
  - Do the kvm-pmu-filter syntax checking up-front in the kvm_pmu_filter_set() function, and store the filter information there; kvm_pmu_filter_get() reconstitutes it.

v6->v7:
  - Check return value of sscanf.
  - Improve the check condition.

v5->v6:
  - Commit message improvement.
  - Remove some unused code.
  - Collect Reviewed-by, thanks Sebastian.
  - Use g_auto(GStrv) to replace the gchar **. [Eric]

v4->v5:
  - Change kvm-pmu-filter into a -cpu sub-option. [Eric]
  - Comment tweak. [Gavin]
  - Rebase to the latest branch.

v3->v4:
  - Fix the wrong check for pmu_filter_init. [Sebastian]
  - Fix multiple alignment issues. [Gavin]
  - Report errors with warn_report() instead of error_report(), and don't use abort() since the PMU Event Filter is an add-on, best-effort feature. [Gavin]
  - Add several missing { } for single-line code blocks. [Gavin]
  - Use g_strsplit() to replace strtok(). [Gavin]

v2->v3:
  - Improve commit message, use kernel doc wording, add more explanation of the filter example, fix some typos. [Eric]
  - Add g_free() in kvm_arch_set_pmu_filter() to prevent a memory leak. [Eric]
  - Add more precise error message reports. [Eric]
  - In the options doc, note that pmu-filter relies on KVM_ARM_VCPU_PMU_V3_FILTER support in KVM. [Eric]

v1->v2:
  - Add more description of the allow and deny meaning in the commit message. [Sebastian]
  - Small improvement. [Sebastian]
---
 docs/system/arm/cpu-features.rst | 23 +++
 target/arm/arm-qmp-cmds.c        |  2 +-
 target/arm/cpu.h                 |  3 +
 target/arm/kvm.c                 | 112 +++
 tests/qtest/arm-cpu-features.c   | 51 ++
 5 files changed, 190 insertions(+), 1 deletion(-)

diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
index a5fb929243..f3930f34b3 100644
--- a/docs/system/arm/cpu-features.rst
+++ b/docs/system/arm/cpu-features.rst
@@ -204,6 +204,29 @@ the list of KVM VCPU features and their descriptions.
           the guest scheduler behavior and/or be exposed to the guest
           userspace.

+``kvm-pmu-filter``
+  By default kvm-pmu-filter is disabled. This means that by default all PMU
+  events will be exposed to guest.
+
+  KVM implements PMU Event Filtering to prevent a guest from being able to
+  sample certain events. It depends on the KVM_ARM_VCPU_PMU_V3_FILTER
+  attribute supported in KVM. It has the following format:
+
+  kvm-pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"
+
+  The A means "allow" and D means "deny", start is the first event of the
+  range and the end is the last one. The first registered range defines
+  the global policy (global ALLOW if the first action i
Re: [PATCH v8] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
Hi Eric,

On 3/19/24 23:23, Eric Auger wrote:
+    if (kvm_supports_pmu_filter) {
+        assert_set_feature_str(qts, "host", "kvm-pmu-filter", "");
+        assert_set_feature_str(qts, "host", "kvm-pmu-filter",
+                               "A:0x11-0x11");
+        assert_set_feature_str(qts, "host", "kvm-pmu-filter",
+                               "D:0x11-0x11");
+        assert_set_feature_str(qts, "host", "kvm-pmu-filter",
+                               "A:0x11-0x11;A:0x12-0x20");
+        assert_set_feature_str(qts, "host", "kvm-pmu-filter",
+                               "D:0x11-0x11;A:0x12-0x20;D:0x12-0x15");

Just to double check: this sets the filter and checks that the filter is applied, is that correct? I see you set some ranges of events. Are you sure those events are supported by the host PMU and won't cause a failure when setting the PMU filter?

What I test here is whether the PMU Filter parser I wrote in the kvm_pmu_filter_set/get functions is correct. I don't test anything on the KVM side, such as whether the PMU event is supported by the host.

Thanks,
Shaoqin

Thanks
Eric
-- 
Shaoqin
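Since the qtest only exercises the string parser, here is a minimal C sketch of what a parser for the documented "{A,D}:start-end[;{A,D}:start-end...]" syntax has to check. It is not the code from the patch: parse_pmu_filter(), struct pmu_range and the PMU_ALLOW/PMU_DENY values are hypothetical, with the fields loosely modeled on the arm64 struct kvm_pmu_event_filter (base_event, nevents, action), which is an assumption here rather than something quoted in the thread.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Values assumed to mirror KVM_PMU_EVENT_ALLOW / KVM_PMU_EVENT_DENY. */
enum { PMU_ALLOW = 0, PMU_DENY = 1 };

/* Hypothetical range record: first event, number of events, action. */
struct pmu_range {
    uint16_t base_event;
    uint16_t nevents;
    uint8_t action;
};

/* Parse "{A,D}:start-end[;{A,D}:start-end...]" with hex-only, inclusive
 * bounds. Returns the number of ranges parsed (0 for ""), or -1 on error. */
static int parse_pmu_filter(const char *spec, struct pmu_range *out, int max)
{
    const char *p = spec;
    int n = 0;

    while (*p != '\0') {
        char *end;
        unsigned long start, last;

        /* Each entry starts with "A:" or "D:". */
        if (n == max || (p[0] != 'A' && p[0] != 'D') || p[1] != ':') {
            return -1;
        }
        out[n].action = (p[0] == 'A') ? PMU_ALLOW : PMU_DENY;
        p += 2;

        /* Start of the range: "0x" plus at least one hex digit, then '-'. */
        if (strncmp(p, "0x", 2) != 0) {
            return -1;
        }
        start = strtoul(p, &end, 16);
        if (end <= p + 2 || *end != '-') {
            return -1;
        }

        /* End of the range: not below start, and the count fits in 16 bits. */
        p = end + 1;
        if (strncmp(p, "0x", 2) != 0) {
            return -1;
        }
        last = strtoul(p, &end, 16);
        if (end <= p + 2 || last < start || last > UINT16_MAX ||
            last - start + 1 > UINT16_MAX) {
            return -1;
        }
        out[n].base_event = (uint16_t)start;
        out[n].nevents = (uint16_t)(last - start + 1);
        n++;

        /* Entries are separated by ';'; a trailing ';' is rejected. */
        if (*end == ';') {
            p = end + 1;
            if (*p == '\0') {
                return -1;
            }
        } else if (*end == '\0') {
            p = end;
        } else {
            return -1;
        }
    }
    return n;
}
```

Every rule in such a parser (hex-only bounds, inclusive end, no trailing separator) becomes a compatibility commitment once released, which is the concern Daniel raises about one-off custom syntax.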
Re: [PATCH v8] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
Hi Kevin,

On 4/2/24 21:01, Kevin Wolf wrote:
Maybe I'm wrong. So I want to double check whether the -cpu option supports JSON format nowadays.

As far as I can see, -cpu doesn't support JSON yet. But even if it did, your command line would be invalid because the 'host,' part isn't JSON.

Thanks for answering my question. I guess I should keep the current implementation for now, and convert the property in the future once the -cpu option supports JSON format.

Thanks,
Shaoqin

If the -cpu option doesn't support JSON format, how can I use the QAPI for the kvm-pmu-filter property?

This would probably mean QAPIfying all CPUs first, which sounds like a major effort.
-- 
Shaoqin
Re: [PATCH v8] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
Hi Daniel,

On 3/25/24 16:55, Daniel P. Berrangé wrote:
On Mon, Mar 25, 2024 at 01:35:58PM +0800, Shaoqin Huang wrote:
Hi Daniel,

Thanks for reviewing. I saw your comments on v7, and I have some questions about what you said regarding the QAPI. Do you want me to convert the current design to QAPI parsing, like IOThreadVirtQueueMapping? And do we need to add a new JSON definition in the qapi/ directory?

I have defined the QAPI for kvm-pmu-filter as below:

+##
+# @FilterAction:
+#
+# The Filter Action
+#
+# @a: Allow
+#
+# @d: Disallow
+#
+# Since: 9.0
+##
+{ 'enum': 'FilterAction',
+  'data': [ 'a', 'd' ] }
+
+##
+# @SingleFilter:
+#
+# Lazy
+#
+# @action: the action
+#
+# @start: the start
+#
+# @end: the end
+#
+# Since: 9.0
+##
+
+{ 'struct': 'SingleFilter',
+  'data': { 'action': 'FilterAction', 'start': 'int', 'end': 'int' } }
+
+##
+# @KVMPMUFilter:
+#
+# Lazy
+#
+# @filter: the filter
+#
+# Since: 9.0
+##
+
+{ 'struct': 'KVMPMUFilter',
+  'data': { 'filter': ['SingleFilter'] }}

And I guess I can use it by adding code like this:

--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -1206,3 +1206,35 @@ const PropertyInfo qdev_prop_iothread_vq_mapping_list = {
     .set = set_iothread_vq_mapping_list,
     .release = release_iothread_vq_mapping_list,
 };
+
+/* --- kvm-pmu-filter ---*/
+
+static void get_kvm_pmu_filter(Object *obj, Visitor *v,
+            const char *name, void *opaque, Error **errp)
+{
+    KVMPMUFilter **prop_ptr = object_field_prop_ptr(obj, opaque);
+
+    visit_type_KVMPMUFilter(v, name, prop_ptr, errp);
+}
+
+static void set_kvm_pmu_filter(Object *obj, Visitor *v,
+            const char *name, void *opaque, Error **errp)
+{
+    KVMPMUFilter **prop_ptr = object_field_prop_ptr(obj, opaque);
+    KVMPMUFilter *list;
+
+    printf("running the %s\n", __func__);
+    if (!visit_type_KVMPMUFilter(v, name, &list, errp)) {
+        return;
+    }
+
+    printf("The name is %s\n", name);
+    *prop_ptr = list;
+}
+
+const PropertyInfo qdev_prop_kvm_pmu_filter = {
+    .name = "KVMPMUFilter",
+    .description = "der der",
+    .get = get_kvm_pmu_filter,
+    .set = set_kvm_pmu_filter,
+};

+#define DEFINE_PROP_KVM_PMU_FILTER(_name, _state, _field) \
+    DEFINE_PROP(_name, _state, _field, qdev_prop_kvm_pmu_filter, \
+                KVMPMUFilter *)

--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2439,6 +2441,7 @@ static Property arm_cpu_properties[] = {
                         mp_affinity, ARM64_AFFINITY_INVALID),
     DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
     DEFINE_PROP_INT32("core-count", ARMCPU, core_count, -1),
+    DEFINE_PROP_KVM_PMU_FILTER("kvm-pmu-filter", ARMCPU, kvm_pmu_filter),
     DEFINE_PROP_END_OF_LIST()
 };

And I guess I can use the new JSON-format input like this:

qemu-system-aarch64 \
    -cpu host, '{"filter": [{"action": "a", "start": 0x10, "end": "0x11"}]}'

But it doesn't work. It seems this is because the -cpu option doesn't support JSON-format parameters. Maybe I'm wrong. So I want to double check whether the -cpu option supports JSON format nowadays.

If the -cpu option doesn't support JSON format, how can I use the QAPI for the kvm-pmu-filter property?

Thanks,
Shaoqin

Yes, you would define a type in the qapi dir similar to how it is done for IOThreadVirtQueueMapping, and then you can use that in the property setter method.

With regards,
Daniel
-- 
Shaoqin
Re: [PATCH v8] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
Hi Daniel,

Thanks for reviewing. I saw your comments on v7, and I have some questions about what you said regarding the QAPI. Do you want me to convert the current design to QAPI parsing, like IOThreadVirtQueueMapping? And do we need to add a new JSON definition in the qapi/ directory?

Thanks,
Shaoqin

On 3/22/24 22:53, Daniel P. Berrangé wrote:
On Tue, Mar 12, 2024 at 03:48:49AM -0400, Shaoqin Huang wrote:
The KVM_ARM_VCPU_PMU_V3_FILTER provides the ability to let the VMM decide which PMU events are provided to the guest. Add a new option `kvm-pmu-filter` as a -cpu sub-option to set the PMU Event Filtering. Without the filter, all PMU events are exposed from host to guest by default. The usage of the new sub-option can be found in the updated document (docs/system/arm/cpu-features.rst).

Here is an example which shows how to use the PMU Event Filtering: when launching a guest with KVM, add a command line such as:

# qemu-system-aarch64 \
        -accel kvm \
        -cpu host,kvm-pmu-filter="D:0x11-0x11"

I mistakenly sent some comments about the design of this syntax to the older v7 (despite this v8 already existing). So, to link up the threads:

https://lists.nongnu.org/archive/html/qemu-devel/2024-03/msg04703.html

With regards,
Daniel
-- 
Shaoqin
[PATCH v8] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
The KVM_ARM_VCPU_PMU_V3_FILTER provides the ability to let the VMM decide which PMU events are provided to the guest. Add a new option `kvm-pmu-filter` as a -cpu sub-option to set the PMU Event Filtering. Without the filter, all PMU events are exposed from host to guest by default. The usage of the new sub-option can be found in the updated document (docs/system/arm/cpu-features.rst).

Here is an example which shows how to use the PMU Event Filtering: when launching a guest with KVM, add a command line such as:

# qemu-system-aarch64 \
        -accel kvm \
        -cpu host,kvm-pmu-filter="D:0x11-0x11"

Since the first action is deny, we have a global allow policy. This filters out the cycle counter (event 0x11 being CPU_CYCLES).

Then, in the guest, use perf to count the cycles:

# perf stat sleep 1

 Performance counter stats for 'sleep 1':

              1.22 msec task-clock                #    0.001 CPUs utilized
                 1      context-switches          #  820.695 /sec
                 0      cpu-migrations            #    0.000 /sec
                55      page-faults               #   45.138 K/sec
                        cycles
           1128954      instructions
            227031      branches                  #  186.323 M/sec
              8686      branch-misses             #    3.83% of all branches

       1.002492480 seconds time elapsed

       0.001752000 seconds user
       0.0 seconds sys

As we can see, the cycle counter has been disabled in the guest, but other PMU events still work.

Signed-off-by: Shaoqin Huang
---
v7->v8:
  - Add qtest for kvm-pmu-filter.
  - Do the kvm-pmu-filter syntax checking up-front in the kvm_pmu_filter_set() function, and store the filter information there; kvm_pmu_filter_get() reconstitutes it.

v6->v7:
  - Check return value of sscanf.
  - Improve the check condition.

v5->v6:
  - Commit message improvement.
  - Remove some unused code.
  - Collect Reviewed-by, thanks Sebastian.
  - Use g_auto(GStrv) to replace the gchar **. [Eric]

v4->v5:
  - Change kvm-pmu-filter into a -cpu sub-option. [Eric]
  - Comment tweak. [Gavin]
  - Rebase to the latest branch.

v3->v4:
  - Fix the wrong check for pmu_filter_init. [Sebastian]
  - Fix multiple alignment issues. [Gavin]
  - Report errors with warn_report() instead of error_report(), and don't use abort() since the PMU Event Filter is an add-on, best-effort feature. [Gavin]
  - Add several missing { } for single-line code blocks. [Gavin]
  - Use g_strsplit() to replace strtok(). [Gavin]

v2->v3:
  - Improve commit message, use kernel doc wording, add more explanation of the filter example, fix some typos. [Eric]
  - Add g_free() in kvm_arch_set_pmu_filter() to prevent a memory leak. [Eric]
  - Add more precise error message reports. [Eric]
  - In the options doc, note that pmu-filter relies on KVM_ARM_VCPU_PMU_V3_FILTER support in KVM. [Eric]

v1->v2:
  - Add more description of the allow and deny meaning in the commit message. [Sebastian]
  - Small improvement. [Sebastian]
---
 docs/system/arm/cpu-features.rst | 23 +++
 target/arm/arm-qmp-cmds.c        |  2 +-
 target/arm/cpu.h                 |  3 +
 target/arm/kvm.c                 | 115 +++
 tests/qtest/arm-cpu-features.c   | 51 ++
 5 files changed, 193 insertions(+), 1 deletion(-)

diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
index a5fb929243..f3930f34b3 100644
--- a/docs/system/arm/cpu-features.rst
+++ b/docs/system/arm/cpu-features.rst
@@ -204,6 +204,29 @@ the list of KVM VCPU features and their descriptions.
           the guest scheduler behavior and/or be exposed to the guest
           userspace.

+``kvm-pmu-filter``
+  By default kvm-pmu-filter is disabled. This means that by default all PMU
+  events will be exposed to guest.
+
+  KVM implements PMU Event Filtering to prevent a guest from being able to
+  sample certain events. It depends on the KVM_ARM_VCPU_PMU_V3_FILTER
+  attribute supported in KVM. It has the following format:
+
+  kvm-pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"
+
+  The A means "allow" and D means "deny", start is the first event of the
+  range and the end is the last one. The first registered range defines
+  the global policy (global ALLOW if the first action is DENY, global DENY
+  if the first action is ALLOW). The start and end only support hexadecimal
+  format. For example:
+
+  kvm-pmu-filter="A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30"
+
+  Since the first action is allow, we have a global deny policy. It
+  will al
Re: [PATCH v7] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
Hi Peter,

On 2/22/24 22:28, Peter Maydell wrote:
On Wed, 21 Feb 2024 at 06:34, Shaoqin Huang wrote:
The KVM_ARM_VCPU_PMU_V3_FILTER provides the ability to let the VMM decide which PMU events are provided to the guest. Add a new option `kvm-pmu-filter` as a -cpu sub-option to set the PMU Event Filtering. Without the filter, all PMU events are exposed from host to guest by default. The usage of the new sub-option can be found in the updated document (docs/system/arm/cpu-features.rst).

Here is an example which shows how to use the PMU Event Filtering: when launching a guest with KVM, add a command line such as:

# qemu-system-aarch64 \
        -accel kvm \
        -cpu host,kvm-pmu-filter="D:0x11-0x11"

Since the first action is deny, we have a global allow policy. This filters out the cycle counter (event 0x11 being CPU_CYCLES).

Then, in the guest, use perf to count the cycles:

# perf stat sleep 1

 Performance counter stats for 'sleep 1':

              1.22 msec task-clock                #    0.001 CPUs utilized
                 1      context-switches          #  820.695 /sec
                 0      cpu-migrations            #    0.000 /sec
                55      page-faults               #   45.138 K/sec
                        cycles
           1128954      instructions
            227031      branches                  #  186.323 M/sec
              8686      branch-misses             #    3.83% of all branches

       1.002492480 seconds time elapsed

       0.001752000 seconds user
       0.0 seconds sys

As we can see, the cycle counter has been disabled in the guest, but other PMU events still work.

Reviewed-by: Sebastian Ott
Signed-off-by: Shaoqin Huang
---
v6->v7:
  - Check return value of sscanf.
  - Improve the check condition.

v5->v6:
  - Commit message improvement.
  - Remove some unused code.
  - Collect Reviewed-by, thanks Sebastian.
  - Use g_auto(GStrv) to replace the gchar **. [Eric]

v4->v5:
  - Change kvm-pmu-filter into a -cpu sub-option. [Eric]
  - Comment tweak. [Gavin]
  - Rebase to the latest branch.

v3->v4:
  - Fix the wrong check for pmu_filter_init. [Sebastian]
  - Fix multiple alignment issues. [Gavin]
  - Report errors with warn_report() instead of error_report(), and don't use abort() since the PMU Event Filter is an add-on, best-effort feature. [Gavin]
  - Add several missing { } for single-line code blocks. [Gavin]
  - Use g_strsplit() to replace strtok(). [Gavin]

v2->v3:
  - Improve commit message, use kernel doc wording, add more explanation of the filter example, fix some typos. [Eric]
  - Add g_free() in kvm_arch_set_pmu_filter() to prevent a memory leak. [Eric]
  - Add more precise error message reports. [Eric]
  - In the options doc, note that pmu-filter relies on KVM_ARM_VCPU_PMU_V3_FILTER support in KVM. [Eric]

v1->v2:
  - Add more description of the allow and deny meaning in the commit message. [Sebastian]
  - Small improvement. [Sebastian]

 docs/system/arm/cpu-features.rst | 23 +
 target/arm/cpu.h                 |  3 ++
 target/arm/kvm.c                 | 80
 3 files changed, 106 insertions(+)

The new syntax for the filter property seems quite complicated. I think it would be worth testing it with a new test in tests/qtest/arm-cpu-features.c.

I was trying to add a test in tests/qtest/arm-cpu-features.c, but I found that all the other cpu-features are bool properties.

When I use 'query-cpu-model-expansion' to query the cpu-features, kvm-pmu-filter is not shown in the returned results, as below.

{'execute': 'query-cpu-model-expansion', 'arguments': {'type': 'full', 'model': { 'name': 'host'}}}
{"return": {}}
{"return": {"model": {"name": "host", "props": {"sve768": false, "sve128": false, "sve1024": false, "sve1280": false, "sve896": false, "sve256": false, "sve1536": false, "sve1792": false, "sve384": false, "sve": false, "sve2048": false, "pauth": false, "kvm-no-adjvtime": false, "sve512": false, "aarch64": true, "pmu": true, "sve1920": false, "sve1152": false, "kvm-steal-time": true, "sve640": false, "sve1408": false, "sve1664": false

I'm not sure if it's because the `query-cp
[PATCH v7] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
The KVM_ARM_VCPU_PMU_V3_FILTER provides the ability to let the VMM decide which PMU events are provided to the guest. Add a new option `kvm-pmu-filter` as a -cpu sub-option to set the PMU Event Filtering. Without the filter, all PMU events are exposed from host to guest by default. The usage of the new sub-option can be found in the updated document (docs/system/arm/cpu-features.rst).

Here is an example which shows how to use the PMU Event Filtering. When we launch a guest using KVM, add the following command line:

# qemu-system-aarch64 \
      -accel kvm \
      -cpu host,kvm-pmu-filter="D:0x11-0x11"

Since the first action is deny, we have a global allow policy. This filters out the cycle counter (event 0x11 being CPU_CYCLES).

And then in the guest, use perf to count the cycles:

# perf stat sleep 1

 Performance counter stats for 'sleep 1':

           1.22 msec task-clock          #    0.001 CPUs utilized
              1      context-switches    #  820.695 /sec
              0      cpu-migrations      #    0.000 /sec
             55      page-faults         #   45.138 K/sec
                     cycles
        1128954      instructions
         227031      branches            #  186.323 M/sec
           8686      branch-misses       #    3.83% of all branches

    1.002492480 seconds time elapsed

    0.001752000 seconds user
    0.0 seconds sys

As we can see, the cycle counter has been disabled in the guest, but other PMU events do still work.

Reviewed-by: Sebastian Ott
Signed-off-by: Shaoqin Huang
---
v6->v7:
 - Check return value of sscanf.
 - Improve the check condition.
v5->v6:
 - Commit message improvement.
 - Remove some unused code.
 - Collect Reviewed-by, thanks Sebastian.
 - Use g_auto(GStrv) to replace the gchar **. [Eric]
v4->v5:
 - Change kvm-pmu-filter to a -cpu sub-option. [Eric]
 - Comment tweak. [Gavin]
 - Rebase to the latest branch.
v3->v4:
 - Fix the wrong check for pmu_filter_init. [Sebastian]
 - Fix multiple alignment issues. [Gavin]
 - Report error by warn_report() instead of error_report(), and don't use abort() since the PMU Event Filter is an add-on and best-effort feature. [Gavin]
 - Add several missing { } for single-line bodies. [Gavin]
 - Use g_strsplit() to replace strtok(). [Gavin]
v2->v3:
 - Improve commit message, use kernel doc wording, add more explanation on filter example, fix some typos. [Eric]
 - Add g_free() in kvm_arch_set_pmu_filter() to prevent memory leak. [Eric]
 - Add more precise error message report. [Eric]
 - In options doc, add that pmu-filter relies on KVM_ARM_VCPU_PMU_V3_FILTER support in KVM. [Eric]
v1->v2:
 - Add more description for allow and deny meaning in commit message. [Sebastian]
 - Small improvement. [Sebastian]

 docs/system/arm/cpu-features.rst | 23 +
 target/arm/cpu.h                 |  3 ++
 target/arm/kvm.c                 | 80
 3 files changed, 106 insertions(+)

diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
index a5fb929243..7c8f6a60ef 100644
--- a/docs/system/arm/cpu-features.rst
+++ b/docs/system/arm/cpu-features.rst
@@ -204,6 +204,29 @@ the list of KVM VCPU features and their descriptions.
   the guest scheduler behavior and/or be exposed to the guest userspace.

+``kvm-pmu-filter``
+  By default kvm-pmu-filter is disabled. This means that by default all pmu
+  events will be exposed to guest.
+
+  KVM implements PMU Event Filtering to prevent a guest from being able to
+  sample certain events. It depends on the KVM_ARM_VCPU_PMU_V3_FILTER
+  attribute supported in KVM. It has the following format:
+
+  kvm-pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"
+
+  The A means "allow" and D means "deny", start is the first event of the
+  range and the end is the last one. The first registered range defines
+  the global policy(global ALLOW if the first @action is DENY, global DENY
+  if the first @action is ALLOW). The start and end only support hexadecimal
+  format. For example:
+
+  kvm-pmu-filter="A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30"
+
+  Since the first action is allow, we have a global deny policy. It
+  will allow event 0x11 (The cycle counter), events 0x23 to 0x3a are
+  also allowed except the event 0x30 which is denied, and all the other
+  events are denied.
+
 TCG VCPU Features
 =
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 63f31e0d98..f7f2431755 100644
--- a/target/arm/cpu.h
+
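The rule that "the first registered range defines the global policy" is easiest to see as an allow-bitmap that is initialized from the first range and then updated by every range in order, so a later range overrides an earlier one. The sketch below models that behaviour in plain C under the assumption that KVM applies the ranges in the order QEMU registers them; `build_allow_map()`, `struct pmu_range` and the 0x10000-entry event space are hypothetical simplifications (the real bitmap size depends on the PMU version).

```c
#include <stdbool.h>

#define PMU_NR_EVENTS 0x10000u  /* assumed 16-bit event space */

/* Hypothetical range type: 'A' allows, 'D' denies events start..end. */
struct pmu_range {
    char action;
    unsigned int start;  /* first event (inclusive) */
    unsigned int end;    /* last event (inclusive) */
};

/* Fill allowed[0..PMU_NR_EVENTS-1] according to the ordered ranges. */
static void build_allow_map(const struct pmu_range *r, int n, bool *allowed)
{
    /* No ranges: nothing is filtered. Otherwise the default is the
     * opposite of the first range's action. */
    bool default_allow = (n == 0) || (r[0].action == 'D');
    unsigned int e;

    for (e = 0; e < PMU_NR_EVENTS; e++) {
        allowed[e] = default_allow;
    }
    /* Apply the ranges in order; a later range overrides an earlier one. */
    for (int i = 0; i < n; i++) {
        for (e = r[i].start; e <= r[i].end && e < PMU_NR_EVENTS; e++) {
            allowed[e] = (r[i].action == 'A');
        }
    }
}
```

Applied to the documented example `A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30`, this leaves 0x11 and 0x23..0x3a allowed except 0x30, with every other event denied; with the commit-message example `D:0x11-0x11`, only the cycle counter is denied.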
Re: [PATCH v6] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
Hi Eric,

On 2/15/24 17:13, Eric Auger wrote:
Hi Shaoqin,

On 2/1/24 09:51, Shaoqin Huang wrote:
The KVM_ARM_VCPU_PMU_V3_FILTER provides the ability to let the VMM decide which PMU events are provided to the guest. Add a new option `kvm-pmu-filter` as a -cpu sub-option to set the PMU Event Filtering. Without the filter, all PMU events are exposed from host to guest by default. The usage of the new sub-option can be found in the updated document (docs/system/arm/cpu-features.rst).

Here is an example which shows how to use the PMU Event Filtering. When we launch a guest using KVM, add the following command line:

# qemu-system-aarch64 \
      -accel kvm \
      -cpu host,kvm-pmu-filter="D:0x11-0x11"

Since the first action is deny, we have a global allow policy. This filters out the cycle counter (event 0x11 being CPU_CYCLES).

And then in the guest, use perf to count the cycles:

# perf stat sleep 1

 Performance counter stats for 'sleep 1':

           1.22 msec task-clock          #    0.001 CPUs utilized
              1      context-switches    #  820.695 /sec
              0      cpu-migrations      #    0.000 /sec
             55      page-faults         #   45.138 K/sec
                     cycles
        1128954      instructions
         227031      branches            #  186.323 M/sec
           8686      branch-misses       #    3.83% of all branches

    1.002492480 seconds time elapsed

    0.001752000 seconds user
    0.0 seconds sys

As we can see, the cycle counter has been disabled in the guest, but other PMU events do still work.

Reviewed-by: Sebastian Ott
Signed-off-by: Shaoqin Huang
---
v5->v6:
 - Commit message improvement.
 - Remove some unused code.
 - Collect Reviewed-by, thanks Sebastian.
 - Use g_auto(GStrv) to replace the gchar **. [Eric]
v4->v5:
 - Change kvm-pmu-filter to a -cpu sub-option. [Eric]
 - Comment tweak. [Gavin]
 - Rebase to the latest branch.
v3->v4:
 - Fix the wrong check for pmu_filter_init. [Sebastian]
 - Fix multiple alignment issues. [Gavin]
 - Report error by warn_report() instead of error_report(), and don't use abort() since the PMU Event Filter is an add-on and best-effort feature. [Gavin]
 - Add several missing { } for single-line bodies. [Gavin]
 - Use g_strsplit() to replace strtok(). [Gavin]
v2->v3:
 - Improve commit message, use kernel doc wording, add more explanation on filter example, fix some typos. [Eric]
 - Add g_free() in kvm_arch_set_pmu_filter() to prevent memory leak. [Eric]
 - Add more precise error message report. [Eric]
 - In options doc, add that pmu-filter relies on KVM_ARM_VCPU_PMU_V3_FILTER support in KVM. [Eric]
v1->v2:
 - Add more description for allow and deny meaning in commit message. [Sebastian]
 - Small improvement. [Sebastian]

 docs/system/arm/cpu-features.rst | 23 ++
 target/arm/cpu.h                 |  3 ++
 target/arm/kvm.c                 | 76
 3 files changed, 102 insertions(+)

diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
index a5fb929243..26e306cc83 100644
--- a/docs/system/arm/cpu-features.rst
+++ b/docs/system/arm/cpu-features.rst
@@ -204,6 +204,29 @@ the list of KVM VCPU features and their descriptions.
   the guest scheduler behavior and/or be exposed to the guest userspace.

+``kvm-pmu-filter``
+  By default kvm-pmu-filter is disabled. This means that by default all pmu
+  events will be exposed to guest.
+
+  KVM implements PMU Event Filtering to prevent a guest from being able to
+  sample certain events. It depends on the KVM_ARM_VCPU_PMU_V3_FILTER
+  attribute supported in KVM. It has the following format:
+
+  kvm-pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"
+
+  The A means "allow" and D means "deny", start is the first event of the
+  range and the end is the last one. The first registered range defines
+  the global policy(global ALLOW if the first @action is DENY, global DENY
+  if the first @action is ALLOW). The start and end only support hexadecimal
+  format now. For example:

nit: I would remove " now"

Will remove it.

+
+  kvm-pmu-filter="A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30"
+
+  Since the first action is allow, we have a global deny policy. It
+  will allow event 0x11 (The cycle counter), events 0x23 to 0x3a are
+  also allowed except the event 0x30 which is denied, and all the other
+  events are denied.
+
 TCG VCPU Features
[PATCH v6] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
The KVM_ARM_VCPU_PMU_V3_FILTER provides the ability to let the VMM decide which PMU events are provided to the guest. Add a new option `kvm-pmu-filter` as a -cpu sub-option to set the PMU Event Filtering. Without the filter, all PMU events are exposed from host to guest by default. The usage of the new sub-option can be found in the updated document (docs/system/arm/cpu-features.rst).

Here is an example which shows how to use the PMU Event Filtering. When we launch a guest using KVM, add the following command line:

# qemu-system-aarch64 \
      -accel kvm \
      -cpu host,kvm-pmu-filter="D:0x11-0x11"

Since the first action is deny, we have a global allow policy. This filters out the cycle counter (event 0x11 being CPU_CYCLES).

And then in the guest, use perf to count the cycles:

# perf stat sleep 1

 Performance counter stats for 'sleep 1':

           1.22 msec task-clock          #    0.001 CPUs utilized
              1      context-switches    #  820.695 /sec
              0      cpu-migrations      #    0.000 /sec
             55      page-faults         #   45.138 K/sec
                     cycles
        1128954      instructions
         227031      branches            #  186.323 M/sec
           8686      branch-misses       #    3.83% of all branches

    1.002492480 seconds time elapsed

    0.001752000 seconds user
    0.0 seconds sys

As we can see, the cycle counter has been disabled in the guest, but other PMU events do still work.

Reviewed-by: Sebastian Ott
Signed-off-by: Shaoqin Huang
---
v5->v6:
 - Commit message improvement.
 - Remove some unused code.
 - Collect Reviewed-by, thanks Sebastian.
 - Use g_auto(GStrv) to replace the gchar **. [Eric]
v4->v5:
 - Change kvm-pmu-filter to a -cpu sub-option. [Eric]
 - Comment tweak. [Gavin]
 - Rebase to the latest branch.
v3->v4:
 - Fix the wrong check for pmu_filter_init. [Sebastian]
 - Fix multiple alignment issues. [Gavin]
 - Report error by warn_report() instead of error_report(), and don't use abort() since the PMU Event Filter is an add-on and best-effort feature. [Gavin]
 - Add several missing { } for single-line bodies. [Gavin]
 - Use g_strsplit() to replace strtok(). [Gavin]
v2->v3:
 - Improve commit message, use kernel doc wording, add more explanation on filter example, fix some typos. [Eric]
 - Add g_free() in kvm_arch_set_pmu_filter() to prevent memory leak. [Eric]
 - Add more precise error message report. [Eric]
 - In options doc, add that pmu-filter relies on KVM_ARM_VCPU_PMU_V3_FILTER support in KVM. [Eric]
v1->v2:
 - Add more description for allow and deny meaning in commit message. [Sebastian]
 - Small improvement. [Sebastian]

 docs/system/arm/cpu-features.rst | 23 ++
 target/arm/cpu.h                 |  3 ++
 target/arm/kvm.c                 | 76
 3 files changed, 102 insertions(+)

diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
index a5fb929243..26e306cc83 100644
--- a/docs/system/arm/cpu-features.rst
+++ b/docs/system/arm/cpu-features.rst
@@ -204,6 +204,29 @@ the list of KVM VCPU features and their descriptions.
   the guest scheduler behavior and/or be exposed to the guest userspace.

+``kvm-pmu-filter``
+  By default kvm-pmu-filter is disabled. This means that by default all pmu
+  events will be exposed to guest.
+
+  KVM implements PMU Event Filtering to prevent a guest from being able to
+  sample certain events. It depends on the KVM_ARM_VCPU_PMU_V3_FILTER
+  attribute supported in KVM. It has the following format:
+
+  kvm-pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"
+
+  The A means "allow" and D means "deny", start is the first event of the
+  range and the end is the last one. The first registered range defines
+  the global policy(global ALLOW if the first @action is DENY, global DENY
+  if the first @action is ALLOW). The start and end only support hexadecimal
+  format now. For example:
+
+  kvm-pmu-filter="A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30"
+
+  Since the first action is allow, we have a global deny policy. It
+  will allow event 0x11 (The cycle counter), events 0x23 to 0x3a are
+  also allowed except the event 0x30 which is denied, and all the other
+  events are denied.
+
 TCG VCPU Features
 =
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index d3477b1601..2d860c227d 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -948,6 +948,9 @@ struct ArchCPU {
     /* KVM st
Re: [PATCH v5] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
Hi Eric,

On 1/17/24 20:59, Eric Auger wrote:
Hi Shaoqin,

On 1/15/24 09:01, Shaoqin Huang wrote:
The KVM_ARM_VCPU_PMU_V3_FILTER provides the ability to let the VMM decide which PMU events are provided to the guest. Add a new option `kvm-pmu-filter` as -cpu sub-option to set the PMU Event Filtering. Without the filter, all PMU events are exposed from host to guest by default. The usage of the new sub-option can be found from the updated document (docs/system/arm/cpu-features.rst).

do not hesitate to cc qemu-...@nongnu.org for ARM specific topics.

Here is an example shows how to use the PMU Event Filtering, when
which shows
we launch a guest using KVM, add such command line:

# qemu-system-aarch64 \
      -accel kvm \
      -cpu host,kvm-pmu-filter="D:0x11-0x11"

Since the first action is deny, we have a global allow policy. This disables the filtering of the cycle counter (event 0x11 being CPU_CYCLES).

Actually it filters it ;-) I would rather say this filters out the cycle counter. But I am not a native speaker either ;-)

And then in the guest, use perf to count the cycles:

# perf stat sleep 1

 Performance counter stats for 'sleep 1':

           1.22 msec task-clock          #    0.001 CPUs utilized
              1      context-switches    #  820.695 /sec
              0      cpu-migrations      #    0.000 /sec
             55      page-faults         #   45.138 K/sec
                     cycles
        1128954      instructions
         227031      branches            #  186.323 M/sec
           8686      branch-misses       #    3.83% of all branches

    1.002492480 seconds time elapsed

    0.001752000 seconds user
    0.0 seconds sys

As we can see, the cycle counter has been disabled in the guest, but other pmu events are still work.

do still work

Signed-off-by: Shaoqin Huang
---
v4->v5:
 - Change kvm-pmu-filter to a -cpu sub-option. [Eric]
 - Comment tweak. [Gavin]
 - Rebase to the latest branch.
v3->v4:
 - Fix the wrong check for pmu_filter_init. [Sebastian]
 - Fix multiple alignment issues. [Gavin]
 - Report error by warn_report() instead of error_report(), and don't use abort() since the PMU Event Filter is an add-on and best-effort feature. [Gavin]
 - Add several missing { } for single-line bodies. [Gavin]
 - Use g_strsplit() to replace strtok(). [Gavin]
v2->v3:
 - Improve commit message, use kernel doc wording, add more explanation on filter example, fix some typos. [Eric]
 - Add g_free() in kvm_arch_set_pmu_filter() to prevent memory leak. [Eric]
 - Add more precise error message report. [Eric]
 - In options doc, add that pmu-filter relies on KVM_ARM_VCPU_PMU_V3_FILTER support in KVM. [Eric]
v1->v2:
 - Add more description for allow and deny meaning in commit message. [Sebastian]
 - Small improvement. [Sebastian]

 docs/system/arm/cpu-features.rst | 23 ++
 include/sysemu/kvm_int.h         |  1 +
 target/arm/cpu.h                 |  3 ++
 target/arm/kvm.c                 | 78
 4 files changed, 105 insertions(+)

diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
index a5fb929243..44a797c50e 100644
--- a/docs/system/arm/cpu-features.rst
+++ b/docs/system/arm/cpu-features.rst
@@ -204,6 +204,29 @@ the list of KVM VCPU features and their descriptions.
   the guest scheduler behavior and/or be exposed to the guest userspace.

+``kvm-pmu-filter``
+  By default kvm-pmu-filter is disabled. This means that by default all pmu
+  events will be exposed to guest.
+
+  KVM implements PMU Event Filtering to prevent a guest from being able to
+  sample certain events. It depends on the KVM_ARM_VCPU_PMU_V3_FILTER
+  attribute supported in KVM. It has the following format:
+
+  kvm-pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"
+
+  The A means "allow" and D means "deny", start is the first event of the
+  range and the end is the last one. The first registered range defines
+  the global policy(global ALLOW if the first @action is DENY, global DENY
+  if the first @action is ALLOW). The start and end only support hexadecimal
+  format now. For example:
+
+  kvm-pmu-filter="A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30"
+
+  Since the first action is allow, we have a global deny policy. It
+  will allow event 0x11 (The cycle counter), events 0x23 to 0x3a is

s/is/are

+  also allowed except the event 0x30 is denied, and all the other events

0x30 is/0x30 which is
[PATCH v5] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
The KVM_ARM_VCPU_PMU_V3_FILTER provides the ability to let the VMM decide which PMU events are provided to the guest. Add a new option `kvm-pmu-filter` as a -cpu sub-option to set the PMU Event Filtering. Without the filter, all PMU events are exposed from host to guest by default. The usage of the new sub-option can be found in the updated document (docs/system/arm/cpu-features.rst).

Here is an example which shows how to use the PMU Event Filtering. When we launch a guest using KVM, add the following command line:

# qemu-system-aarch64 \
      -accel kvm \
      -cpu host,kvm-pmu-filter="D:0x11-0x11"

Since the first action is deny, we have a global allow policy. This filters out the cycle counter (event 0x11 being CPU_CYCLES).

And then in the guest, use perf to count the cycles:

# perf stat sleep 1

 Performance counter stats for 'sleep 1':

           1.22 msec task-clock          #    0.001 CPUs utilized
              1      context-switches    #  820.695 /sec
              0      cpu-migrations      #    0.000 /sec
             55      page-faults         #   45.138 K/sec
                     cycles
        1128954      instructions
         227031      branches            #  186.323 M/sec
           8686      branch-misses       #    3.83% of all branches

    1.002492480 seconds time elapsed

    0.001752000 seconds user
    0.0 seconds sys

As we can see, the cycle counter has been disabled in the guest, but other PMU events still work.

Signed-off-by: Shaoqin Huang
---
v4->v5:
 - Change kvm-pmu-filter to a -cpu sub-option. [Eric]
 - Comment tweak. [Gavin]
 - Rebase to the latest branch.
v3->v4:
 - Fix the wrong check for pmu_filter_init. [Sebastian]
 - Fix multiple alignment issues. [Gavin]
 - Report error by warn_report() instead of error_report(), and don't use abort() since the PMU Event Filter is an add-on and best-effort feature. [Gavin]
 - Add several missing { } for single-line bodies. [Gavin]
 - Use g_strsplit() to replace strtok(). [Gavin]
v2->v3:
 - Improve commit message, use kernel doc wording, add more explanation on filter example, fix some typos. [Eric]
 - Add g_free() in kvm_arch_set_pmu_filter() to prevent memory leak. [Eric]
 - Add more precise error message report. [Eric]
 - In options doc, add that pmu-filter relies on KVM_ARM_VCPU_PMU_V3_FILTER support in KVM. [Eric]
v1->v2:
 - Add more description for allow and deny meaning in commit message. [Sebastian]
 - Small improvement. [Sebastian]

 docs/system/arm/cpu-features.rst | 23 ++
 include/sysemu/kvm_int.h         |  1 +
 target/arm/cpu.h                 |  3 ++
 target/arm/kvm.c                 | 78
 4 files changed, 105 insertions(+)

diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
index a5fb929243..44a797c50e 100644
--- a/docs/system/arm/cpu-features.rst
+++ b/docs/system/arm/cpu-features.rst
@@ -204,6 +204,29 @@ the list of KVM VCPU features and their descriptions.
   the guest scheduler behavior and/or be exposed to the guest userspace.

+``kvm-pmu-filter``
+  By default kvm-pmu-filter is disabled. This means that by default all pmu
+  events will be exposed to guest.
+
+  KVM implements PMU Event Filtering to prevent a guest from being able to
+  sample certain events. It depends on the KVM_ARM_VCPU_PMU_V3_FILTER
+  attribute supported in KVM. It has the following format:
+
+  kvm-pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"
+
+  The A means "allow" and D means "deny", start is the first event of the
+  range and the end is the last one. The first registered range defines
+  the global policy(global ALLOW if the first @action is DENY, global DENY
+  if the first @action is ALLOW). The start and end only support hexadecimal
+  format now. For example:
+
+  kvm-pmu-filter="A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30"
+
+  Since the first action is allow, we have a global deny policy. It
+  will allow event 0x11 (The cycle counter), events 0x23 to 0x3a is
+  also allowed except the event 0x30 is denied, and all the other events
+  are disallowed.
+
 TCG VCPU Features
 =
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index fd846394be..8f4601474f 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -120,6 +120,7 @@ struct KVMState
     uint32_t xen_caps;
     uint16_t xen_gnttab_max_frames;
     uint16_t xen_evtchn_max_pirq;
+    char *kvm_pmu_filter;
 };

 void kvm_memory_l
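The filter string speaks in inclusive start-end ranges, while the `struct kvm_pmu_event_filter` that the patch hands to KVM_ARM_VCPU_PMU_V3_FILTER describes a range as a base event plus a count. A minimal conversion sketch, assuming the arm64 uapi layout `{ __u16 base_event; __u16 nevents; __u8 action; __u8 pad[3]; }` with allow = 0 and deny = 1; the structure is mirrored locally as the hypothetical `struct pmu_event_filter`, and `range_to_filter()` is an illustrative helper, so the example builds without kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of the arm64 uapi struct kvm_pmu_event_filter layout. */
struct pmu_event_filter {
    uint16_t base_event;
    uint16_t nevents;
    uint8_t  action;
    uint8_t  pad[3];
};

/* Mirrors KVM_PMU_EVENT_ALLOW / KVM_PMU_EVENT_DENY. */
enum { PMU_EVENT_ALLOW = 0, PMU_EVENT_DENY = 1 };

/*
 * Convert an inclusive "{A,D}:start-end" range into base/count form.
 * Returns false if the range cannot be represented.
 */
static bool range_to_filter(char action, unsigned int start, unsigned int end,
                            struct pmu_event_filter *f)
{
    if ((action != 'A' && action != 'D') || start > end || end > 0xffff) {
        return false;
    }
    /* nevents is 16 bits wide, so the full 0x0-0xffff span does not fit. */
    if (end - start + 1 > UINT16_MAX) {
        return false;
    }
    f->base_event = (uint16_t)start;
    f->nevents = (uint16_t)(end - start + 1);  /* both ends are inclusive */
    f->action = (action == 'A') ? PMU_EVENT_ALLOW : PMU_EVENT_DENY;
    f->pad[0] = f->pad[1] = f->pad[2] = 0;
    return true;
}
```

The `+ 1` is the detail worth checking in review: without it, a single-event range such as `D:0x11-0x11` would become `nevents = 0` and filter nothing.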
[PATCH v4] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
The KVM_ARM_VCPU_PMU_V3_FILTER provides the ability to let the VMM decide which PMU events are provided to the guest. Add a new option `pmu-filter` as an -accel sub-option to set the PMU Event Filtering. Without the filter, KVM will expose all events from the host to the guest by default.

The `pmu-filter` option has the following format:

pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"

The A means "allow" and D means "deny", start is the first event of the range and end is the last one. The first registered range defines the global policy (global ALLOW if the first @action is DENY, global DENY if the first @action is ALLOW). The start and end only support hex format now. For example:

pmu-filter="A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30"

Since the first action is allow, we have a global deny policy. It will allow event 0x11 (the cycle counter); events 0x23 to 0x3a are also allowed except event 0x30, which is denied; and all the other events are disallowed.

Here is a real example which shows how to use the PMU Event Filtering. When we launch a guest using KVM, add the following command line:

# qemu-system-aarch64 \
      -accel kvm,pmu-filter="D:0x11-0x11"

Since the first action is deny, we have a global allow policy. This filters out the cycle counter (event 0x11 being CPU_CYCLES).

And then in the guest, use perf to count the cycles:

# perf stat sleep 1

 Performance counter stats for 'sleep 1':

           1.22 msec task-clock          #    0.001 CPUs utilized
              1      context-switches    #  820.695 /sec
              0      cpu-migrations      #    0.000 /sec
             55      page-faults         #   45.138 K/sec
                     cycles
        1128954      instructions
         227031      branches            #  186.323 M/sec
           8686      branch-misses       #    3.83% of all branches

    1.002492480 seconds time elapsed

    0.001752000 seconds user
    0.0 seconds sys

As we can see, the cycle counter has been disabled in the guest, but other PMU events still work.

Signed-off-by: Shaoqin Huang
---
v3->v4:
 - Fix the wrong check for pmu_filter_init. [Sebastian]
 - Fix multiple alignment issues. [Gavin]
 - Report error by warn_report() instead of error_report(), and don't use abort() since the PMU Event Filter is an add-on and best-effort feature. [Gavin]
 - Add several missing { } for single-line bodies. [Gavin]
 - Use g_strsplit() to replace strtok(). [Gavin]
v2->v3:
 - Improve commit message, use kernel doc wording, add more explanation on filter example, fix some typos. [Eric]
 - Add g_free() in kvm_arch_set_pmu_filter() to prevent memory leak. [Eric]
 - Add more precise error message report. [Eric]
 - In options doc, add that pmu-filter relies on KVM_ARM_VCPU_PMU_V3_FILTER support in KVM. [Eric]
v1->v2:
 - Add more description for allow and deny meaning in commit message. [Sebastian]
 - Small improvement. [Sebastian]

v2: https://lore.kernel.org/all/20231117060838.39723-1-shahu...@redhat.com/
v1: https://lore.kernel.org/all/20231113081713.153615-1-shahu...@redhat.com/
---
 include/sysemu/kvm_int.h |  1 +
 qemu-options.hx          | 21 +++
 target/arm/kvm.c         | 23
 target/arm/kvm64.c       | 75
 4 files changed, 120 insertions(+)

diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index fd846394be..8f4601474f 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -120,6 +120,7 @@ struct KVMState
     uint32_t xen_caps;
     uint16_t xen_gnttab_max_frames;
     uint16_t xen_evtchn_max_pirq;
+    char *kvm_pmu_filter;
 };

 void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
diff --git a/qemu-options.hx b/qemu-options.hx
index 42fd09e4de..054865ba0d 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -187,6 +187,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel,
     "tb-size=n (TCG translation block cache size)\n"
     "dirty-ring-size=n (KVM dirty ring GFN count, default 0)\n"
     "eager-split-size=n (KVM Eager Page Split chunk size, default 0, disabled. ARM only)\n"
+    "pmu-filter={A,D}:start-end[;{A,D}:start-end...] (KVM PMU Event Filter, default no filter. ARM only)\n"
     "notify-vmexit=run|internal-error|disable,notify-window=n (enable notify VM exit and set notify window, x86 only)\n"
     "thread=single|multi (enable multi-threaded T
Re: [PATCH v3] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
Hi Gavin, On 12/1/23 13:37, Gavin Shan wrote: Hi Shaoqin, On 11/29/23 14:08, Shaoqin Huang wrote: The KVM_ARM_VCPU_PMU_V3_FILTER provide the ability to let the VMM decide which PMU events are provided to the guest. Add a new option `pmu-filter` as -accel sub-option to set the PMU Event Filtering. Without the filter, the KVM will expose all events from the host to guest by default. The `pmu-filter` has such format: pmu-filter="{A,D}:start-end[;{A,D}:start-end...]" The A means "allow" and D means "deny", start is the first event of the range and the end is the last one. The first registered range defines the global policy(global ALLOW if the first @action is DENY, global DENY if the first @action is ALLOW). The start and end only support hex format now. For example: pmu-filter="A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30" Since the first action is allow, we have a global deny policy. It will allow event 0x11 (The cycle counter), events 0x23 to 0x3a is also allowed except the event 0x30 is denied, and all the other events are disallowed. Here is an real example shows how to use the PMU Event Filtering, when we launch a guest by use kvm, add such command line: # qemu-system-aarch64 \ -accel kvm,pmu-filter="D:0x11-0x11" Since the first action is deny, we have a global allow policy. This disables the filtering of the cycle counter (event 0x11 being CPU_CYCLES). And then in guest, use the perf to count the cycle: # perf stat sleep 1 Performance counter stats for 'sleep 1': 1.22 msec task-clock # 0.001 CPUs utilized 1 context-switches # 820.695 /sec 0 cpu-migrations # 0.000 /sec 55 page-faults # 45.138 K/sec cycles 1128954 instructions 227031 branches # 186.323 M/sec 8686 branch-misses # 3.83% of all branches 1.002492480 seconds time elapsed 0.001752000 seconds user 0.0 seconds sys As we can see, the cycle counter has been disabled in the guest, but other pmu events are still work. 
Signed-off-by: Shaoqin Huang --- v2->v3: - Improve commits message, use kernel doc wording, add more explaination on filter example, fix some typo error. [Eric] - Add g_free() in kvm_arch_set_pmu_filter() to prevent memory leak. [Eric] - Add more precise error message report. [Eric] - In options doc, add pmu-filter rely on KVM_ARM_VCPU_PMU_V3_FILTER support in KVM. [Eric] v1->v2: - Add more description for allow and deny meaning in commit message. [Sebastian] - Small improvement. [Sebastian] v2: https://lore.kernel.org/all/20231117060838.39723-1-shahu...@redhat.com/ v1: https://lore.kernel.org/all/20231113081713.153615-1-shahu...@redhat.com/ --- include/sysemu/kvm_int.h | 1 + qemu-options.hx | 21 + target/arm/kvm.c | 23 ++ target/arm/kvm64.c | 68 4 files changed, 113 insertions(+) diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h index fd846394be..8f4601474f 100644 --- a/include/sysemu/kvm_int.h +++ b/include/sysemu/kvm_int.h @@ -120,6 +120,7 @@ struct KVMState uint32_t xen_caps; uint16_t xen_gnttab_max_frames; uint16_t xen_evtchn_max_pirq; + char *kvm_pmu_filter; }; void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml, diff --git a/qemu-options.hx b/qemu-options.hx index 42fd09e4de..8b721d6668 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -187,6 +187,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel, " tb-size=n (TCG translation block cache size)\n" " dirty-ring-size=n (KVM dirty ring GFN count, default 0)\n" " eager-split-size=n (KVM Eager Page Split chunk size, default 0, disabled. ARM only)\n" + " pmu-filter={A,D}:start-end[;...] (KVM PMU Event Filter, default no filter. ARM only)\n" ^^^ Potential alignment issue, or the email isn't shown for me correctly. Besides, why not follow the pattern in the commit log, which is nicer than what's of being: pmu-filter={A,D}:start-end[;...] to pmu-filter="{A,D}:start-end[;{A,D}:start-end...] Ok. I can replace it with the better format. 
" notify-vmexit=run|internal-error|disable,notify-window=n (enable notify VM exit and set notify window, x86 only)\n" " thread=single|multi (enable multi-threaded TCG)\n"
Re: [PATCH v3] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
On 12/1/23 00:55, Sebastian Ott wrote: On Tue, 28 Nov 2023, Shaoqin Huang wrote: +static void kvm_arm_pmu_filter_init(CPUState *cs) +{ + static bool pmu_filter_init = false; + struct kvm_pmu_event_filter filter; + struct kvm_device_attr attr = { + .group = KVM_ARM_VCPU_PMU_V3_CTRL, + .attr = KVM_ARM_VCPU_PMU_V3_FILTER, + .addr = (uint64_t)&filter, + }; + KVMState *kvm_state = cs->kvm_state; + char *tmp; + char *str, act; + + if (!kvm_state->kvm_pmu_filter) + return; + + if (kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, attr)) { + error_report("The kernel doesn't support the pmu event filter!\n"); + abort(); + } + + /* The filter only needs to be initialized for 1 vcpu. */ + if (!pmu_filter_init) + pmu_filter_init = true; Imho this is missing an else to bail out. Or the shorter version if (pmu_filter_init) return; pmu_filter_init = true; Yes. This is what I want to do. Thanks for fixing it. which could also move above the other tests. Sebastian -- Shaoqin
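Sebastian's suggestion turns the flag into an early return. That way the filter is programmed only once, even though the PMU init path runs for every vCPU. The patch comment notes that one vCPU is enough. Below is a minimal sketch of that control flow. `pmu_filter_init_once()` and the `program_filter()` counter are hypothetical stand-ins for `kvm_arm_pmu_filter_init()` and the real `KVM_SET_DEVICE_ATTR` call. As a separate point, `KVM_HAS_DEVICE_ATTR` takes a pointer to `struct kvm_device_attr`, so the quoted `kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, attr)` presumably needs `&attr`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the KVM_SET_DEVICE_ATTR call; counts invocations. */
static int filter_programmed;

static void program_filter(void)
{
    filter_programmed++;
}

/* Called once per vCPU; only the first call programs the filter. */
static void pmu_filter_init_once(const char *pmu_filter)
{
    static bool pmu_filter_init;

    /* Early return as suggested in review, placed above the other tests. */
    if (pmu_filter_init) {
        return;
    }
    pmu_filter_init = true;

    if (!pmu_filter) {
        return; /* no pmu-filter option was given */
    }
    program_filter();
}
```

With the guard placed first, every call after the first one is a cheap no-op. A function-local static flag is process-wide, which is consistent with QEMU having a single KVMState per process.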
[PATCH v3] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
KVM_ARM_VCPU_PMU_V3_FILTER provides the ability to let the VMM decide which PMU events are provided to the guest. Add a new option `pmu-filter` as an -accel sub-option to set the PMU Event Filtering. Without the filter, KVM exposes all events from the host to the guest by default. The `pmu-filter` has the following format: pmu-filter="{A,D}:start-end[;{A,D}:start-end...]" A means "allow" and D means "deny"; start is the first event of the range and end is the last one. The first registered range defines the global policy (global ALLOW if the first @action is DENY, global DENY if the first @action is ALLOW). Only hex format is currently supported for start and end. For example: pmu-filter="A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30" Since the first action is allow, we have a global deny policy. This allows event 0x11 (the cycle counter) and events 0x23 to 0x3a except event 0x30, which is denied; all other events are disallowed. Here is a real example showing how to use PMU Event Filtering. When launching a guest with KVM, add the following to the command line: # qemu-system-aarch64 \ -accel kvm,pmu-filter="D:0x11-0x11" Since the first action is deny, we have a global allow policy. This denies the cycle counter (event 0x11 being CPU_CYCLES). Then, in the guest, use perf to count cycles: # perf stat sleep 1 Performance counter stats for 'sleep 1': 1.22 msec task-clock # 0.001 CPUs utilized 1 context-switches # 820.695 /sec 0 cpu-migrations # 0.000 /sec 55 page-faults # 45.138 K/sec cycles 1128954 instructions 227031 branches # 186.323 M/sec 8686 branch-misses # 3.83% of all branches 1.002492480 seconds time elapsed 0.001752000 seconds user 0.0 seconds sys As we can see, the cycle counter has been disabled in the guest, but other PMU events still work.
Signed-off-by: Shaoqin Huang --- v2->v3: - Improve commits message, use kernel doc wording, add more explaination on filter example, fix some typo error.[Eric] - Add g_free() in kvm_arch_set_pmu_filter() to prevent memory leak. [Eric] - Add more precise error message report. [Eric] - In options doc, add pmu-filter rely on KVM_ARM_VCPU_PMU_V3_FILTER support in KVM.[Eric] v1->v2: - Add more description for allow and deny meaning in commit message. [Sebastian] - Small improvement. [Sebastian] v2: https://lore.kernel.org/all/20231117060838.39723-1-shahu...@redhat.com/ v1: https://lore.kernel.org/all/20231113081713.153615-1-shahu...@redhat.com/ --- include/sysemu/kvm_int.h | 1 + qemu-options.hx | 21 + target/arm/kvm.c | 23 ++ target/arm/kvm64.c | 68 4 files changed, 113 insertions(+) diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h index fd846394be..8f4601474f 100644 --- a/include/sysemu/kvm_int.h +++ b/include/sysemu/kvm_int.h @@ -120,6 +120,7 @@ struct KVMState uint32_t xen_caps; uint16_t xen_gnttab_max_frames; uint16_t xen_evtchn_max_pirq; +char *kvm_pmu_filter; }; void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml, diff --git a/qemu-options.hx b/qemu-options.hx index 42fd09e4de..8b721d6668 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -187,6 +187,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel, "tb-size=n (TCG translation block cache size)\n" "dirty-ring-size=n (KVM dirty ring GFN count, default 0)\n" "eager-split-size=n (KVM Eager Page Split chunk size, default 0, disabled. ARM only)\n" +"pmu-filter={A,D}:start-end[;...] (KVM PMU Event Filter, default no filter. ARM only)\n" "notify-vmexit=run|internal-error|disable,notify-window=n (enable notify VM exit and set notify window, x86 only)\n" "thread=single|multi (enable multi-threaded TCG)\n", QEMU_ARCH_ALL) SRST @@ -259,6 +260,26 @@ SRST impact on the memory. By default, this feature is disabled (eager-split-size=0). 
+``pmu-filter={A,D}:start-end[;...]`` +KVM implements pmu event filtering to prevent a guest from being able to + sample certain events. It depends on the KVM_ARM_VCPU_PMU_V3_FILTER attr + supported in KVM. It has the following format: + + pmu-filter="{A,D}:start-end[;{A,D}:start-end...]" + +
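The allow/deny rule described above can be modelled with one bit per event: the first registered range sets the global policy, and later ranges override individual events. The sketch below illustrates those semantics only and is not QEMU or KVM code. `struct pmu_range`, `pmu_filter_apply()` and `pmu_event_allowed()` are hypothetical names, and the sketch assumes a 16-bit event space.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration; not QEMU or KVM API. */
#define PMU_NR_EVENTS 65536u /* assumes 16-bit event numbers (ARMv8.1+) */

enum pmu_action { PMU_ALLOW, PMU_DENY };

struct pmu_range {
    uint16_t start;         /* first event in the range */
    uint16_t end;           /* last event in the range (inclusive) */
    enum pmu_action action;
};

/* One bit per event: 1 = allowed, 0 = denied. */
static uint8_t pmu_bitmap[PMU_NR_EVENTS / 8];

static void pmu_set(uint32_t ev, bool allow)
{
    if (allow) {
        pmu_bitmap[ev / 8] |= (uint8_t)(1u << (ev % 8));
    } else {
        pmu_bitmap[ev / 8] &= (uint8_t)~(1u << (ev % 8));
    }
}

/* Apply ranges in order; the first range fixes the default for all events. */
static void pmu_filter_apply(const struct pmu_range *r, size_t n)
{
    if (n == 0) {
        /* No filter: every host event is exposed. */
        memset(pmu_bitmap, 0xff, sizeof(pmu_bitmap));
        return;
    }
    /* First ALLOW -> global deny (all 0); first DENY -> global allow (all 1). */
    memset(pmu_bitmap, r[0].action == PMU_ALLOW ? 0x00 : 0xff,
           sizeof(pmu_bitmap));
    for (size_t i = 0; i < n; i++) {
        /* uint32_t loop variable avoids wrap-around when end == 0xffff. */
        for (uint32_t ev = r[i].start; ev <= r[i].end; ev++) {
            pmu_set(ev, r[i].action == PMU_ALLOW);
        }
    }
}

static bool pmu_event_allowed(uint16_t ev)
{
    return pmu_bitmap[ev / 8] & (1u << (ev % 8));
}
```

Applied to the two examples in the commit message, `A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30` leaves 0x11 and 0x23 to 0x3a (minus 0x30) allowed and denies everything else. `D:0x11-0x11` denies only CPU_CYCLES and allows all other events.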
Re: [PATCH v2] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
Hi Eric, On 11/25/23 02:24, Eric Auger wrote: Hi, On 11/17/23 07:08, Shaoqin Huang wrote: The KVM_ARM_VCPU_PMU_V3_FILTER provide the ability to let the VMM decide which PMU events are provided to the guest. Add a new option `pmu-filter` as -accel sub-option to set the PMU Event Filtering. The `pmu-filter` has such format: pmu-filter="{A,D}:start-end[;{A,D}:start-end...]" The A means "allow" and D means "deny", start is the first event of the range and the end is the last one. The first filter action defines if the whole event list is an allow or deny list, if the first filter action is "allow", all other events are denied except start-end; if the first filter action is "deny", all other events are allowed except start-end. For example: pmu-filter="A:0x11-0x11;A:0x23-0x3a,D:0x30-0x30" This will allow event 0x11 (The cycle counter), events 0x23 to 0x3a is also allowed except the event 0x30 is denied, and all the other events are disallowed. Here is an real example shows how to use the PMU Event Filtering, when we launch a guest by use kvm, add such command line: # qemu-system-aarch64 \ -accel kvm,pmu-filter="D:0x11-0x11" And then in guest, use the perf to count the cycle: # perf stat sleep 1 Performance counter stats for 'sleep 1': 1.22 msec task-clock #0.001 CPUs utilized 1 context-switches # 820.695 /sec 0 cpu-migrations #0.000 /sec 55 page-faults # 45.138 K/sec cycles 1128954 instructions 227031 branches # 186.323 M/sec 8686 branch-misses#3.83% of all branches 1.002492480 seconds time elapsed 0.001752000 seconds user 0.0 seconds sys As we can see, the cycle counter has been disabled in the guest, but other pmu events are still work. Signed-off-by: Shaoqin Huang --- v1->v2: - Add more description for allow and deny meaning in commit message. [Sebastian] - Small improvement. 
[Sebastian] v1: https://lore.kernel.org/all/20231113081713.153615-1-shahu...@redhat.com/ --- include/sysemu/kvm_int.h | 1 + qemu-options.hx | 16 + target/arm/kvm.c | 22 + target/arm/kvm64.c | 51 4 files changed, 90 insertions(+) diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h index fd846394be..8f4601474f 100644 --- a/include/sysemu/kvm_int.h +++ b/include/sysemu/kvm_int.h @@ -120,6 +120,7 @@ struct KVMState uint32_t xen_caps; uint16_t xen_gnttab_max_frames; uint16_t xen_evtchn_max_pirq; +char *kvm_pmu_filter; }; void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml, diff --git a/qemu-options.hx b/qemu-options.hx index 42fd09e4de..dd3518092c 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -187,6 +187,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel, "tb-size=n (TCG translation block cache size)\n" "dirty-ring-size=n (KVM dirty ring GFN count, default 0)\n" "eager-split-size=n (KVM Eager Page Split chunk size, default 0, disabled. ARM only)\n" +"pmu-filter={A,D}:start-end[;...] (KVM PMU Event Filter, default no filter. ARM only)\n" "notify-vmexit=run|internal-error|disable,notify-window=n (enable notify VM exit and set notify window, x86 only)\n" "thread=single|multi (enable multi-threaded TCG)\n", QEMU_ARCH_ALL) SRST @@ -259,6 +260,21 @@ SRST impact on the memory. By default, this feature is disabled (eager-split-size=0). +``pmu-filter={A,D}:start-end[;...]`` +KVM implements pmu event filtering to prevent a guest from being able to + sample certain events. It has the following format: + + pmu-filter="{A,D}:start-end[;{A,D}:start-end...]" + + The A means "allow" and D means "deny", start if the first event of the + range and the end is the last one. For example: + + pmu-filter="A:0x11-0x11;A:0x23-0x3a,D:0x30-0x30" + + This will allow event 0x11 (The cycle counter), events 0x23 to 0x3a is + also allowed except the event 0x30 is denied, and all the other events + are disallowed. 
+ ``notify-vmexit=run|internal-error|disable,notify-window=n`` Enables or disables notify VM exit support on x86 host and specify the corresponding notify window to trigger the VM exit if enabled. diff --git a/target/arm/kvm
Re: [PATCH v2] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
Hi Eric, On 11/24/23 18:40, Eric Auger wrote: Hi Shaoqin, On 11/17/23 07:08, Shaoqin Huang wrote: The KVM_ARM_VCPU_PMU_V3_FILTER provide the ability to let the VMM decide which PMU events are provided to the guest. Add a new option `pmu-filter` as -accel sub-option to set the PMU Event Filtering. Could you remind the reader of the default policy without a filter (i.e. all events from the host are exposed)? Yes. I will add a description of the default policy. The `pmu-filter` has such format: pmu-filter="{A,D}:start-end[;{A,D}:start-end...]" The A means "allow" and D means "deny", start is the first event of the range and the end is the last one. The first filter action defines if the whole event list is an allow or deny list, if the first filter action is "allow", all other events are denied except start-end; if the first filter action is "deny", all other events are allowed except start-end. For example: I prefer the kernel doc wording: The first registered range defines the global policy (global ALLOW if the first @action is DENY, global DENY if the first @action is ALLOW). I can replace this with the kernel doc description in the next version. pmu-filter="A:0x11-0x11;A:0x23-0x3a,D:0x30-0x30" Shouldn't the "," be replaced by a ";"? Yes. Thanks for catching this. It should be ";". I would add: since the first action is allow, we have a global deny policy. That makes the example clearer; I will add it. This will allow event 0x11 (The cycle counter), events 0x23 to 0x3a is also allowed except the event 0x30 is denied, and all the other events are disallowed. Here is an real example shows how to use the PMU Event Filtering, when we launch a guest by use kvm, add such command line: # qemu-system-aarch64 \ -accel kvm,pmu-filter="D:0x11-0x11" Since the first filter action is deny, we have a global allow policy.
this disables the filtering of the cycle counter (event 0x11 being CPU_CYCLES) The kernel doc says that the ranges should match the PMU arch (10 bits on ARMv8.0, 16 bits from ARMv8.1 onwards). How do you handle that? Currently I think I can rely on KVM_SET_DEVICE_ATTR: when setting KVM_ARM_VCPU_PMU_V3_FILTER, check errno, and if it equals EINVAL, report to the user that the filter is invalid. Another way would be to detect the ARM version and do more checks in userspace. Do you have any good suggestions for handling the two different event spaces in QEMU? And then in guest, use the perf to count the cycle: # perf stat sleep 1 Performance counter stats for 'sleep 1': 1.22 msec task-clock #0.001 CPUs utilized 1 context-switches # 820.695 /sec 0 cpu-migrations #0.000 /sec 55 page-faults # 45.138 K/sec cycles 1128954 instructions 227031 branches # 186.323 M/sec 8686 branch-misses#3.83% of all branches 1.002492480 seconds time elapsed 0.001752000 seconds user 0.0 seconds sys As we can see, the cycle counter has been disabled in the guest, but other pmu events are still work. perf list should work as well It works; should I post its output here? Signed-off-by: Shaoqin Huang --- v1->v2: - Add more description for allow and deny meaning in commit message. [Sebastian] - Small improvement.
[Sebastian] v1: https://lore.kernel.org/all/20231113081713.153615-1-shahu...@redhat.com/ --- include/sysemu/kvm_int.h | 1 + qemu-options.hx | 16 + target/arm/kvm.c | 22 + target/arm/kvm64.c | 51 4 files changed, 90 insertions(+) diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h index fd846394be..8f4601474f 100644 --- a/include/sysemu/kvm_int.h +++ b/include/sysemu/kvm_int.h @@ -120,6 +120,7 @@ struct KVMState uint32_t xen_caps; uint16_t xen_gnttab_max_frames; uint16_t xen_evtchn_max_pirq; +char *kvm_pmu_filter; }; void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml, diff --git a/qemu-options.hx b/qemu-options.hx index 42fd09e4de..dd3518092c 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -187,6 +187,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel, "tb-size=n (TCG translation block cache size)\n" "dirty-ring-size=n (KVM dirty ring GFN count, default 0)\n" "eager-split-size=n (KVM Eager Page Split chunk size, default 0, disabled. ARM only)\n" +"
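One userspace answer to Eric's question is to derive the event-number limit from the PMU version before calling into KVM. The sketch below assumes the guest-visible ID_AA64DFR0_EL1 value is already available. It reads the PMUVer field (bits [11:8]): value 0x1 (PMUv3, ARMv8.0) gives a 10-bit event space, and any other PMUv3 value is treated here as ARMv8.1 or later with 16 bits, matching the kernel doc quoted above. `pmu_max_event()` and `pmu_range_valid()` are hypothetical helpers. Treating 0x0 and 0xf (IMPLEMENTATION DEFINED) as "no usable PMUv3" is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* PMUVer field of ID_AA64DFR0_EL1, bits [11:8]. */
#define PMUVER_SHIFT  8
#define PMUVER_MASK   0xfu
#define PMUVER_V3     0x1u /* PMUv3 as introduced in ARMv8.0 */
#define PMUVER_IMPDEF 0xfu /* IMPLEMENTATION DEFINED, not PMUv3 */

/* Highest valid event number, or 0 if there is no usable PMUv3. */
static uint32_t pmu_max_event(uint64_t id_aa64dfr0)
{
    uint32_t pmuver = (uint32_t)(id_aa64dfr0 >> PMUVER_SHIFT) & PMUVER_MASK;

    if (pmuver == 0 || pmuver == PMUVER_IMPDEF) {
        return 0;      /* no architected PMUv3 */
    }
    if (pmuver == PMUVER_V3) {
        return 0x3ff;  /* ARMv8.0: 10-bit event numbers */
    }
    return 0xffff;     /* ARMv8.1 onwards: 16-bit event numbers */
}

/* Reject ranges that are inverted or exceed the PMU's event space. */
static bool pmu_range_valid(uint32_t start, uint32_t end, uint64_t id_aa64dfr0)
{
    uint32_t max = pmu_max_event(id_aa64dfr0);

    return max != 0 && start <= end && end <= max;
}
```

Relying on EINVAL from KVM_SET_DEVICE_ATTR keeps the rule in one place, in the kernel. A userspace check like this one can name the offending range in the error message, at the cost of duplicating that rule.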
Re: [PATCH V7 8/8] docs/specs/acpi_hw_reduced_hotplug: Add the CPU Hotplug Event Bit
On 11/14/23 04:12, Salil Mehta via wrote: GED interface is used by many hotplug events like memory hotplug, NVDIMM hotplug and non-hotplug events like system power down event. Each of these can be selected using a bit in the 32 bit GED IO interface. A bit has been reserved for the CPU hotplug event. Signed-off-by: Salil Mehta Reviewed-by: Shaoqin Huang --- docs/specs/acpi_hw_reduced_hotplug.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/specs/acpi_hw_reduced_hotplug.rst b/docs/specs/acpi_hw_reduced_hotplug.rst index 0bd3f9399f..3acd6fcd8b 100644 --- a/docs/specs/acpi_hw_reduced_hotplug.rst +++ b/docs/specs/acpi_hw_reduced_hotplug.rst @@ -64,7 +64,8 @@ GED IO interface (4 byte access) 0: Memory hotplug event 1: System power down event 2: NVDIMM hotplug event -3-31: Reserved + 3: CPU hotplug event +4-31: Reserved **write_access:** -- Shaoqin
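With bit 3 assigned, each GED event is selected by a single bit of the 32-bit event word. The CPU hotplug mask is therefore `1 << 3 == 0x8`, which matches the `ACPI_GED_CPU_HOTPLUG_EVT 0x8` define added in the V5 4/9 patch. The enum, macros and helper below are hypothetical and only illustrate the bit layout from the table.

```c
#include <stdint.h>

/* Bit positions from the GED read_access table (hypothetical names). */
enum ged_event_bit {
    GED_MEM_HOTPLUG_BIT    = 0,
    GED_PWR_DOWN_BIT       = 1,
    GED_NVDIMM_HOTPLUG_BIT = 2,
    GED_CPU_HOTPLUG_BIT    = 3, /* newly assigned; bits 4-31 stay reserved */
};

/* Mask for one event bit in the 32-bit GED event word. */
#define GED_EVT(bit) (UINT32_C(1) << (bit))

/* Bits 4-31 are reserved. */
#define GED_RESERVED_MASK (~UINT32_C(0) << 4)

/* Non-zero if the given event is set in a value read from the GED register. */
static int ged_event_has(uint32_t sel, enum ged_event_bit bit)
{
    return (sel & GED_EVT(bit)) != 0;
}
```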
[PATCH v2] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
The KVM_ARM_VCPU_PMU_V3_FILTER provide the ability to let the VMM decide which PMU events are provided to the guest. Add a new option `pmu-filter` as -accel sub-option to set the PMU Event Filtering. The `pmu-filter` has such format: pmu-filter="{A,D}:start-end[;{A,D}:start-end...]" The A means "allow" and D means "deny", start is the first event of the range and the end is the last one. The first filter action defines if the whole event list is an allow or deny list, if the first filter action is "allow", all other events are denied except start-end; if the first filter action is "deny", all other events are allowed except start-end. For example: pmu-filter="A:0x11-0x11;A:0x23-0x3a,D:0x30-0x30" This will allow event 0x11 (The cycle counter), events 0x23 to 0x3a is also allowed except the event 0x30 is denied, and all the other events are disallowed. Here is an real example shows how to use the PMU Event Filtering, when we launch a guest by use kvm, add such command line: # qemu-system-aarch64 \ -accel kvm,pmu-filter="D:0x11-0x11" And then in guest, use the perf to count the cycle: # perf stat sleep 1 Performance counter stats for 'sleep 1': 1.22 msec task-clock #0.001 CPUs utilized 1 context-switches # 820.695 /sec 0 cpu-migrations #0.000 /sec 55 page-faults # 45.138 K/sec cycles 1128954 instructions 227031 branches # 186.323 M/sec 8686 branch-misses#3.83% of all branches 1.002492480 seconds time elapsed 0.001752000 seconds user 0.0 seconds sys As we can see, the cycle counter has been disabled in the guest, but other pmu events are still work. Signed-off-by: Shaoqin Huang --- v1->v2: - Add more description for allow and deny meaning in commit message. [Sebastian] - Small improvement. 
[Sebastian] v1: https://lore.kernel.org/all/20231113081713.153615-1-shahu...@redhat.com/ --- include/sysemu/kvm_int.h | 1 + qemu-options.hx | 16 + target/arm/kvm.c | 22 + target/arm/kvm64.c | 51 4 files changed, 90 insertions(+) diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h index fd846394be..8f4601474f 100644 --- a/include/sysemu/kvm_int.h +++ b/include/sysemu/kvm_int.h @@ -120,6 +120,7 @@ struct KVMState uint32_t xen_caps; uint16_t xen_gnttab_max_frames; uint16_t xen_evtchn_max_pirq; +char *kvm_pmu_filter; }; void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml, diff --git a/qemu-options.hx b/qemu-options.hx index 42fd09e4de..dd3518092c 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -187,6 +187,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel, "tb-size=n (TCG translation block cache size)\n" "dirty-ring-size=n (KVM dirty ring GFN count, default 0)\n" "eager-split-size=n (KVM Eager Page Split chunk size, default 0, disabled. ARM only)\n" +"pmu-filter={A,D}:start-end[;...] (KVM PMU Event Filter, default no filter. ARM only)\n" "notify-vmexit=run|internal-error|disable,notify-window=n (enable notify VM exit and set notify window, x86 only)\n" "thread=single|multi (enable multi-threaded TCG)\n", QEMU_ARCH_ALL) SRST @@ -259,6 +260,21 @@ SRST impact on the memory. By default, this feature is disabled (eager-split-size=0). +``pmu-filter={A,D}:start-end[;...]`` +KVM implements pmu event filtering to prevent a guest from being able to + sample certain events. It has the following format: + + pmu-filter="{A,D}:start-end[;{A,D}:start-end...]" + + The A means "allow" and D means "deny", start if the first event of the + range and the end is the last one. For example: + + pmu-filter="A:0x11-0x11;A:0x23-0x3a,D:0x30-0x30" + + This will allow event 0x11 (The cycle counter), events 0x23 to 0x3a is + also allowed except the event 0x30 is denied, and all the other events + are disallowed. 
+ ``notify-vmexit=run|internal-error|disable,notify-window=n`` Enables or disables notify VM exit support on x86 host and specify the corresponding notify window to trigger the VM exit if enabled. diff --git a/target/arm/kvm.c b/target/arm/kvm.c index 7903e2ddde..74796de055 100644 --- a/target/arm/kvm.c +++ b/target/arm/kvm.c @@ -1108,6 +1108,21 @@ static void kvm_ar
Re: [PATCH v1] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
Hi Sebastian, On 11/15/23 20:17, Sebastian Ott wrote: Hi, On Mon, 13 Nov 2023, Shaoqin Huang wrote: + ``pmu-filter={A,D}:start-end[;...]`` + KVM implements pmu event filtering to prevent a guest from being able to + sample certain events. It has the following format: + + pmu-filter="{A,D}:start-end[;{A,D}:start-end...]" + + The A means "allow" and D means "deny", start if the first event of the ^ is Thanks for pointing it out. Also it should be stated that the first filter action defines if the whole list is an allow or a deny list. +static void kvm_arm_pmu_filter_init(CPUState *cs) +{ + struct kvm_pmu_event_filter filter; + struct kvm_device_attr attr = { + .group = KVM_ARM_VCPU_PMU_V3_CTRL, + .attr = KVM_ARM_VCPU_PMU_V3_FILTER, + }; + KVMState *kvm_state = cs->kvm_state; + char *tmp; + char *str, act; + + if (!kvm_state->kvm_pmu_filter) + return; + + tmp = g_strdup(kvm_state->kvm_pmu_filter); + + for (str = strtok(tmp, ";"); str != NULL; str = strtok(NULL, ";")) { + unsigned short start = 0, end = 0; + + sscanf(str, "%c:%hx-%hx", &act, &start, &end); + if ((act != 'A' && act != 'D') || (!start && !end)) { + error_report("skipping invalid filter %s\n", str); + continue; + } + + filter = (struct kvm_pmu_event_filter) { + .base_event = start, + .nevents = end - start + 1, + .action = act == 'A' ? KVM_PMU_EVENT_ALLOW : + KVM_PMU_EVENT_DENY, + }; + + attr.addr = (uint64_t)&filter; That could move to the initialization of attr (the address of filter doesn't change). It looks better. Will change it.
+ if (!kvm_arm_set_device_attr(cs, &attr, "PMU Event Filter")) { + error_report("Failed to init PMU Event Filter\n"); + abort(); + } + } + + g_free(tmp); +} + void kvm_arm_pmu_init(CPUState *cs) { struct kvm_device_attr attr = { .group = KVM_ARM_VCPU_PMU_V3_CTRL, .attr = KVM_ARM_VCPU_PMU_V3_INIT, }; + static bool pmu_filter_init = false; if (!ARM_CPU(cs)->has_pmu) { return; } + if (!pmu_filter_init) { + kvm_arm_pmu_filter_init(cs); + pmu_filter_init = true; pmu_filter_init could move inside kvm_arm_pmu_filter_init() - maybe together with a comment that this only needs to be called for 1 vcpu. Good idea. Will do that. Thanks, Shaoqin Thanks, Sebastian
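The v1 loop above ignores the `sscanf()` return value. It also computes `end - start + 1` even when `end < start`, and it uses `!start && !end` as a parse-failure heuristic, which also rejects event 0x0 (SW_INCR). The ',' versus ';' slip caught in v2 would also pass silently, because the first range parses and the rest of the token is ignored. Below is a hedged sketch of a stricter single-token parser. `struct pmu_filter_range` and `parse_pmu_filter_token()` are hypothetical. The 0xffff bound assumes a 16-bit event space; on ARMv8.0 a separate per-PMU check would still be needed.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical parsed form of one "A:start-end" or "D:start-end" token. */
struct pmu_filter_range {
    bool allow;
    unsigned int start;
    unsigned int nevents;
};

/*
 * Parse one token. %n records how many characters were consumed, so any
 * trailing text (e.g. ",D:0x30-0x30") makes the token invalid.
 */
static bool parse_pmu_filter_token(const char *tok, struct pmu_filter_range *out)
{
    char act;
    unsigned int start, end;
    int consumed = 0;

    /* %n is not counted in sscanf()'s return value. */
    if (sscanf(tok, "%c:%x-%x%n", &act, &start, &end, &consumed) != 3 ||
        tok[consumed] != '\0') {
        return false;
    }
    /* Unknown action, inverted range, or beyond a 16-bit event space. */
    if ((act != 'A' && act != 'D') || end < start || end > 0xffff) {
        return false;
    }
    out->allow = (act == 'A');
    out->start = start;
    out->nevents = end - start + 1;
    return true;
}
```

Tokens would still come from splitting the option on ';', as the quoted code does with `strtok()`. `strtok_r()` is the reentrant alternative if that ever matters.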
[PATCH v1] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
The KVM_ARM_VCPU_PMU_V3_FILTER provide the ability to let the VMM decide which PMU events are provided to the guest. Add a new option `pmu-filter` as -accel sub-option to set the PMU Event Filtering. The `pmu-filter` has such format: pmu-filter="{A,D}:start-end[;{A,D}:start-end...]" The A means "allow" and D means "deny", start if the first event of the range and the end is the last one. For example: pmu-filter="A:0x11-0x11;A:0x23-0x3a,D:0x30-0x30" This will allow event 0x11 (The cycle counter), events 0x23 to 0x3a is also allowed except the event 0x30 is denied, and all the other events are disallowed. Here is an real example shows how to use the PMU Event Filtering, when we launch a guest by use kvm, add such command line: # qemu-system-aarch64 \ -accel kvm,pmu-filter="D:0x11-0x11" And then in guest, use the perf to count the cycle: # perf stat sleep 1 Performance counter stats for 'sleep 1': 1.22 msec task-clock #0.001 CPUs utilized 1 context-switches # 820.695 /sec 0 cpu-migrations #0.000 /sec 55 page-faults # 45.138 K/sec cycles 1128954 instructions 227031 branches # 186.323 M/sec 8686 branch-misses#3.83% of all branches 1.002492480 seconds time elapsed 0.001752000 seconds user 0.0 seconds sys As we can see, the cycle counter has been disabled in the guest, but other pmu events are still work. 
Signed-off-by: Shaoqin Huang --- include/sysemu/kvm_int.h | 1 + qemu-options.hx | 16 + target/arm/kvm.c | 22 ++ target/arm/kvm64.c | 49 4 files changed, 88 insertions(+) diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h index fd846394be..8f4601474f 100644 --- a/include/sysemu/kvm_int.h +++ b/include/sysemu/kvm_int.h @@ -120,6 +120,7 @@ struct KVMState uint32_t xen_caps; uint16_t xen_gnttab_max_frames; uint16_t xen_evtchn_max_pirq; +char *kvm_pmu_filter; }; void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml, diff --git a/qemu-options.hx b/qemu-options.hx index 42fd09e4de..dd3518092c 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -187,6 +187,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel, "tb-size=n (TCG translation block cache size)\n" "dirty-ring-size=n (KVM dirty ring GFN count, default 0)\n" "eager-split-size=n (KVM Eager Page Split chunk size, default 0, disabled. ARM only)\n" +"pmu-filter={A,D}:start-end[;...] (KVM PMU Event Filter, default no filter. ARM only)\n" "notify-vmexit=run|internal-error|disable,notify-window=n (enable notify VM exit and set notify window, x86 only)\n" "thread=single|multi (enable multi-threaded TCG)\n", QEMU_ARCH_ALL) SRST @@ -259,6 +260,21 @@ SRST impact on the memory. By default, this feature is disabled (eager-split-size=0). +``pmu-filter={A,D}:start-end[;...]`` +KVM implements pmu event filtering to prevent a guest from being able to + sample certain events. It has the following format: + + pmu-filter="{A,D}:start-end[;{A,D}:start-end...]" + + The A means "allow" and D means "deny", start if the first event of the + range and the end is the last one. For example: + + pmu-filter="A:0x11-0x11;A:0x23-0x3a,D:0x30-0x30" + + This will allow event 0x11 (The cycle counter), events 0x23 to 0x3a is + also allowed except the event 0x30 is denied, and all the other events + are disallowed. 
+ ``notify-vmexit=run|internal-error|disable,notify-window=n`` Enables or disables notify VM exit support on x86 host and specify the corresponding notify window to trigger the VM exit if enabled. diff --git a/target/arm/kvm.c b/target/arm/kvm.c index 7903e2ddde..74796de055 100644 --- a/target/arm/kvm.c +++ b/target/arm/kvm.c @@ -1108,6 +1108,21 @@ static void kvm_arch_set_eager_split_size(Object *obj, Visitor *v, s->kvm_eager_split_size = value; } +static char *kvm_arch_get_pmu_filter(Object *obj, Error **errp) +{ +KVMState *s = KVM_STATE(obj); + +return g_strdup(s->kvm_pmu_filter); +} + +static void kvm_arch_set_pmu_filter(Object *obj, const char *pmu_filter, +Error **errp) +{ +KVMState *s = KVM_STATE(obj); + +s->kvm_pmu_filter = g_strdup(pmu_filter); +} + void kvm_arch_accel_class_init(ObjectClass *oc) { object_class_pro
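The v1 setter above stores a fresh `g_strdup()` copy in `s->kvm_pmu_filter` without releasing any previous value. The v3 changelog entry "Add g_free() in kvm_arch_set_pmu_filter() to prevent memory leak" presumably addresses this. Below is a sketch of the free-then-duplicate pattern in plain C, so it builds without glib. `free()` and the hypothetical `dup_str()` stand in for `g_free()` and `g_strdup()`, and `struct fake_kvm_state` is a hypothetical stand-in for `KVMState`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kvm_pmu_filter member of KVMState. */
struct fake_kvm_state {
    char *kvm_pmu_filter;
};

/* Plain-C equivalent of g_strdup(): NULL in, NULL out. */
static char *dup_str(const char *s)
{
    if (!s) {
        return NULL;
    }
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Setter: release the previous value before storing a new copy. */
static void set_pmu_filter(struct fake_kvm_state *s, const char *pmu_filter)
{
    free(s->kvm_pmu_filter); /* g_free() in QEMU; free(NULL) is a no-op */
    s->kvm_pmu_filter = dup_str(pmu_filter);
}
```

Freeing first makes the setter safe to call more than once, for example if the property is set repeatedly, without leaking the earlier string.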
Re: [PATCH V6 0/9] Add architecture agnostic code to support vCPU Hotplug
On 10/19/23 17:34, Salil Mehta wrote: Hi Shaoqin, From: Shaoqin Huang Sent: Thursday, October 19, 2023 10:05 AM To: Salil Mehta ; qemu-devel@nongnu.org; qemu-a...@nongnu.org Cc: m...@kernel.org; jean-phili...@linaro.org; Jonathan Cameron ; lpieral...@kernel.org; peter.mayd...@linaro.org; richard.hender...@linaro.org; imamm...@redhat.com; andrew.jo...@linux.dev; da...@redhat.com; phi...@linaro.org; eric.au...@redhat.com; oliver.up...@linux.dev; pbonz...@redhat.com; m...@redhat.com; w...@kernel.org; gs...@redhat.com; raf...@kernel.org; alex.ben...@linaro.org; li...@armlinux.org.uk; dar...@os.amperecomputing.com; il...@os.amperecomputing.com; vis...@os.amperecomputing.com; karl.heub...@oracle.com; miguel.l...@oracle.com; salil.me...@opnsrc.net; zhukeqian ; wangxiongfeng (C) ; wangyanan (Y) ; jiakern...@gmail.com; maob...@loongson.cn; lixiang...@loongson.cn; Linuxarm Subject: Re: [PATCH V6 0/9] Add architecture agnostic code to support vCPU Hotplug On 10/13/23 18:51, Salil Mehta via wrote: Virtual CPU hotplug support is being added across various architectures[1][3]. This series adds various code bits common across all architectures: 1. vCPU creation and Parking code refactor [Patch 1] 2. Update ACPI GED framework to support vCPU Hotplug [Patch 4,6,7] 3. ACPI CPUs AML code change [Patch 5] 4. Helper functions to support unrealization of CPU objects [Patch 8,9] 5. Misc [Patch 2,3] Repository: [*] https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v2.common.v6 Revision History: Patch-set V5 -> V6 1. Addressed Gavin Shan's comments - Fixed the assert() ranges of address spaces - Rebased the patch-set to latest changes in the qemu.git - Added Reviewed-by tags for patches {8,9} 2. Addressed Jonathan Cameron's comments - Updated commit-log for [Patch V5 1/9] with mention of trace events - Added Reviewed-by tags for patches {1,5} 3. Added Tested-by tags from Xianglai Li 4.
Fixed checkpatch.pl error "Qemu -> QEMU" in [Patch V5 1/9] Link: https://lore.kernel.org/qemu-devel/20231011194355.15628-1-salil.me...@huawei.com/ Patch-set V4 -> V5 1. Addressed Gavin Shan's comments - Fixed the trace events print string for kvm_{create,get,park,destroy}_vcpu - Added Reviewed-by tag for patch {1} 2. Added Shaoqin Huang's Reviewed-by tags for Patches {2,3} 3. Added Tested-by Tag from Vishnu Pajjuri to the patch-set 4. Dropped the ARM specific [Patch V4 10/10] Link: https://lore.kernel.org/qemu-devel/20231009203601.17584-1-salil.me...@huawei.com/ Patch-set V3 -> V4 1. Addressed David Hildenbrand's comments - Fixed the wrong doc comment of kvm_park_vcpu API prototype - Added Reviewed-by tags for patches {2,4} Link: https://lore.kernel.org/qemu-devel/20231009112812.10612-1-salil.me...@huawei.com/ Patch-set V2 -> V3 1. Addressed Jonathan Cameron's comments - Fixed 'vcpu-id' type wrongly changed from 'unsigned long' to 'integer' - Removed unnecessary use of variable 'vcpu_id' in kvm_park_vcpu - Updated [Patch V2 3/10] commit-log with details of ACPI_CPU_SCAN_METHOD macro - Updated [Patch V2 5/10] commit-log with details of conditional event handler method - Added Reviewed-by tags for patches {2,3,4,6,7} 2. Addressed Gavin Shan's comments - Remove unnecessary use of variable 'vcpu_id' in kvm_park_vcpu - Fixed return value in kvm_get_vcpu from -1 to -ENOENT - Reset the value of 'gdb_num_g_regs' in gdb_unregister_coprocessor_all - Fixed the kvm_{create,park}_vcpu prototypes docs - Added Reviewed-by tags for patches {2,3,4,5,6,7,9,10} 3. Addressed one earlier missed comment by Alex Bennée in RFC V1 - Added traces instead of DPRINTF in the newly added and some existing functions Link: https://lore.kernel.org/qemu-devel/20230930001933.2660-1-salil.me...@huawei.com/ Patch-set V1 -> V2 1.
Addressed Alex Bennée's comments - Refactored the kvm_create_vcpu logic to get rid of goto - Added the docs for kvm_{create,park}_vcpu prototypes - Split the gdbstub and AddressSpace destruction change into separate patches - Added Reviewed-by tags for patches {2,10} Link: https://lore.kernel.org/qemu-devel/20230929124304.13672-1-salil.me...@huawei.com/ References: [1] https://lore.kernel.org/qemu-devel/20230926100436.28284-1-salil.me...@huawei.com/ [2] https://lore.kernel.org/all/20230913163823.7880-1-james.mo...@arm.com/ [3] https://lore.kernel.org/qemu-devel/cover.1695697701.git.lixiang...@loongson.cn/ Salil Mehta (9): accel/kvm: Extract common KVM vCPU {creation,parking} code hw/acpi: Move CPU ctrl-dev MMIO region len macro to common header file hw/acpi: Add ACPI CPU hotplug init stub hw/acpi: Init GED framework with CPU hotplug events hw/acpi: Update CPUs AML with cpu-(ctrl)dev change hw/acpi:
Re: [PATCH V6 0/9] Add architecture agnostic code to support vCPU Hotplug
es changed, 184 insertions(+), 27 deletions(-) Hi Salil, All patches look good to me. Thanks for your effort in updating them so actively. No issues were found through simple testing and several rounds of daily use. Reviewed-by: Shaoqin Huang Thanks, Shaoqin -- Shaoqin
Re: [PATCH V5 4/9] hw/acpi: Init GED framework with CPU hotplug events
On 10/12/23 03:43, Salil Mehta via wrote:
> ACPI GED (as described in the ACPI 6.2 spec) can be used to generate ACPI
> events when OSPM/guest receives an interrupt listed in the _CRS object of
> GED. OSPM then maps or demultiplexes the event by evaluating _EVT method.
>
> This change adds the support of CPU hotplug event initialization in the
> existing GED framework.
>
> Co-developed-by: Keqian Zhu
> Signed-off-by: Keqian Zhu
> Signed-off-by: Salil Mehta
> Reviewed-by: Jonathan Cameron
> Reviewed-by: Gavin Shan
> Reviewed-by: David Hildenbrand
> Tested-by: Vishnu Pajjuri
> Reviewed-by: Shaoqin Huang
> ---
>  hw/acpi/generic_event_device.c         | 8 ++++++++
>  include/hw/acpi/generic_event_device.h | 5 +++++
>  2 files changed, 13 insertions(+)
>
> diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> index a3d31631fe..d2fa1d0e4a 100644
> --- a/hw/acpi/generic_event_device.c
> +++ b/hw/acpi/generic_event_device.c
> @@ -25,6 +25,7 @@ static const uint32_t ged_supported_events[] = {
>      ACPI_GED_MEM_HOTPLUG_EVT,
>      ACPI_GED_PWR_DOWN_EVT,
>      ACPI_GED_NVDIMM_HOTPLUG_EVT,
> +    ACPI_GED_CPU_HOTPLUG_EVT,
>  };
>
>  /*
> @@ -400,6 +401,13 @@ static void acpi_ged_initfn(Object *obj)
>      memory_region_init_io(&ged_st->regs, obj, &ged_regs_ops, ged_st,
>                            TYPE_ACPI_GED "-regs", ACPI_GED_REG_COUNT);
>      sysbus_init_mmio(sbd, &ged_st->regs);
> +
> +    s->cpuhp.device = OBJECT(s);
> +    memory_region_init(&s->container_cpuhp, OBJECT(dev), "cpuhp container",
> +                       ACPI_CPU_HOTPLUG_REG_LEN);
> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->container_cpuhp);
> +    cpu_hotplug_hw_init(&s->container_cpuhp, OBJECT(dev),
> +                        &s->cpuhp_state, 0);
>  }
>
>  static void acpi_ged_class_init(ObjectClass *class, void *data)
> diff --git a/include/hw/acpi/generic_event_device.h b/include/hw/acpi/generic_event_device.h
> index d831bbd889..d0a5a43abf 100644
> --- a/include/hw/acpi/generic_event_device.h
> +++ b/include/hw/acpi/generic_event_device.h
> @@ -60,6 +60,7 @@
>  #define HW_ACPI_GENERIC_EVENT_DEVICE_H
>
>  #include "hw/sysbus.h"
> +#include "hw/acpi/cpu_hotplug.h"
>  #include "hw/acpi/memory_hotplug.h"
>  #include "hw/acpi/ghes.h"
>  #include "qom/object.h"
> @@ -97,6 +98,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
>  #define ACPI_GED_MEM_HOTPLUG_EVT    0x1
>  #define ACPI_GED_PWR_DOWN_EVT       0x2
>  #define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4
> +#define ACPI_GED_CPU_HOTPLUG_EVT    0x8
>
>  typedef struct GEDState {
>      MemoryRegion evt;
> @@ -108,6 +110,9 @@ struct AcpiGedState {
>      SysBusDevice parent_obj;
>      MemHotplugState memhp_state;
>      MemoryRegion container_memhp;
> +    CPUHotplugState cpuhp_state;
> +    MemoryRegion container_cpuhp;
> +    AcpiCpuHotplug cpuhp;
>      GEDState ged_state;
>      uint32_t ged_event_bitmap;
>      qemu_irq irq;

--
Shaoqin
Re: [PATCH V5 1/9] accel/kvm: Extract common KVM vCPU {creation, parking} code
On 10/12/23 03:43, Salil Mehta via wrote:
> KVM vCPU creation is done once during the initialization of the VM when Qemu
> thread is spawned. This is common to all the architectures.
>
> Hot-unplug of vCPU results in destruction of the vCPU object in QOM but the
> corresponding KVM vCPU object in the Host KVM is not destroyed and its
> representative KVM vCPU object/context in Qemu is parked.
>
> Refactor common logic so that some APIs could be reused by vCPU Hotplug code.
>
> Signed-off-by: Salil Mehta
> Reviewed-by: Gavin Shan
> Tested-by: Vishnu Pajjuri
> Reviewed-by: Shaoqin Huang
> ---
>  accel/kvm/kvm-all.c    | 64
>  accel/kvm/trace-events |  4
>  include/sysemu/kvm.h   | 16
>  3 files changed, 69 insertions(+), 15 deletions(-)
>
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index ff1578bb32..0dcaa15276 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -137,6 +137,7 @@ static QemuMutex kml_slots_lock;
>  #define kvm_slots_unlock()  qemu_mutex_unlock(&kml_slots_lock)
>
>  static void kvm_slot_init_dirty_bitmap(KVMSlot *mem);
> +static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id);
>
>  static inline void kvm_resample_fd_remove(int gsi)
>  {
> @@ -320,14 +321,53 @@ err:
>      return ret;
>  }
>
> +void kvm_park_vcpu(CPUState *cpu)
> +{
> +    struct KVMParkedVcpu *vcpu;
> +
> +    trace_kvm_park_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
> +
> +    vcpu = g_malloc0(sizeof(*vcpu));
> +    vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
> +    vcpu->kvm_fd = cpu->kvm_fd;
> +    QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
> +}
> +
> +int kvm_create_vcpu(CPUState *cpu)
> +{
> +    unsigned long vcpu_id = kvm_arch_vcpu_id(cpu);
> +    KVMState *s = kvm_state;
> +    int kvm_fd;
> +
> +    trace_kvm_create_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
> +
> +    /* check if the KVM vCPU already exist but is parked */
> +    kvm_fd = kvm_get_vcpu(s, vcpu_id);
> +    if (kvm_fd < 0) {
> +        /* vCPU not parked: create a new KVM vCPU */
> +        kvm_fd = kvm_vm_ioctl(s, KVM_CREATE_VCPU, vcpu_id);
> +        if (kvm_fd < 0) {
> +            error_report("KVM_CREATE_VCPU IOCTL failed for vCPU %lu", vcpu_id);
> +            return kvm_fd;
> +        }
> +    }
> +
> +    cpu->kvm_fd = kvm_fd;
> +    cpu->kvm_state = s;
> +    cpu->vcpu_dirty = true;
> +    cpu->dirty_pages = 0;
> +    cpu->throttle_us_per_full = 0;
> +
> +    return 0;
> +}
> +
>  static int do_kvm_destroy_vcpu(CPUState *cpu)
>  {
>      KVMState *s = kvm_state;
>      long mmap_size;
> -    struct KVMParkedVcpu *vcpu = NULL;
>      int ret = 0;
>
> -    DPRINTF("kvm_destroy_vcpu\n");
> +    trace_kvm_destroy_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>
>      ret = kvm_arch_destroy_vcpu(cpu);
>      if (ret < 0) {
> @@ -353,10 +393,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
>          }
>      }
>
> -    vcpu = g_malloc0(sizeof(*vcpu));
> -    vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
> -    vcpu->kvm_fd = cpu->kvm_fd;
> -    QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
> +    kvm_park_vcpu(cpu);
>  err:
>      return ret;
>  }
> @@ -377,6 +414,8 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
>          if (cpu->vcpu_id == vcpu_id) {
>              int kvm_fd;
>
> +            trace_kvm_get_vcpu(vcpu_id);
> +
>              QLIST_REMOVE(cpu, node);
>              kvm_fd = cpu->kvm_fd;
>              g_free(cpu);
> @@ -384,7 +423,7 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
>          }
>      }
>
> -    return kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id);
> +    return -ENOENT;
>  }
>
>  int kvm_init_vcpu(CPUState *cpu, Error **errp)
> @@ -395,19 +434,14 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
>      trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>
> -    ret = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));
> +    ret = kvm_create_vcpu(cpu);
>      if (ret < 0) {
> -        error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",
> +        error_setg_errno(errp, -ret,
> +                         "kvm_init_vcpu: kvm_create_vcpu failed (%lu)",
>                           kvm_arch_vcpu_id(cpu));
>          goto err;
>      }
>
> -    cpu->kvm_fd = ret;
> -    cpu->kvm_state = s;
> -    cpu->vcpu_dirty = true;
> -    cpu->dirty_pages = 0;
> -    cpu->throttle_us_per_full = 0;
> -
>      mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
>      if (mmap_size < 0) {
>          ret = mmap_size;
> diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
> index 399aaeb0ec..cdd0c95c09 100644
> --- a/accel/kvm/trace-events
> +++ b/accel/kvm/trace-events
> @@ -9,6 +9,10 @@ kvm_device_ioctl(int fd, int type, void *arg) "dev fd %d, type 0x%x, arg %p"
>  kvm_failed_reg_get(uint64_t id, const char *msg) "Warning: Unable to retrieve ONEREG %" PRIu64 " from KVM: %s"
>  kvm_failed_reg_set(uint64_t id, const char *msg) "Warning: Unable to set ONEREG %"
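This refactor splits creation from parking. kvm_create_vcpu() first asks kvm_get_vcpu() for a parked fd and only issues KVM_CREATE_VCPU when that lookup returns -ENOENT. do_kvm_destroy_vcpu() parks the fd instead of closing it.

Below is a minimal, self-contained sketch of that park / reuse-or-create flow. It is not QEMU code:
- a plain singly linked list stands in for QLIST;
- fake_create_vcpu() is a hypothetical stand-in for the ioctl and simply hands out increasing fake fds.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Illustrative stand-in for QEMU's struct KVMParkedVcpu. */
struct parked_vcpu {
    unsigned long vcpu_id;
    int kvm_fd;
    struct parked_vcpu *next;
};

static struct parked_vcpu *parked_head;
static int next_fake_fd = 100; /* hypothetical fd source */

/* Park: remember the fd of an unplugged vCPU instead of closing it. */
static void park_vcpu(unsigned long vcpu_id, int kvm_fd)
{
    struct parked_vcpu *v = calloc(1, sizeof(*v));
    assert(v != NULL);
    v->vcpu_id = vcpu_id;
    v->kvm_fd = kvm_fd;
    v->next = parked_head;
    parked_head = v;
}

/* Unpark: remove and return the parked fd, or -ENOENT if there is none. */
static int get_parked_vcpu(unsigned long vcpu_id)
{
    for (struct parked_vcpu **pp = &parked_head; *pp; pp = &(*pp)->next) {
        if ((*pp)->vcpu_id == vcpu_id) {
            struct parked_vcpu *v = *pp;
            int fd = v->kvm_fd;
            *pp = v->next;
            free(v);
            return fd;
        }
    }
    return -ENOENT;
}

/* Hypothetical stand-in for kvm_vm_ioctl(s, KVM_CREATE_VCPU, vcpu_id). */
static int fake_create_vcpu(unsigned long vcpu_id)
{
    (void)vcpu_id;
    return next_fake_fd++;
}

/* Reuse a parked vCPU if one exists, otherwise create a new one. */
static int create_vcpu(unsigned long vcpu_id)
{
    int fd = get_parked_vcpu(vcpu_id);
    if (fd < 0) {
        fd = fake_create_vcpu(vcpu_id);
    }
    return fd;
}
```

Parking a vCPU and then creating the same vcpu_id again returns the original fd. This mirrors why the host KVM vCPU object survives a QOM-level hot-unplug.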
Re: [PATCH V4 07/10] hw/acpi: Update ACPI GED framework to support vCPU Hotplug
On 10/10/23 04:35, Salil Mehta via wrote:
> ACPI GED shall be used to convey to the guest kernel about any CPU hot-(un)plug
> events. Therefore, existing ACPI GED framework inside QEMU needs to be enhanced
> to support CPU hotplug state and events.
>
> Co-developed-by: Keqian Zhu
> Signed-off-by: Keqian Zhu
> Signed-off-by: Salil Mehta
> Reviewed-by: Jonathan Cameron
> Reviewed-by: Gavin Shan
> Reviewed-by: Shaoqin Huang
> ---
>  hw/acpi/generic_event_device.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> index 62d504d231..0d5f0140e5 100644
> --- a/hw/acpi/generic_event_device.c
> +++ b/hw/acpi/generic_event_device.c
> @@ -12,6 +12,7 @@
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
>  #include "hw/acpi/acpi.h"
> +#include "hw/acpi/cpu.h"
>  #include "hw/acpi/generic_event_device.h"
>  #include "hw/irq.h"
>  #include "hw/mem/pc-dimm.h"
> @@ -239,6 +240,8 @@ static void acpi_ged_device_plug_cb(HotplugHandler *hotplug_dev,
>          } else {
>              acpi_memory_plug_cb(hotplug_dev, &s->memhp_state, dev, errp);
>          }
> +    } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
> +        acpi_cpu_plug_cb(hotplug_dev, &s->cpuhp_state, dev, errp);
>      } else {
>          error_setg(errp, "virt: device plug request for unsupported device"
>                     " type: %s", object_get_typename(OBJECT(dev)));
> @@ -253,6 +256,8 @@ static void acpi_ged_unplug_request_cb(HotplugHandler *hotplug_dev,
>      if ((object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
>          !(object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)))) {
>          acpi_memory_unplug_request_cb(hotplug_dev, &s->memhp_state, dev, errp);
> +    } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
> +        acpi_cpu_unplug_request_cb(hotplug_dev, &s->cpuhp_state, dev, errp);
>      } else {
>          error_setg(errp, "acpi: device unplug request for unsupported device"
>                     " type: %s", object_get_typename(OBJECT(dev)));
> @@ -266,6 +271,8 @@ static void acpi_ged_unplug_cb(HotplugHandler *hotplug_dev,
>
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>          acpi_memory_unplug_cb(&s->memhp_state, dev, errp);
> +    } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
> +        acpi_cpu_unplug_cb(&s->cpuhp_state, dev, errp);
>      } else {
>          error_setg(errp, "acpi: device unplug for unsupported device"
>                     " type: %s", object_get_typename(OBJECT(dev)));
> @@ -277,6 +284,7 @@ static void acpi_ged_ospm_status(AcpiDeviceIf *adev, ACPIOSTInfoList ***list)
>      AcpiGedState *s = ACPI_GED(adev);
>
>      acpi_memory_ospm_status(&s->memhp_state, list);
> +    acpi_cpu_ospm_status(&s->cpuhp_state, list);
>  }
>
>  static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
> @@ -291,6 +299,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
>          sel = ACPI_GED_PWR_DOWN_EVT;
>      } else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) {
>          sel = ACPI_GED_NVDIMM_HOTPLUG_EVT;
> +    } else if (ev & ACPI_CPU_HOTPLUG_STATUS) {
> +        sel = ACPI_GED_CPU_HOTPLUG_EVT;
>      } else {
>          /* Unknown event. Return without generating interrupt. */
>          warn_report("GED: Unsupported event %d. No irq injected", ev);

--
Shaoqin
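acpi_ged_send_event() translates an AcpiEventStatusBits value into a single GED selector bit. The guest's GED AML code reads that bit to pick the matching event method, and this patch adds the CPU hotplug case as 0x8.

A small sketch of the translation follows. The selector values come from generic_event_device.h in this series. The status-bit values are hypothetical, because the real AcpiEventStatusBits numbering is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* GED selector bits, as defined in generic_event_device.h in this series. */
#define GED_MEM_HOTPLUG_EVT    0x1u
#define GED_PWR_DOWN_EVT       0x2u
#define GED_NVDIMM_HOTPLUG_EVT 0x4u
#define GED_CPU_HOTPLUG_EVT    0x8u

/* Hypothetical status bits standing in for AcpiEventStatusBits. */
enum status_bits {
    ST_CPU_HOTPLUG    = 1u << 0,
    ST_MEM_HOTPLUG    = 1u << 1,
    ST_NVDIMM_HOTPLUG = 1u << 2,
    ST_POWER_DOWN     = 1u << 3,
};

/* Same else-if order as the patch; returns 0 for an unsupported event. */
static uint32_t ged_selector(uint32_t ev)
{
    if (ev & ST_MEM_HOTPLUG) {
        return GED_MEM_HOTPLUG_EVT;
    } else if (ev & ST_POWER_DOWN) {
        return GED_PWR_DOWN_EVT;
    } else if (ev & ST_NVDIMM_HOTPLUG) {
        return GED_NVDIMM_HOTPLUG_EVT;
    } else if (ev & ST_CPU_HOTPLUG) {
        return GED_CPU_HOTPLUG_EVT;
    }
    return 0;
}

/*
 * OR the selector into the register the guest reads. Returns 1 when an
 * interrupt would be raised and 0 when the event is unsupported.
 */
static int ged_latch(uint32_t *sel_reg, uint32_t ev)
{
    uint32_t sel = ged_selector(ev);
    if (sel == 0) {
        return 0;
    }
    *sel_reg |= sel;
    return 1;
}
```

Because the selector register is OR-ed rather than overwritten, a CPU event and a memory event raised back to back leave both bits (0x9) for the guest to demultiplex.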
Re: [PATCH V4 03/10] hw/acpi: Add ACPI CPU hotplug init stub
On 10/10/23 04:35, Salil Mehta via wrote:
> ACPI CPU hotplug related initialization should only happen if ACPI_CPU_HOTPLUG
> support has been enabled for particular architecture. Add cpu_hotplug_hw_init()
> stub to avoid compilation break.
>
> Signed-off-by: Salil Mehta
> Reviewed-by: Jonathan Cameron
> Reviewed-by: Gavin Shan
> Reviewed-by: Shaoqin Huang
> ---
>  hw/acpi/acpi-cpu-hotplug-stub.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/hw/acpi/acpi-cpu-hotplug-stub.c b/hw/acpi/acpi-cpu-hotplug-stub.c
> index 3fc4b14c26..c6c61bb9cd 100644
> --- a/hw/acpi/acpi-cpu-hotplug-stub.c
> +++ b/hw/acpi/acpi-cpu-hotplug-stub.c
> @@ -19,6 +19,12 @@ void legacy_acpi_cpu_hotplug_init(MemoryRegion *parent, Object *owner,
>      return;
>  }
>
> +void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
> +                         CPUHotplugState *state, hwaddr base_addr)
> +{
> +    return;
> +}
> +
>  void acpi_cpu_ospm_status(CPUHotplugState *cpu_st, ACPIOSTInfoList ***list)
>  {
>      return;

--
Shaoqin
Re: [PATCH V4 02/10] hw/acpi: Move CPU ctrl-dev MMIO region len macro to common header file
On 10/10/23 04:35, Salil Mehta via wrote:
> CPU ctrl-dev MMIO region length could be used in ACPI GED and various other
> architecture specific places. Move ACPI_CPU_HOTPLUG_REG_LEN macro to more
> appropriate common header file.
>
> Signed-off-by: Salil Mehta
> Reviewed-by: Alex Bennée
> Reviewed-by: Jonathan Cameron
> Reviewed-by: Gavin Shan
> Reviewed-by: David Hildenbrand
> Reviewed-by: Shaoqin Huang
> ---
>  hw/acpi/cpu.c                 | 2 +-
>  include/hw/acpi/cpu_hotplug.h | 2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> index 19c154d78f..45defdc0e2 100644
> --- a/hw/acpi/cpu.c
> +++ b/hw/acpi/cpu.c
> @@ -1,12 +1,12 @@
>  #include "qemu/osdep.h"
>  #include "migration/vmstate.h"
>  #include "hw/acpi/cpu.h"
> +#include "hw/acpi/cpu_hotplug.h"
>  #include "qapi/error.h"
>  #include "qapi/qapi-events-acpi.h"
>  #include "trace.h"
>  #include "sysemu/numa.h"
>
> -#define ACPI_CPU_HOTPLUG_REG_LEN 12
>  #define ACPI_CPU_SELECTOR_OFFSET_WR 0
>  #define ACPI_CPU_FLAGS_OFFSET_RW 4
>  #define ACPI_CPU_CMD_OFFSET_WR 5
> diff --git a/include/hw/acpi/cpu_hotplug.h b/include/hw/acpi/cpu_hotplug.h
> index 3b932a..48b291e45e 100644
> --- a/include/hw/acpi/cpu_hotplug.h
> +++ b/include/hw/acpi/cpu_hotplug.h
> @@ -19,6 +19,8 @@
>  #include "hw/hotplug.h"
>  #include "hw/acpi/cpu.h"
>
> +#define ACPI_CPU_HOTPLUG_REG_LEN 12
> +
>  typedef struct AcpiCpuHotplug {
>      Object *device;
>      MemoryRegion io;

--
Shaoqin
Re: [PATCH RFC V2 03/37] hw/arm/virt: Move setting of common CPU properties in a function
On 9/26/23 18:04, Salil Mehta via wrote:
> Factor out CPU properties code common for {hot,cold}-plugged CPUs. This allows
> code reuse.
>
> Signed-off-by: Salil Mehta
> ---
>  hw/arm/virt.c         | 220
>  include/hw/arm/virt.h |   4
>  2 files changed, 140 insertions(+), 84 deletions(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 57fe97c242..0eb6bf5a18 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2018,16 +2018,130 @@ static void virt_cpu_post_init(VirtMachineState *vms, MemoryRegion *sysmem)
>      }
>  }
>
> +static void virt_cpu_set_properties(Object *cpuobj, const CPUArchId *cpu_slot,
> +                                    Error **errp)
> +{

Hi Salil,

This patch seems to break the code. virt_cpu_set_properties() is defined but never called in this patch, so the property setup that used to live in machvirt_init() no longer takes effect. We should call this function from machvirt_init().

> +    MachineState *ms = MACHINE(qdev_get_machine());
> +    VirtMachineState *vms = VIRT_MACHINE(ms);
> +    Error *local_err = NULL;
> +    VirtMachineClass *vmc;
> +
> +    vmc = VIRT_MACHINE_GET_CLASS(ms);
> +
> +    /* now, set the cpu object property values */
> +    numa_cpu_pre_plug(cpu_slot, DEVICE(cpuobj), &local_err);
> +    if (local_err) {
> +        goto out;
> +    }
> +
> +    object_property_set_int(cpuobj, "mp-affinity", cpu_slot->arch_id, NULL);
> +
> +    if (!vms->secure) {
> +        object_property_set_bool(cpuobj, "has_el3", false, NULL);
> +    }
> +
> +    if (!vms->virt && object_property_find(cpuobj, "has_el2")) {
> +        object_property_set_bool(cpuobj, "has_el2", false, NULL);
> +    }
> +
> +    if (vmc->kvm_no_adjvtime &&
> +        object_property_find(cpuobj, "kvm-no-adjvtime")) {
> +        object_property_set_bool(cpuobj, "kvm-no-adjvtime", true, NULL);
> +    }
> +
> +    if (vmc->no_kvm_steal_time &&
> +        object_property_find(cpuobj, "kvm-steal-time")) {
> +        object_property_set_bool(cpuobj, "kvm-steal-time", false, NULL);
> +    }
> +
> +    if (vmc->no_pmu && object_property_find(cpuobj, "pmu")) {
> +        object_property_set_bool(cpuobj, "pmu", false, NULL);
> +    }
> +
> +    if (vmc->no_tcg_lpa2 && object_property_find(cpuobj, "lpa2")) {
> +        object_property_set_bool(cpuobj, "lpa2", false, NULL);
> +    }
> +
> +    if (object_property_find(cpuobj, "reset-cbar")) {
> +        object_property_set_int(cpuobj, "reset-cbar",
> +                                vms->memmap[VIRT_CPUPERIPHS].base,
> +                                &local_err);
> +        if (local_err) {
> +            goto out;
> +        }
> +    }
> +
> +    /* link already initialized {secure,tag}-memory regions to this cpu */
> +    object_property_set_link(cpuobj, "memory", OBJECT(vms->sysmem), &local_err);
> +    if (local_err) {
> +        goto out;
> +    }
> +
> +    if (vms->secure) {
> +        object_property_set_link(cpuobj, "secure-memory",
> +                                 OBJECT(vms->secure_sysmem), &local_err);
> +        if (local_err) {
> +            goto out;
> +        }
> +    }
> +
> +    if (vms->mte) {
> +        if (!object_property_find(cpuobj, "tag-memory")) {
> +            error_setg(&local_err, "MTE requested, but not supported "
> +                       "by the guest CPU");
> +            if (local_err) {
> +                goto out;
> +            }
> +        }
> +
> +        object_property_set_link(cpuobj, "tag-memory", OBJECT(vms->tag_sysmem),
> +                                 &local_err);
> +        if (local_err) {
> +            goto out;
> +        }
> +
> +        if (vms->secure) {
> +            object_property_set_link(cpuobj, "secure-tag-memory",
> +                                     OBJECT(vms->secure_tag_sysmem),
> +                                     &local_err);
> +            if (local_err) {
> +                goto out;
> +            }
> +        }
> +    }
> +
> +    /*
> +     * RFC: Question: this must only be called for the hotplugged cpus. For the
> +     * cold booted secondary cpus this is being taken care in arm_load_kernel()
> +     * in boot.c. Perhaps we should remove that code now?
> +     */
> +    if (vms->psci_conduit != QEMU_PSCI_CONDUIT_DISABLED) {
> +        object_property_set_int(cpuobj, "psci-conduit", vms->psci_conduit,
> +                                NULL);
> +
> +        /* Secondary CPUs start in PSCI powered-down state */
> +        if (CPU(cpuobj)->cpu_index > 0) {
> +            object_property_set_bool(cpuobj, "start-powered-off", true, NULL);
> +        }
> +    }

Besides, if this patch is only meant to factor out the code, we could move the psci_conduit check to a later patch and keep this patch clean.

Thanks,
Shaoqin

> +
> +out:
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +    }
> +    return;
> +}
> +
>  static void machvirt_init(MachineState *machine)
>  {
>      VirtMachineState *vms = VIRT_MACHINE(machine);
>      VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(machine);
>      MachineClass *mc = MACHINE_GET_CLASS(machine);
>      const CPUArchIdList *possible_cpus;
> -    MemoryRegion *sysmem = get_system_memory();
> +    MemoryRegion *secure_tag_sysmem
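The review point is that a factored-out helper only helps if every path calls it. Below is a heavily reduced sketch of the intended shape. The machine_cfg and cpu_props structs are hypothetical stand-ins for VirtMachineState and the CPU QOM properties. Both the cold-plug loop (the role machvirt_init() plays) and the hot-plug path go through the same set_cpu_properties().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-ins for the machine and CPU state. */
struct machine_cfg {
    bool secure;
    bool psci_enabled;
};

struct cpu_props {
    int cpu_index;
    bool has_el3;
    bool start_powered_off;
};

/* Shared helper: the single place CPU properties are derived from the machine. */
static void set_cpu_properties(const struct machine_cfg *m, struct cpu_props *c)
{
    c->has_el3 = m->secure;
    /* Secondary CPUs start in the PSCI powered-down state. */
    c->start_powered_off = m->psci_enabled && c->cpu_index > 0;
}

/* Cold-plug path: if it skips the helper, these properties are never set. */
static void init_boot_cpus(const struct machine_cfg *m,
                           struct cpu_props *cpus, int n)
{
    for (int i = 0; i < n; i++) {
        cpus[i].cpu_index = i;
        set_cpu_properties(m, &cpus[i]);
    }
}

/* Hot-plug path: reuses exactly the same helper. */
static void plug_cpu(const struct machine_cfg *m, struct cpu_props *c, int index)
{
    c->cpu_index = index;
    set_cpu_properties(m, c);
}
```

Keeping the powered-off rule inside the shared helper is what the RFC question in the patch is about: once both paths call it, the equivalent handling elsewhere (the patch mentions arm_load_kernel()) may become redundant.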
Re: [PATCH v1 0/5] target/arm: Handle psci calls in userspace
Hi Salil,

On 6/26/23 21:42, Salil Mehta wrote:
>> From: Shaoqin Huang
>> Sent: Monday, June 26, 2023 7:49 AM
>> To: qemu-devel@nongnu.org; qemu-...@nongnu.org
>> Cc: oliver.up...@linux.dev; Salil Mehta; james.mo...@arm.com; gs...@redhat.com; Shaoqin Huang; Cornelia Huck; k...@vger.kernel.org; Michael S. Tsirkin; Paolo Bonzini; Peter Maydell
>> Subject: [PATCH v1 0/5] target/arm: Handle psci calls in userspace
>>
>> The userspace SMCCC call filtering[1] provides the ability to forward the SMCCC
>> calls to the userspace. The vCPU hotplug[2] would be the first legitimate use
>> case to handle the psci calls in userspace, thus the vCPU hotplug can deny the
>> PSCI_ON call if the vCPU is not present now.
>>
>> This series try to enable the userspace SMCCC call filtering, thus can handle
>> the SMCCC call in userspace. The first enabled SMCCC call is psci call, by using
>> the new added option 'user-smccc', we can enable handle psci calls in userspace.
>>
>> qemu-system-aarch64 -machine virt,user-smccc=on
>>
>> This series reuse the qemu implementation of the psci handling, thus the
>> handling process is very simple. But when handling psci in userspace when using
>> kvm, the reset vcpu process need to be taking care, the detail is included in
>> the patch05.
>
> This change is intended for VCPU Hotplug and we are duplicating the code
> we are working on. Unless this change is also intended for any other
> feature I would request you to defer this.

Thanks for sharing the information with me. I don't intend for this series to be merged. I posted it to discuss the vCPU Hotplug work, since I'm also following it. Just curious, what is your plan for posting a new version of vCPU Hotplug based on the userspace SMCCC filtering?

Thanks,
Shaoqin

> Thanks
> Salil

--
Shaoqin
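The cover letter's motivation is that once PSCI calls reach userspace, the hotplug code can refuse CPU_ON for a vCPU that is not present. Below is a minimal sketch of such a decision. The 64-bit CPU_ON function ID and the return codes are taken from the PSCI specification. Everything else is an assumption:
- the presence table is hypothetical and indexed by vCPU number (real CPU_ON takes a target MPIDR, not an index);
- PSCI_RET_DENIED is assumed as the refusal code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Function ID and return codes from the PSCI specification (DEN0022). */
#define PSCI_0_2_FN64_CPU_ON   0xC4000003u
#define PSCI_RET_SUCCESS       0
#define PSCI_RET_NOT_SUPPORTED (-1)
#define PSCI_RET_DENIED        (-3)
#define PSCI_RET_ALREADY_ON    (-4)

/* Hypothetical per-vCPU state that userspace might track. */
struct vcpu_slot {
    bool present;    /* hot-plugged into the VM */
    bool powered_on;
};

/* Sketch of a userspace CPU_ON handler; returns a PSCI status code. */
static int32_t handle_cpu_on(struct vcpu_slot *slots, unsigned n,
                             uint64_t target)
{
    if (target >= n || !slots[target].present) {
        return PSCI_RET_DENIED; /* assumption: refuse absent vCPUs */
    }
    if (slots[target].powered_on) {
        return PSCI_RET_ALREADY_ON;
    }
    slots[target].powered_on = true;
    return PSCI_RET_SUCCESS;
}

/* Dispatch on the function ID; other PSCI calls are out of scope here. */
static int32_t handle_psci(struct vcpu_slot *slots, unsigned n,
                           uint32_t fn, uint64_t arg0)
{
    switch (fn) {
    case PSCI_0_2_FN64_CPU_ON:
        return handle_cpu_on(slots, n, arg0);
    default:
        return PSCI_RET_NOT_SUPPORTED;
    }
}
```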
[PATCH v1 5/5] arm/kvm: add support for userspace psci calls handling
Use the SMCCC filter to start forwarding PSCI calls to userspace, where QEMU
then has to handle them. QEMU reuses the PSCI handler that is already used
for TCG. In doing so we have to take care of two reset paths: the vCPU
reset, which resets the vCPU registers, and the GICv3 reset, which needs to
grab all vCPU locks.

So when resetting a vCPU, we mark it as dirty to force its registers to be
synced to KVM, and when resetting the GICv3, we pause all vCPUs so that all
vCPU locks can be taken. With both in place, the vCPU boots up successfully
when handling a PSCI CPU_ON call.

Signed-off-by: Shaoqin Huang
---
 hw/intc/arm_gicv3_kvm.c | 10
 target/arm/kvm.c        | 94
 2 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
index 72ad916d3d..e42898c1d6 100644
--- a/hw/intc/arm_gicv3_kvm.c
+++ b/hw/intc/arm_gicv3_kvm.c
@@ -24,6 +24,7 @@
 #include "hw/intc/arm_gicv3_common.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
+#include "sysemu/cpus.h"
 #include "sysemu/kvm.h"
 #include "sysemu/runstate.h"
 #include "kvm_arm.h"
@@ -695,10 +696,19 @@ static void arm_gicv3_icc_reset(CPUARMState *env, const ARMCPRegInfo *ri)
         return;
     }
 
+    /*
+     * When handling psci call in userspace like cpu hotplug, this shall be called
+     * when other vcpus might be running. Host kernel KVM to handle device
+     * access of IOCTLs KVM_{GET|SET}_DEVICE_ATTR might fail due to inability to
+     * grab vcpu locks for all the vcpus. Hence, we need to pause all vcpus to
+     * facilitate locking within host.
+     */
+    pause_all_vcpus();
     /* Initialize to actual HW supported configuration */
     kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS,
                       KVM_VGIC_ATTR(ICC_CTLR_EL1, c->gicr_typer),
                       &c->icc_ctlr_el1[GICV3_NS], false, &error_abort);
+    resume_all_vcpus();
 
     c->icc_ctlr_el1[GICV3_S] = c->icc_ctlr_el1[GICV3_NS];
 }
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 579c6edd49..d2857a8499 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -10,6 +10,7 @@
 
 #include "qemu/osdep.h"
 #include 
+#include 
 #include 
 #include 
 
@@ -251,7 +252,29 @@ int kvm_arm_get_max_vm_ipa_size(MachineState *ms, bool *fixed_ipa)
 
 static int kvm_arm_init_smccc_filter(KVMState *s)
 {
+    unsigned int i;
     int ret = 0;
+    struct kvm_smccc_filter filter_ranges[] = {
+        {
+            .base = KVM_PSCI_FN_BASE,
+            .nr_functions = 4,
+            .action = KVM_SMCCC_FILTER_DENY,
+        },
+        {
+            .base = PSCI_0_2_FN_BASE,
+            .nr_functions = 0x20,
+            .action = KVM_SMCCC_FILTER_FWD_TO_USER,
+        },
+        {
+            .base = PSCI_0_2_FN64_BASE,
+            .nr_functions = 0x20,
+            .action = KVM_SMCCC_FILTER_FWD_TO_USER,
+        },
+    };
+    struct kvm_device_attr attr = {
+        .group = KVM_ARM_VM_SMCCC_CTRL,
+        .attr = KVM_ARM_VM_SMCCC_FILTER,
+    };
 
     if (kvm_vm_check_attr(s, KVM_ARM_VM_SMCCC_CTRL, KVM_ARM_VM_SMCCC_FILTER)) {
         error_report("ARM SMCCC filter not supported");
@@ -259,6 +282,16 @@ static int kvm_arm_init_smccc_filter(KVMState *s)
         goto out;
     }
 
+    for (i = 0; i < ARRAY_SIZE(filter_ranges); i++) {
+        attr.addr = (uint64_t)&filter_ranges[i];
+
+        ret = kvm_vm_ioctl(s, KVM_SET_DEVICE_ATTR, &attr);
+        if (ret < 0) {
+            error_report("KVM_SET_DEVICE_ATTR failed when SMCCC init");
+            goto out;
+        }
+    }
+
 out:
     return ret;
 }
@@ -654,6 +687,14 @@ void kvm_arm_reset_vcpu(ARMCPU *cpu)
      * for the same reason we do so in kvm_arch_get_registers().
      */
     write_list_to_cpustate(cpu);
+
+    /*
+     * When enabled userspace psci call handling, qemu will reset the vcpu if
+     * it's PSCI CPU_ON call. Since this will reset the vcpu register and
+     * power_state, we should sync these state to kvm, so manually set the
+     * vcpu_dirty to force the qemu to put register to kvm.
+     */
+    CPU(cpu)->vcpu_dirty = true;
 }
 
 /*
@@ -932,6 +973,51 @@ static int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
     return -1;
 }
 
+static int kvm_arm_handle_psci(CPUState *cs, struct kvm_run *run)
+{
+    if (run->hypercall.flags & KVM_HYPERCALL_EXIT_SMC) {
+        cs->exception_index = EXCP_SMC;
+    } else {
+        cs->exception_index = EXCP_HVC;
+    }
+
+    qemu_mutex_lock_iothread();
+    arm_cpu_do_interrupt(cs);
+    qemu_mutex_unlock_iothread();
+
+    /*
+     * We need to exit the run loop to have the chance to execute the
+     * qemu_wait_io_event() which will execute the psci function which queued in
+     * the cpu work queue.
+     */
+    re
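kvm_arm_init_smccc_filter() installs three ranges:
- the legacy KVM PSCI function IDs are denied;
- the 32-bit PSCI 0.2 function IDs are forwarded to userspace;
- the 64-bit PSCI 0.2 function IDs are forwarded to userspace.

Any function ID not covered by a range is still handled by KVM.

Below is a sketch of how such a range table resolves a function ID. It uses an illustrative copy of the fields rather than the real struct kvm_smccc_filter. The base constants are the values I believe the Linux uapi headers define, and they should be checked against those headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of the actions and fields of struct kvm_smccc_filter. */
enum filter_action { ACT_HANDLE = 0, ACT_DENY, ACT_FWD_TO_USER };

struct filter_range {
    uint32_t base;
    uint32_t nr_functions;
    enum filter_action action;
};

#define KVM_PSCI_FN_BASE   0x95c1ba5eu
#define PSCI_0_2_FN_BASE   0x84000000u
#define PSCI_0_2_FN64_BASE 0xC4000000u

/* Same three ranges the patch installs. */
static const struct filter_range ranges[] = {
    { KVM_PSCI_FN_BASE,   4,    ACT_DENY },
    { PSCI_0_2_FN_BASE,   0x20, ACT_FWD_TO_USER },
    { PSCI_0_2_FN64_BASE, 0x20, ACT_FWD_TO_USER },
};

/* Function IDs outside every range keep the default in-kernel handling. */
static enum filter_action lookup_action(uint32_t fn)
{
    for (size_t i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++) {
        /* Unsigned wrap-around makes fn < base fail this check as well. */
        if (fn - ranges[i].base < ranges[i].nr_functions) {
            return ranges[i].action;
        }
    }
    return ACT_HANDLE;
}
```

For example, the 64-bit CPU_ON ID 0xC4000003 lands in the third range and is forwarded, while 0x84000020 is just past the 32-bit range and stays with KVM.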
[PATCH v1 3/5] target/arm: make psci call can be used by kvm
Currently the PSCI call handling can only be used when tcg_enabled(). We want
to reuse it when kvm_enabled() as well. A subsequent patch that enables PSCI
handling in userspace relies on this.

Signed-off-by: Shaoqin Huang
---
 target/arm/helper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index d4bee43bd0..58063a92a6 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11020,7 +11020,8 @@ void arm_cpu_do_interrupt(CPUState *cs)
                       env->exception.syndrome);
     }
 
-    if (tcg_enabled() && arm_is_psci_call(cpu, cs->exception_index)) {
+    if ((tcg_enabled() || kvm_enabled()) &&
+        arm_is_psci_call(cpu, cs->exception_index)) {
         arm_handle_psci_call(cpu);
         qemu_log_mask(CPU_LOG_INT, "...handled as PSCI call\n");
         return;
-- 
2.39.1
[PATCH v1 2/5] linux-headers: Import arm-smccc.h from Linux v6.4-rc7
Copy in the SMCCC definitions from the kernel, which will be used to
implement SMCCC handling in userspace.

Signed-off-by: Shaoqin Huang
---
 linux-headers/linux/arm-smccc.h | 240
 1 file changed, 240 insertions(+)
 create mode 100644 linux-headers/linux/arm-smccc.h

diff --git a/linux-headers/linux/arm-smccc.h b/linux-headers/linux/arm-smccc.h
new file mode 100644
index 0000000000..3663c31ba5
--- /dev/null
+++ b/linux-headers/linux/arm-smccc.h
@@ -0,0 +1,240 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2015, Linaro Limited
+ */
+#ifndef __LINUX_ARM_SMCCC_H
+#define __LINUX_ARM_SMCCC_H
+
+#include 
+
+/*
+ * This file provides common defines for ARM SMC Calling Convention as
+ * specified in
+ * https://developer.arm.com/docs/den0028/latest
+ *
+ * This code is up-to-date with version DEN 0028 C
+ */
+
+#define ARM_SMCCC_STD_CALL          _AC(0,U)
+#define ARM_SMCCC_FAST_CALL         _AC(1,U)
+#define ARM_SMCCC_TYPE_SHIFT        31
+
+#define ARM_SMCCC_SMC_32            0
+#define ARM_SMCCC_SMC_64            1
+#define ARM_SMCCC_CALL_CONV_SHIFT   30
+
+#define ARM_SMCCC_OWNER_MASK        0x3F
+#define ARM_SMCCC_OWNER_SHIFT       24
+
+#define ARM_SMCCC_FUNC_MASK         0xFFFF
+
+#define ARM_SMCCC_IS_FAST_CALL(smc_val) \
+        ((smc_val) & (ARM_SMCCC_FAST_CALL << ARM_SMCCC_TYPE_SHIFT))
+#define ARM_SMCCC_IS_64(smc_val) \
+        ((smc_val) & (ARM_SMCCC_SMC_64 << ARM_SMCCC_CALL_CONV_SHIFT))
+#define ARM_SMCCC_FUNC_NUM(smc_val) ((smc_val) & ARM_SMCCC_FUNC_MASK)
+#define ARM_SMCCC_OWNER_NUM(smc_val) \
+        (((smc_val) >> ARM_SMCCC_OWNER_SHIFT) & ARM_SMCCC_OWNER_MASK)
+
+#define ARM_SMCCC_CALL_VAL(type, calling_convention, owner, func_num) \
+        (((type) << ARM_SMCCC_TYPE_SHIFT) | \
+        ((calling_convention) << ARM_SMCCC_CALL_CONV_SHIFT) | \
+        (((owner) & ARM_SMCCC_OWNER_MASK) << ARM_SMCCC_OWNER_SHIFT) | \
+        ((func_num) & ARM_SMCCC_FUNC_MASK))
+
+#define ARM_SMCCC_OWNER_ARCH            0
+#define ARM_SMCCC_OWNER_CPU             1
+#define ARM_SMCCC_OWNER_SIP             2
+#define ARM_SMCCC_OWNER_OEM             3
+#define ARM_SMCCC_OWNER_STANDARD        4
+#define ARM_SMCCC_OWNER_STANDARD_HYP    5
+#define ARM_SMCCC_OWNER_VENDOR_HYP      6
+#define ARM_SMCCC_OWNER_TRUSTED_APP     48
+#define ARM_SMCCC_OWNER_TRUSTED_APP_END 49
+#define ARM_SMCCC_OWNER_TRUSTED_OS      50
+#define ARM_SMCCC_OWNER_TRUSTED_OS_END  63
+
+#define ARM_SMCCC_FUNC_QUERY_CALL_UID   0xff01
+
+#define ARM_SMCCC_QUIRK_NONE            0
+#define ARM_SMCCC_QUIRK_QCOM_A6         1 /* Save/restore register a6 */
+
+#define ARM_SMCCC_VERSION_1_0           0x10000
+#define ARM_SMCCC_VERSION_1_1           0x10001
+#define ARM_SMCCC_VERSION_1_2           0x10002
+#define ARM_SMCCC_VERSION_1_3           0x10003
+
+#define ARM_SMCCC_1_3_SVE_HINT          0x10000
+
+#define ARM_SMCCC_VERSION_FUNC_ID \
+        ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+                           ARM_SMCCC_SMC_32, \
+                           0, 0)
+
+#define ARM_SMCCC_ARCH_FEATURES_FUNC_ID \
+        ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+                           ARM_SMCCC_SMC_32, \
+                           0, 1)
+
+#define ARM_SMCCC_ARCH_SOC_ID \
+        ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+                           ARM_SMCCC_SMC_32, \
+                           0, 2)
+
+#define ARM_SMCCC_ARCH_WORKAROUND_1 \
+        ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+                           ARM_SMCCC_SMC_32, \
+                           0, 0x8000)
+
+#define ARM_SMCCC_ARCH_WORKAROUND_2 \
+        ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+                           ARM_SMCCC_SMC_32, \
+                           0, 0x7fff)
+
+#define ARM_SMCCC_ARCH_WORKAROUND_3 \
+        ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+                           ARM_SMCCC_SMC_32, \
+                           0, 0x3fff)
+
+#define ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID \
+        ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+                           ARM_SMCCC_SMC_32, \
+                           ARM_SMCCC_OWNER_VENDOR_HYP, \
+                           ARM_SMCCC_FUNC_QUERY_CALL_UID)
+
+/* KVM UID value: 28b46fb6-2ec5-11e9-a9ca-4b564d003a74 */
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0 0xb66fb428U
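ARM_SMCCC_CALL_VAL() packs four fields into a 32-bit function ID:
- bit 31: fast call;
- bit 30: SMC64 calling convention;
- bits 29:24: owning service;
- bits 15:0: function number.

Below is a sketch that re-derives this layout as plain functions, renamed so they do not clash with the header's macros. It is checked against the 64-bit PSCI CPU_ON ID 0xC4000003, which belongs to the standard-service owner (4).

```c
#include <assert.h>
#include <stdint.h>

/* Field layout as in the imported arm-smccc.h (SMCCC, DEN0028). */
#define SMCCC_TYPE_SHIFT      31
#define SMCCC_CALL_CONV_SHIFT 30
#define SMCCC_OWNER_SHIFT     24
#define SMCCC_OWNER_MASK      0x3Fu
#define SMCCC_FUNC_MASK       0xFFFFu

/* Equivalent of ARM_SMCCC_CALL_VAL(type, calling_convention, owner, func). */
static uint32_t smccc_call_val(uint32_t fast, uint32_t smc64,
                               uint32_t owner, uint32_t func)
{
    return (fast << SMCCC_TYPE_SHIFT) |
           (smc64 << SMCCC_CALL_CONV_SHIFT) |
           ((owner & SMCCC_OWNER_MASK) << SMCCC_OWNER_SHIFT) |
           (func & SMCCC_FUNC_MASK);
}

/* Equivalent of ARM_SMCCC_OWNER_NUM(). */
static uint32_t smccc_owner(uint32_t id)
{
    return (id >> SMCCC_OWNER_SHIFT) & SMCCC_OWNER_MASK;
}

/* Equivalent of ARM_SMCCC_FUNC_NUM(). */
static uint32_t smccc_func(uint32_t id)
{
    return id & SMCCC_FUNC_MASK;
}
```

With fast=1, smc64=0, owner=0 and func=0, the result is 0x80000000, the value of ARM_SMCCC_VERSION_FUNC_ID above.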
[PATCH v1 1/5] linux-headers: Update to v6.4-rc7
Update to commit 45a3e24f65e9 ("Linux 6.4-rc7").

Signed-off-by: Shaoqin Huang
---
 include/standard-headers/linux/const.h      |  2 +-
 include/standard-headers/linux/virtio_blk.h | 18 +++
 .../standard-headers/linux/virtio_config.h  |  6 +++
 include/standard-headers/linux/virtio_net.h |  1 +
 linux-headers/asm-arm64/kvm.h               | 33
 linux-headers/asm-riscv/kvm.h               | 53 ++-
 linux-headers/asm-riscv/unistd.h            |  9
 linux-headers/asm-s390/unistd_32.h          |  1 +
 linux-headers/asm-s390/unistd_64.h          |  1 +
 linux-headers/asm-x86/kvm.h                 |  3 ++
 linux-headers/linux/const.h                 |  2 +-
 linux-headers/linux/kvm.h                   | 12 +++--
 linux-headers/linux/psp-sev.h               |  7 +++
 linux-headers/linux/userfaultfd.h           | 17 +-
 14 files changed, 149 insertions(+), 16 deletions(-)

diff --git a/include/standard-headers/linux/const.h b/include/standard-headers/linux/const.h
index 5e48987251..1eb84b5087 100644
--- a/include/standard-headers/linux/const.h
+++ b/include/standard-headers/linux/const.h
@@ -28,7 +28,7 @@
 #define _BITUL(x) (_UL(1) << (x))
 #define _BITULL(x) (_ULL(1) << (x))
 
-#define __ALIGN_KERNEL(x, a) __ALIGN_KERNEL_MASK(x, (typeof(x))(a) - 1)
+#define __ALIGN_KERNEL(x, a) __ALIGN_KERNEL_MASK(x, (__typeof__(x))(a) - 1)
 #define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask))
 
 #define __KERNEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
diff --git a/include/standard-headers/linux/virtio_blk.h b/include/standard-headers/linux/virtio_blk.h
index 7155b1a470..d7be3cf5e4 100644
--- a/include/standard-headers/linux/virtio_blk.h
+++ b/include/standard-headers/linux/virtio_blk.h
@@ -138,11 +138,11 @@ struct virtio_blk_config {
 
         /* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */
         struct virtio_blk_zoned_characteristics {
-                uint32_t zone_sectors;
-                uint32_t max_open_zones;
-                uint32_t max_active_zones;
-                uint32_t max_append_sectors;
-                uint32_t write_granularity;
+                __virtio32 zone_sectors;
+                __virtio32 max_open_zones;
+                __virtio32 max_active_zones;
+                __virtio32 max_append_sectors;
+                __virtio32 write_granularity;
                 uint8_t model;
                 uint8_t unused2[3];
         } zoned;
@@ -239,11 +239,11 @@ struct virtio_blk_outhdr {
  */
 struct virtio_blk_zone_descriptor {
         /* Zone capacity */
-        uint64_t z_cap;
+        __virtio64 z_cap;
         /* The starting sector of the zone */
-        uint64_t z_start;
+        __virtio64 z_start;
         /* Zone write pointer position in sectors */
-        uint64_t z_wp;
+        __virtio64 z_wp;
         /* Zone type */
         uint8_t z_type;
         /* Zone state */
@@ -252,7 +252,7 @@ struct virtio_blk_zone_descriptor {
 };
 
 struct virtio_blk_zone_report {
-        uint64_t nr_zones;
+        __virtio64 nr_zones;
         uint8_t reserved[56];
         struct virtio_blk_zone_descriptor zones[];
 };
diff --git a/include/standard-headers/linux/virtio_config.h b/include/standard-headers/linux/virtio_config.h
index 965ee6ae23..8a7d0dc8b0 100644
--- a/include/standard-headers/linux/virtio_config.h
+++ b/include/standard-headers/linux/virtio_config.h
@@ -97,6 +97,12 @@
  */
 #define VIRTIO_F_SR_IOV 37
 
+/*
+ * This feature indicates that the driver passes extra data (besides
+ * identifying the virtqueue) in its device notifications.
+ */
+#define VIRTIO_F_NOTIFICATION_DATA 38
+
 /*
  * This feature indicates that the driver can reset a queue individually.
  */
diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
index c0e797067a..2325485f2c 100644
--- a/include/standard-headers/linux/virtio_net.h
+++ b/include/standard-headers/linux/virtio_net.h
@@ -61,6 +61,7 @@
 #define VIRTIO_NET_F_GUEST_USO6 55 /* Guest can handle USOv6 in. */
 #define VIRTIO_NET_F_HOST_USO 56 /* Host can handle USO in. */
 #define VIRTIO_NET_F_HASH_REPORT 57 /* Supports hash report */
+#define VIRTIO_NET_F_GUEST_HDRLEN 59 /* Guest provides the exact hdr_len value. */
 #define VIRTIO_NET_F_RSS 60 /* Supports RSS RX steering */
 #define VIRTIO_NET_F_RSC_EXT 61 /* extended coalescing info */
 #define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index d7e7bb885e..38e5957526 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -198,6 +198,15 @@ struct kvm_arm_copy_mte_tags {
         __u64 reserved[2];
 };
 
+/*
+ * Counter/Timer offset structure. Describe the virtual/physical offset.
+ * To be used with KVM_ARM_SET_COUNTER_OFFSET.
+ */
+str
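A small standalone check of what `__ALIGN_KERNEL` computes after this update, using the two macros exactly as they read in the const.h hunk. The helper `align_up_u64` is hypothetical and not part of the patch. The likely reason for `typeof` becoming `__typeof__` (GCC and Clang accept `__typeof__` even in strict ISO C modes such as `-std=c99`, where plain `typeof` is not a keyword) is our reading; the commit message does not say.

```c
#include <assert.h>
#include <stdint.h>

/*
 * The two macros as they read after the const.h hunk. Casting `a` to the
 * type of `x` makes `mask`, and therefore `~(mask)`, as wide as `x`, so the
 * upper bits of a 64-bit value are not cleared by a narrower mask.
 */
#define __ALIGN_KERNEL(x, a)         __ALIGN_KERNEL_MASK(x, (__typeof__(x))(a) - 1)
#define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask))

/* Hypothetical helper: round x up to a multiple of a (a power of two). */
static uint64_t align_up_u64(uint64_t x, uint64_t a)
{
    /* The add-then-mask trick is only correct for power-of-two alignments. */
    assert(a != 0 && (a & (a - 1)) == 0);
    return __ALIGN_KERNEL(x, a);
}
```

For example, `align_up_u64(13, 8)` is `(13 + 7) & ~7 = 16`, and a value that is already aligned, such as 16, is returned unchanged.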
[PATCH v1 4/5] arm/kvm: add skeleton implementation for userspace SMCCC call handling
The SMCCC call filtering provides the ability to forward SMCCC calls to
userspace, so add a new option `user-smccc` to enable handling SMCCC calls
in userspace; the default value is off. Also add the skeleton
implementation for userspace SMCCC call initialization and handling.

Signed-off-by: Shaoqin Huang
---
 docs/system/arm/virt.rst |  4 +++
 hw/arm/virt.c            | 21
 include/hw/arm/virt.h    |  1 +
 target/arm/kvm.c         | 54
 4 files changed, 80 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 1cab33f02e..ff43d52f04 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -155,6 +155,10 @@ dtb-randomness
   DTB to be non-deterministic. It would be the responsibility of
   the firmware to come up with a seed and pass it on if it wants to.
 
+user-smccc
+  Set ``on``/``off`` to enable/disable handling smccc call in userspace
+  instead of kernel.
+
 dtb-kaslr-seed
   A deprecated synonym for dtb-randomness.
 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 9b9f7d9c68..767720321c 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -42,6 +42,7 @@
 #include "hw/vfio/vfio-amd-xgbe.h"
 #include "hw/display/ramfb.h"
 #include "net/net.h"
+#include "qom/object.h"
 #include "sysemu/device_tree.h"
 #include "sysemu/numa.h"
 #include "sysemu/runstate.h"
@@ -2511,6 +2512,19 @@ static void virt_set_oem_table_id(Object *obj, const char *value,
     strncpy(vms->oem_table_id, value, 8);
 }
 
+static bool virt_get_user_smccc(Object *obj, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    return vms->user_smccc;
+}
+
+static void virt_set_user_smccc(Object *obj, bool value, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    vms->user_smccc = value;
+}
 bool virt_is_acpi_enabled(VirtMachineState *vms)
 {
@@ -3155,6 +3169,13 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
                                           "in ACPI table header."
                                           "The string may be up to 8 bytes in size");
 
+    object_class_property_add_bool(oc, "user-smccc",
+                                   virt_get_user_smccc,
+                                   virt_set_user_smccc);
+    object_class_property_set_description(oc, "user-smccc",
+                                          "Set on/off to enable/disable "
+                                          "handling smccc call in userspace");
+
 }
 
 static void virt_instance_init(Object *obj)
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index e1ddbea96b..4f1bc12680 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -160,6 +160,7 @@ struct VirtMachineState {
     bool ras;
     bool mte;
     bool dtb_randomness;
+    bool user_smccc;
     OnOffAuto acpi;
     VirtGICType gic_version;
     VirtIOMMUType iommu;
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 84da49332c..579c6edd49 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -9,6 +9,8 @@
  */
 
 #include "qemu/osdep.h"
+#include
+#include
 #include
 #include
 
@@ -247,6 +249,20 @@ int kvm_arm_get_max_vm_ipa_size(MachineState *ms, bool *fixed_ipa)
     return ret > 0 ? ret : 40;
 }
 
+static int kvm_arm_init_smccc_filter(KVMState *s)
+{
+    int ret = 0;
+
+    if (kvm_vm_check_attr(s, KVM_ARM_VM_SMCCC_CTRL, KVM_ARM_VM_SMCCC_FILTER)) {
+        error_report("ARM SMCCC filter not supported");
+        ret = -EINVAL;
+        goto out;
+    }
+
+out:
+    return ret;
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
     int ret = 0;
@@ -282,6 +298,10 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 
     kvm_arm_init_debug(s);
 
+    if (ret == 0 && object_property_get_bool(OBJECT(ms), "user-smccc", NULL)) {
+        ret = kvm_arm_init_smccc_filter(s);
+    }
+
     return ret;
 }
 
@@ -912,6 +932,37 @@ static int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
     return -1;
 }
 
+static void kvm_arm_smccc_return_result(CPUState *cs, struct arm_smccc_res *res)
+{
+    ARMCPU *cpu = ARM_CPU(cs);
+    CPUARMState *env = &cpu->env;
+
+    env->xregs[0] = res->a0;
+    env->xregs[1] = res->a1;
+    env->xregs[2] = res->a2;
+    env->xregs[3] = res->a3;
+}
+
+static int kvm_arm_handle_hypercall(CPUState *cs, struct kvm_run *run)
+{
+    uint32_t fn = run->hypercall.nr;
+    struct arm_smccc_res res = {
+        .a0 = SMCCC_RET_NOT_SUPPORTED,
+    };
+    int ret = 0;
+
+    kvm_cpu_synchronize_state(cs);
+
+    switch (ARM_SMCCC_OWNER_NUM(fn)) {
+    default:
+        break;
+    }
+
+    kvm_arm_smccc_return_result(cs, &res);
+
+    return ret;
+}
+
 int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 {
     int r
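A standalone sketch of what the skeleton handler above does with a forwarded call: decode the owning entity from the SMCCC function ID, dispatch on it, and write the result back to x0..x3, with NOT_SUPPORTED as the default. The `SKETCH_*` macros and `sketch_*` types and functions are hypothetical stand-ins (not QEMU's or `linux/arm-smccc.h`'s), and the bit layout follows the Arm SMC Calling Convention.

```c
#include <assert.h>
#include <stdint.h>

/*
 * SMCCC function ID fields: bit 31 = fast call, bit 30 = SMC64 convention,
 * bits 29:24 = owning entity, bits 15:0 = function number.
 * The SKETCH_* names are hypothetical, not those of linux/arm-smccc.h.
 */
#define SKETCH_SMCCC_OWNER_SHIFT   24
#define SKETCH_SMCCC_OWNER_MASK    0x3Fu
#define SKETCH_SMCCC_OWNER_NUM(fn) \
    (((fn) >> SKETCH_SMCCC_OWNER_SHIFT) & SKETCH_SMCCC_OWNER_MASK)

#define SKETCH_OWNER_STANDARD      4    /* Standard Secure Services, incl. PSCI */
#define SKETCH_RET_NOT_SUPPORTED   (-1)

/* Hypothetical stand-ins for struct arm_smccc_res and the guest's x0..x3. */
struct sketch_smccc_res {
    uint64_t a0, a1, a2, a3;
};

struct sketch_guest_regs {
    uint64_t x[4];
};

/* Copy the four result words into x0..x3, like kvm_arm_smccc_return_result(). */
static void sketch_return_result(struct sketch_guest_regs *regs,
                                 const struct sketch_smccc_res *res)
{
    regs->x[0] = res->a0;
    regs->x[1] = res->a1;
    regs->x[2] = res->a2;
    regs->x[3] = res->a3;
}

/*
 * Skeleton dispatcher: no owner has a handler yet, so every call ends up
 * returning NOT_SUPPORTED. Returns the decoded owner for inspection.
 */
static uint32_t sketch_handle_hypercall(struct sketch_guest_regs *regs,
                                        uint32_t fn)
{
    struct sketch_smccc_res res = {
        /* -1 as seen by the guest in x0: all 64 bits set. */
        .a0 = (uint64_t)(int64_t)SKETCH_RET_NOT_SUPPORTED,
    };
    uint32_t owner = SKETCH_SMCCC_OWNER_NUM(fn);

    switch (owner) {
    case SKETCH_OWNER_STANDARD:
        /* A PSCI handler would be dispatched here in a later step. */
        break;
    default:
        break;
    }

    sketch_return_result(regs, &res);
    return owner;
}
```

With this layout, PSCI_VERSION (`0x84000000`) and the SMC64 CPU_ON ID (`0xC4000003`) both decode to owner 4, which is why a single `case` on the owner is enough to route all PSCI calls.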
[PATCH v1 0/5] target/arm: Handle psci calls in userspace
The userspace SMCCC call filtering [1] provides the ability to forward
SMCCC calls to userspace. vCPU hotplug [2] would be the first legitimate
use case for handling PSCI calls in userspace: the vCPU hotplug code can
then deny a PSCI CPU_ON call if the target vCPU is not currently present.

This series enables userspace SMCCC call filtering so that SMCCC calls can
be handled in userspace. The first SMCCC calls enabled are the PSCI calls;
the newly added option 'user-smccc' enables handling PSCI calls in
userspace:

  qemu-system-aarch64 -machine virt,user-smccc=on

This series reuses the existing QEMU implementation of PSCI handling, so
the handling itself is very simple. However, when PSCI is handled in
userspace under KVM, the vCPU reset process needs to be taken care of; the
details are in patch 05.

[1] lore.kernel.org/20230404154050.2270077-1-oliver.up...@linux.dev
[2] lore.kernel.org/20230203135043.409192-1-james.mo...@arm.com

Shaoqin Huang (5):
  linux-headers: Update to v6.4-rc7
  linux-headers: Import arm-smccc.h from Linux v6.4-rc7
  target/arm: make psci call can be used by kvm
  arm/kvm: add skeleton implementation for userspace SMCCC call handling
  arm/kvm: add support for userspace psci calls handling

 docs/system/arm/virt.rst                    |   4 +
 hw/arm/virt.c                               |  21 ++
 hw/intc/arm_gicv3_kvm.c                     |  10 +
 include/hw/arm/virt.h                       |   1 +
 include/standard-headers/linux/const.h      |   2 +-
 include/standard-headers/linux/virtio_blk.h |  18 +-
 .../standard-headers/linux/virtio_config.h  |   6 +
 include/standard-headers/linux/virtio_net.h |   1 +
 linux-headers/asm-arm64/kvm.h               |  33 +++
 linux-headers/asm-riscv/kvm.h               |  53 +++-
 linux-headers/asm-riscv/unistd.h            |   9 +
 linux-headers/asm-s390/unistd_32.h          |   1 +
 linux-headers/asm-s390/unistd_64.h          |   1 +
 linux-headers/asm-x86/kvm.h                 |   3 +
 linux-headers/linux/arm-smccc.h             | 240 ++
 linux-headers/linux/const.h                 |   2 +-
 linux-headers/linux/kvm.h                   |  12 +-
 linux-headers/linux/psp-sev.h               |   7 +
 linux-headers/linux/userfaultfd.h           |  17 +-
 target/arm/helper.c                         |   3 +-
 target/arm/kvm.c                            | 146 +++
 21 files changed, 573 insertions(+), 17 deletions(-)
 create mode 100644 linux-headers/linux/arm-smccc.h

base-commit: e3660cc1e3cb136af50c0eaaeac27943c2438d1d
-- 
2.39.1
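To illustrate the hotplug use case described in the cover letter, here is a minimal sketch of a userspace PSCI CPU_ON handler that refuses to power on a vCPU that is not present. Function IDs and return codes follow the PSCI specification; the `sketch_*` types and functions are hypothetical, returning DENIED for an absent vCPU is our assumption about what "deny" means here, and the target is a plain vCPU index rather than an MPIDR value to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PSCI function IDs and return codes from the PSCI specification. */
#define PSCI_FN32_CPU_ON         0x84000003u
#define PSCI_FN64_CPU_ON         0xC4000003u
#define PSCI_RET_SUCCESS         0
#define PSCI_RET_NOT_SUPPORTED   (-1)
#define PSCI_RET_INVALID_PARAMS  (-2)
#define PSCI_RET_DENIED          (-3)
#define PSCI_RET_ALREADY_ON      (-4)

#define SKETCH_MAX_VCPUS 8

/* Hypothetical per-vCPU state; real code would consult QEMU's CPU objects. */
struct sketch_vcpu {
    bool present;      /* vCPU has been plugged in */
    bool powered_on;
    uint64_t entry;    /* entry point the vCPU starts executing at */
    uint64_t context;  /* context ID handed to the vCPU in x0 */
};

static int64_t sketch_psci_cpu_on(struct sketch_vcpu *vcpus, uint64_t target,
                                  uint64_t entry, uint64_t context)
{
    if (target >= SKETCH_MAX_VCPUS) {
        return PSCI_RET_INVALID_PARAMS;
    }
    if (!vcpus[target].present) {
        /* The hotplug case: refuse to power on an absent vCPU. */
        return PSCI_RET_DENIED;
    }
    if (vcpus[target].powered_on) {
        return PSCI_RET_ALREADY_ON;
    }
    vcpus[target].entry = entry;
    vcpus[target].context = context;
    vcpus[target].powered_on = true;
    return PSCI_RET_SUCCESS;
}

/* args[0..2] hold the guest's x1..x3: target, entry point, context ID. */
static int64_t sketch_handle_psci(struct sketch_vcpu *vcpus, uint32_t fn,
                                  const uint64_t args[3])
{
    switch (fn) {
    case PSCI_FN32_CPU_ON:
    case PSCI_FN64_CPU_ON:
        return sketch_psci_cpu_on(vcpus, args[0], args[1], args[2]);
    default:
        /* Everything else is left unimplemented in this sketch. */
        return PSCI_RET_NOT_SUPPORTED;
    }
}
```

Keeping the presence check ahead of the ALREADY_ON check means a not-present vCPU is always reported as DENIED, regardless of any stale power state recorded for it.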
[PATCH v2] hw: Fix format for comments
Simply fix the #vcpus_count to @vcpus_count in CPUArchId comments. While
at it, reorder the parameters in the comments to match the order in which
they are defined in CPUArchId.

Reviewed-by: Igor Mammedov
Signed-off-by: Shaoqin Huang
---
 include/hw/boards.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index a385010909..e0497c2314 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -101,10 +101,10 @@ MemoryRegion *machine_consume_memdev(MachineState *machine,
 /**
  * CPUArchId:
  * @arch_id - architecture-dependent CPU ID of present or possible CPU
+ * @vcpus_count - number of threads provided by @cpu object
+ * @props - CPU object properties, initialized by board
  * @cpu - pointer to corresponding CPU object if it's present on NULL otherwise
  * @type - QOM class name of possible @cpu object
- * @props - CPU object properties, initialized by board
- * #vcpus_count - number of threads provided by @cpu object
  */
 typedef struct CPUArchId {
     uint64_t arch_id;
-- 
2.39.1
Re: [PATCH] machine: do not crash if default RAM backend name has been stolen
With this patch, QEMU exits cleanly with an error message instead of aborting.

On 5/22/23 21:17, Igor Mammedov wrote:
QEMU aborts when default RAM backend should be used (i.e. no
explicit '-machine memory-backend=' specified) but user has
created an object which 'id' equals to default RAM backend
name used by board.

  $QEMU -machine pc \
        -object memory-backend-ram,id=pc.ram,size=4294967296

  Actual results:
  QEMU 7.2.0 monitor - type 'help' for more information
  (qemu) Unexpected error in object_property_try_add() at ../qom/object.c:1239:
  qemu-kvm: attempt to add duplicate property 'pc.ram' to object (type 'container')
  Aborted (core dumped)

Instead of abort, check for the conflicting 'id' and exit with
an error, suggesting how to remedy the issue.

Signed-off-by: Igor Mammedov
CC: th...@redhat.com

Reviewed-by: Shaoqin Huang

---
 hw/core/machine.c | 8
 1 file changed, 8 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 07f763eb2e..1000406211 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1338,6 +1338,14 @@ void machine_run_board_init(MachineState *machine, const char *mem_path, Error *
         }
     } else if (machine_class->default_ram_id && machine->ram_size &&
                numa_uses_legacy_mem()) {
+        if (object_property_find(object_get_objects_root(),
+                                 machine_class->default_ram_id)) {
+            error_setg(errp, "object name '%s' is reserved for the default"
+                " RAM backend, it can't be used for any other purposes."
+                " Change the object's 'id' to something else",
+                machine_class->default_ram_id);
+            return;
+        }
         if (!create_default_memdev(current_machine, mem_path, errp)) {
             return;
         }

-- 
Shaoqin
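The fix follows a check-before-add pattern: look up the reserved name in the objects container first and report a readable error, rather than letting the later add step hit the duplicate-property abort. A minimal standalone sketch of that ordering, where `sketch_objects` is a hypothetical stand-in for the QOM objects container and none of the `sketch_*` names are QEMU APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_OBJECTS 8

/* Hypothetical stand-in for the /objects container: a list of ids. */
struct sketch_objects {
    const char *ids[SKETCH_MAX_OBJECTS];
    int count;
};

static bool sketch_find(const struct sketch_objects *objs, const char *id)
{
    for (int i = 0; i < objs->count; i++) {
        if (strcmp(objs->ids[i], id) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * Same order of operations as the patch: check the reserved name first and
 * fail with an explanatory message, so the add step never sees a duplicate.
 */
static bool sketch_create_default_ram(struct sketch_objects *objs,
                                      const char *default_ram_id,
                                      char *err, size_t err_len)
{
    if (sketch_find(objs, default_ram_id)) {
        snprintf(err, err_len,
                 "object name '%s' is reserved for the default RAM backend,"
                 " it can't be used for any other purposes."
                 " Change the object's 'id' to something else",
                 default_ram_id);
        return false;
    }
    if (objs->count == SKETCH_MAX_OBJECTS) {
        snprintf(err, err_len, "too many objects");
        return false;
    }
    objs->ids[objs->count++] = default_ram_id;
    return true;
}
```

With `-object memory-backend-ram,id=pc.ram,...` already registered, the lookup finds `pc.ram` and the caller gets the error string back, leaving the container unchanged.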
[PATCH] hw: Fix format for comments
Simply fix the #vcpus_count to @vcpus_count in CPUArchId comments. While
at it, reorder the parameters in the comments to match the order in which
they are defined in CPUArchId.

CC: Igor Mammedov
Signed-off-by: Shaoqin Huang
---
 include/hw/boards.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index f4117fdb9a..cefa3d5897 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -101,10 +101,10 @@ MemoryRegion *machine_consume_memdev(MachineState *machine,
 /**
  * CPUArchId:
  * @arch_id - architecture-dependent CPU ID of present or possible CPU
+ * @vcpus_count - number of threads provided by @cpu object
+ * @props - CPU object properties, initialized by board
  * @cpu - pointer to corresponding CPU object if it's present on NULL otherwise
  * @type - QOM class name of possible @cpu object
- * @props - CPU object properties, initialized by board
- * #vcpus_count - number of threads provided by @cpu object
  */
 typedef struct CPUArchId {
     uint64_t arch_id;
-- 
2.39.1
[PATCH] hw: Fix format for comments
Simply fix the #vcpus_count to @vcpus_count in CPUArchId comments. While
at it, reorder the parameters in the comments to match the order in which
they are defined in CPUArchId.

Signed-off-by: Shaoqin Huang
---
 include/hw/boards.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index f4117fdb9a..cefa3d5897 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -101,10 +101,10 @@ MemoryRegion *machine_consume_memdev(MachineState *machine,
 /**
  * CPUArchId:
  * @arch_id - architecture-dependent CPU ID of present or possible CPU
+ * @vcpus_count - number of threads provided by @cpu object
+ * @props - CPU object properties, initialized by board
  * @cpu - pointer to corresponding CPU object if it's present on NULL otherwise
  * @type - QOM class name of possible @cpu object
- * @props - CPU object properties, initialized by board
- * #vcpus_count - number of threads provided by @cpu object
  */
 typedef struct CPUArchId {
     uint64_t arch_id;
-- 
2.39.1