Copying in Markus as QAPI maintainer, since I feel this proposed design is a little oddball from typical QAPI design approach....
On Fri, Nov 19, 2021 at 01:51:51PM -0600, Mark Kanda wrote: > Introduce qmp support for querying stats. Provide a framework for > adding new stats and support for the following commands: > > - query-stats > Returns a list of all stats, with options for specifying a stat > name and schema type. A schema type is the set of stats associated > with a given component (e.g. vm or vcpu). > > - query-stats-schemas > Returns a list of stats included in each schema type, with an > option for specifying the schema name. > > - query-stats-instances > Returns a list of stat instances and their associated schema type. > > The framework provides a method to register callbacks for these qmp > commands. > > The first usecase will be for fd-based KVM stats (in an upcoming > patch). > > Examples (with fd-based KVM stats): > > { "execute": "query-stats" } > { "return": [ > { "name": "vcpu_1", > "type": "kvm-vcpu", > "stats": [ > { "name": "guest_mode", > "unit": "none", > "base": 10, > "val": [ 0 ], > "exponent": 0, > "type": "instant" }, > { "name": "directed_yield_successful", > "unit": "none", > "base": 10, > "val": [ 0 ], > "exponent": 0, > "type": "cumulative" }, > ... > }, > { "name": "vcpu_0", > "type": "kvm-vcpu", > "stats": ... > ... > }, > { "name": "vm", > "type": "kvm-vm", > "stats": [ > { "name": "max_mmu_page_hash_collisions", > "unit": "none", > "base": 10, > "val": [ 0 ], > "exponent": 0, > "type": "peak" }, > ... So this is essentially exposing the low level kernel data structure 'struct kvm_stats_desc' mapped 1-to-1 into QAPI. There are pros/cons to doing that should be explored to see whether this actually makes sense for the QMP design. I understand this design is intended to be fully self-describing such that we can add arbitrarily more fields without ever changing QEMU code, and with a simple mapping from the kernel kvm_stats_desc. Taking the first level of data returned, we see the natural structure of the data wrt vCPUs is flattened: { "return": [ { "name": "vcpu_0", "type": "kvm-vcpu", "stats": [...], // stats for vcpu 0 }, { "name": "vcpu_1", "type": "kvm-vcpu", "stats": [...], // stats for vcpu 0 }, ...other vCPUs... { "name": "vm", "type": "kvm-vm", "stats": [...], // stats for the VM }, } This name+type stuff is all unnecessarily indirect. If we ever have to add more data unrelated to the kvm stats, we're going to need QEMU changes no matter what, so this indirect structure isn't future proofing it. I'd rather expect us to have named struct fields for each different provider of data, respecting the natural hierachy. ie use an array for vCPU data. I understand this is future proofed to be able to support non-KVM stats. If we have KVM per-vCPU stat and non-KVM per-VCPU stats at the same time, I'd expect them all to appear in the same place. IOW, overall I'd expect to see grouping more like { "return": { "vcpus": [ [ // stats for vcpu 0 { "provider": 'kvm', "stats": [...] }, { "provider": 'qemu', "stats"; [...] } ], [ // stats for vcpu 1 { "provider": 'kvm', "stats": [...] }, { "provider": 'qemu', "stats"; [...] } ], ...other vCPUs... ] "vm": [ { "provider": 'kvm', "stats": [...] }, { "provider": 'qemu', "stats"; [...] } ], ], } Now onto the values being reported. AFAICT from the kernel docs, for all the types of data it currently reports (cumulative, instant, peak, none), there is only ever going to be a single value. I assume the ability to report multiple values is future proofing for a later requirement. This is fine from a kerenl POV since they're trying to fit this into a flat struct. QAPI is way more flexible. It can switch between reporting an scale or array or scalars for the same field. So if we know the stat will only ever have 1 value, we should be reporting a scalar, not an array which will only ever have one value. Second, for a given named statistic, AFAICT, the data type, unit, base and exponent are all fixed. I don't see a reason for us to be reporting that information every time we call 'query-stats'. Just report the name + value(s). Apps that want a specific named stat won't care about the dimensions, because they'll already know what the value means. Apps that want to be metadata driven to handle arbitrary stats, can just call 'query-stats-schemas' to learn about the dimensions one time. This will give waaay lower data transfer for querying values repeatedly. > > { "execute": "query-stats-schemas" } > { "return": [ > { "type": "kvm-vcpu", > "stats": [ > { "name": "guest_mode" }, > { "name": "directed_yield_successful" }, > ... > }, > { "type": "kvm-vm", > "stats": [ > { "name": "max_mmu_page_hash_collisions" }, > { "name": "max_mmu_rmap_size" }, > ... ...this can be used to introspect the data type, unit, base, exponent as a one time task, if needed. Again, I'd expect the first level of nested to be removed to mirror 'query-stats', { "return": { "vcpu": [ { "provider": "kvm", "stats": [ { "name": "guest_mode", "unit": "none", "base": 10, "exponent": 0 }, { "name": "directed_yield_successful" "unit": "none", "base": 10, "exponent": 0 }, }, ], }, { "provider": "qemu" "stats": [ { "name": "something_blah_blah", "unit": "bytes", "base": 2, "exponent": 20, }, ] }, ], "vm": [ { "provider": "kvm", "stats": [ { "name": "max_mmu_page_hash_collisions", ... } { "name": "max_mmu_rmap_size", ... } ] }, { "provider": "qemu", "stats": [ { "name": "blah", ... } ] }, } } > > { "execute": "query-stats-instances" } > { "return": [ > { "name": "vcpu_1", > "type": "kvm-vcpu" }, > { "name": "vcpu_0", > "type": "kvm-vcpu" }, > { "name": "vm", > "type": "kvm-vm" } ] > } I don't see a need for this command at all. It doesn't tell apps anything they can't already learn from "query-stats-schemas" New 'type' values will involve QEMU code changes no matter what. So in the even we need something other than 'vcpu' and 'vm' stats reported by 'query-stats', apps can just query the QAPI schema in the normal manner to learn about struct field names that exist in this QEMU version. IOW, this really just re-invented QAPI introspection for no benefit IMHO. > diff --git a/qapi/misc.json b/qapi/misc.json > index 358548abe1..a0a07ef0b1 100644 > --- a/qapi/misc.json > +++ b/qapi/misc.json > @@ -527,3 +527,145 @@ > 'data': { '*option': 'str' }, > 'returns': ['CommandLineOptionInfo'], > 'allow-preconfig': true } > + > +## > +# @StatType: > +# > +# Enumeration of stat types > +# @cumulative: stat is cumulative; value can only increase. > +# @instant: stat is instantaneous; value can increase or decrease. > +# @peak: stat is the peak value; value can only increase. > +# > +# Since: 7.0 > +## > +{ 'enum' : 'StatType', > + 'data' : [ 'cumulative', 'instant', 'peak' ] } > + > +## > +# @StatUnit: > +# > +# Enumeration of stat units > +# @bytes: stat reported in bytes. > +# @seconds: stat reported in seconds. > +# @cycles: stat reported in clock cycles. > +# @none: no unit for this stat. > +# > +# Since: 7.0 > +## > +{ 'enum' : 'StatUnit', > + 'data' : [ 'bytes', 'seconds', 'cycles', 'none' ] } > + > +## > +# @StatData: > +# > +# Individual stat > +# @name: Stat name > +# @type: @StatType > +# @unit: @StatUnit > +# @base: Exponent base (2 or 10) > +# @exponent: Used together with @base > +# @val: List of uint64 values > +# > +# Since: 7.0 > +## > +{ 'struct': 'StatData', > + 'data': { 'name': 'str', > + 'type': 'StatType', > + 'unit': 'StatUnit', > + 'base': 'uint8', > + 'exponent': 'int16', > + 'val': [ 'uint64' ] } } > + > +## > +# @Stats: > +# > +# Stats per resource (e.g. vm or vcpu) > +# @name: Resource name > +# @stats: List of @StatData > +# > +# Since: 7.0 > +## > +{ 'struct': 'Stats', > + 'data': {'name': 'str', > + 'type': 'StatSchemaType', > + 'stats': [ 'StatData' ] } } > + > +## > +# @query-stats: > +# > +# @name: Stat name (optional) > +# @type: Type name (optional) > +# Returns: List of @Stats > +# > +# Since: 7.0 > +## > +{ 'command': 'query-stats', > + 'data': { '*name': 'str', '*type': 'str' }, > + 'returns': [ 'Stats' ] } The 'name' and 'type' are used for filtering I presume. Only allowing a single value for each feels pretty inflexible. I'd say we want to allow mutliple requests at a time for efficiency. Bearing in mind my other suggestions above, I'd think we should have something more like { 'enum': 'StatsProvider', 'data': ["kvm", "qemu", "tcg", ....], } { 'struct': 'StatsRequest', 'data': { 'provider': 'StatsProvider', // If omitted, report everything for this provider '*fields': [ 'str' ] } } { 'struct': 'StatsVCPURequest', 'base': 'StatsRequest', 'data': { // To request subset of vCPUs e.g. // "cpu_set": "0-3" // Could use ['int'] instead if desired '*cpu_set': str, }, } { 'struct': 'StatsFilter', 'data': { // If omitted means don't report that group of data '*vcpu': 'StatsVCPURequest', '*vm': 'StatsRequest', }, } { 'alternate': 'StatsValue', 'data': { 'scalar': 'int64', 'list': [ 'int64 ' ] } } { 'struct': 'StatsResultsEntry', 'data': { 'provider': 'StatsProvider', 'stats': [ 'StatsValue' ] } } { 'struct': 'StatsResults': 'data': { '*vcpu': [ [ 'StatsResultsEntry' ] ], '*vm': [ 'StatsResultsEntry' ] } } { 'command': 'query-stats', 'data': { 'filter': '*StatsFilter' }, 'returns': 'StatsResults' } > + > +## > +# @StatSchemaType: > +# > +# Enumeration of stats schema types > +# > +# Since: 7.0 > +## > +{ 'enum' : 'StatSchemaType', > + 'data' : [ ] } > + > +## > +# @StatSchemaEntry: > +# > +# Individual stat in a schema type > +# > +# Since: 7.0 > +## > +{ 'struct': 'StatSchemaEntry', > + 'data': { 'name': 'str' } } > + > +## > +# @StatsSchema: > +# > +# Stats per @StatSchemaType > +# @type: @StatSchemaType > +# @stats: @StatCchemaName > +# > +# Since: 7.0 > +## > +{ 'struct': 'StatsSchema', > + 'data': { 'type': 'StatSchemaType', > + 'stats': [ 'StatSchemaEntry' ] } } > + > +## > +# @query-stats-schemas: > +# > +# @type: type name (optional) > +# Returns: List of @StatsSchema > +# > +# Since: 7.0 > +## > +{ 'command': 'query-stats-schemas', > + 'data': { '*type': 'str' }, > + 'returns': [ 'StatsSchema' ] } I'd think this is more like { 'struct': 'StatsSchemaValue', 'data': { 'name': 'str', 'type': 'StatType', 'unit': 'StatUnit', 'base': 'uint8', 'exponent': 'int16', }, } { 'struct': 'StatsSchemaProvider', 'data': { 'provider': 'StatsProvider', 'stats': [ 'StatsSchemaValue'], } } { 'struct': 'StatsSchemaResult', 'data': { 'vcpu': ['StatsSchemaProvider'], 'vm': ['StatsSchemaProvider'], } } { 'command': 'query-stats-schemas', 'returns': [ 'StatsSchemaResult' ] } > + > +## > +# @StatsInstance: > +# > +# @name: resource name > +# @type: @StatSchemaType > +# > +# Since: 7.0 > +## > +{ 'struct': 'StatsInstance', > + 'data': { 'name': 'str', > + 'type': 'StatSchemaType' } } > + > +## > +# @query-stats-instances: > +# > +# Returns list of @StatsInstance > +# > +# Since: 7.0 > +## > +{ 'command': 'query-stats-instances', > + 'returns': [ 'StatsInstance' ] } As mentioned earlier, IMHO this doesn't need to exist. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|