Re: [PATCH v7 7/7] powerpc/perf/hv-24x7: Document sysfs event description entries
On Fri, Jan 30, 2015 at 4:46 PM, Sukadev Bhattiprolu suka...@linux.vnet.ibm.com wrote:
From: Cody P Schafer c...@linux.vnet.ibm.com

Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com
Signed-off-by: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
---
Changelog[v6]
	Update Contact info to Linux on Power Developer list

 .../testing/sysfs-bus-event_source-devices-hv_24x7 | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
index 32f3f5f..f893337 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
@@ -21,3 +21,25 @@ Contact: Linux on PowerPC Developer List linuxppc-dev@lists.ozlabs.org
 Description:
 		Exposes the version field of the 24x7 catalog. This is also
 		extractable from the provided binary catalog sysfs entry.
+
+What:		/sys/bus/event_source/devices/hv_24x7/event_descs/<event-name>
+Date:		February 2014
+Contact:	Linux on PowerPC Developer List linuxppc-dev@lists.ozlabs.org
+Description:
+		Provides the description of a particular event as provided by
+		the firmware. If firmware does not provide a description, no
+		file will be created.
+
+		Note that the <event-name> lacks the domain suffix appended for
+		events in the events/ dir.

I'm probably a bit late on this, but: please consider removing the need for a user to know about the domain suffixes (which, as far as I know, are 24x7 specific). If anyone else ever wants to add firmware/hardware/kernel provided event descriptions, they'll need to special-case these ones, as they don't match up with the actual event names.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v5 1/4] tools/perf: support parsing parameterized events
On Thu, Dec 4, 2014 at 7:44 AM, Jiri Olsa jo...@redhat.com wrote:
On Tue, Dec 02, 2014 at 06:09:35PM -0800, Sukadev Bhattiprolu wrote:
From: Cody P Schafer c...@linux.vnet.ibm.com

Enable event specification like:

	pmu/event_name,param1=0x1,param2=0x4/

Assuming that

	/sys/bus/event_source/devices/pmu/events/event_name

contains something like

	param2=$foo,bar=1,param1=$baz

oops.. sorry to be a PITA on this one.. I might have missed something in the previous discussion, but I guess I might finally have some opinion on this ;-)

here's how I think your patchset works:

in /sys/bus/event_source/devices/pmu/events/event_name you can actually have:

	param2=foo,bar=1,param1=baz

notice no '$', that's what you add later in 'perf list' output, right?

Moreover, it actually does not matter what's in the value 'param2=HERE', because it's not used in the config code at all apart from the 'perf list' display processing.

So when we discussed the '$' name way, I thought it'd be like:

in /sys/bus/event_source/devices/pmu/events/event_name you have:

	param2=$foo,bar=1,param1=$baz

and on the command line you'd use:

	pmu/event_name,foo=0x1,bar=0x4/

to assign directly to the $var, which would justify the $var syntax I think..

Agreed, what you've described above sounds like a good idea. Compared to monopolizing all strings (which is what I did when initially writing this), using a '$' prefix would allow less pain when some events suddenly need non-integer parameters.

anyway we could assign directly to the param term name as you do, but I think we just need to mark the term as parametrized, like:

in /sys/bus/event_source/devices/pmu/events/event_name you have:

	param2=?,bar=1,param1=?

and on the command line you'd use:

	pmu/event_name,param2=0x1,param1=0x4/

while the config code would check that the param substitution is done only for terms with '?' in value, like 'param2=?', and not for all PARSE_EVENTS__TERM_TYPE_STR type terms (as of now)

I prefer the `foo=0x1` as mentioned previously: it makes the user interface much less painful, as we can have event-specific names for register/hcall fields. I'm pretty sure the code used to do this; not sure when it was removed (haven't been following this patchset closely).

That said: I haven't fiddled with this code in a while (it's Suka's at this point), and there might be arguments the other way on both of those.
Re: [PATCH v5 6/6] powerpc/perf/hv-24x7: Document sysfs event description entries
diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
index 32f3f5f..cf70084 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
@@ -21,3 +21,25 @@ Contact: Linux on PowerPC Developer List linuxppc-dev@lists.ozlabs.org
+Contact:	Cody P Schafer c...@linux.vnet.ibm.com

Probably want someone else to be the contact here.
Re: [PATCH v4 10/10] powerpc/perf/hv-24x7: Document sysfs event description entries
+What:		/sys/bus/event_source/devices/hv_24x7/event_descs/<event-name>
+Date:		February 2014
+Contact:	Cody P Schafer c...@linux.vnet.ibm.com

May want to change this contact email to an address that still works (perhaps the ppc devel list?)
Re: [PATCH 11/16] byteorder: provide a linux/byteorder.h with {be, le}_to_cpu() and cpu_to_{be, le}() macros
On Wed, May 28, 2014 at 3:45 AM, David Laight david.lai...@aculab.com wrote:
From: Cody P Schafer

Rather than manually specifying the size of the integer to be converted, key off of the type size. Reduces duplicate size info and the occurrence of certain types of bugs (using the wrong sized conversion).
...
+#define be_to_cpu(v) \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint8_t) , v, \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint16_t), be16_to_cpu(v), \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint32_t), be32_to_cpu(v), \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint64_t), be64_to_cpu(v), \
+	(void)0
...

I'm not at all sure that using the 'size' of the constant will reduce the number of bugs - it just introduces a whole new category of bugs.

Certainly, if you mis-size the argument (and thus have mis-sized one of the variables containing the be value, probably a bug anyhow), there will be problems.

I put this interface together because of an actual bug I wrote into the initial code of the hv_24x7 driver (resized a struct member without adjusting the be*_to_cpu() sizing). Having this auto-sizing macro means I can avoid encoding the size of a struct field in multiple places.
Re: [PATCH 11/16] byteorder: provide a linux/byteorder.h with {be,le}_to_cpu() and cpu_to_{be,le}() macros
On Tue, May 27, 2014 at 7:44 PM, Joe Perches j...@perches.com wrote:
On Tue, 2014-05-27 at 17:22 -0700, Cody P Schafer wrote:
Rather than manually specifying the size of the integer to be converted, key off of the type size. Reduces duplicate size info and the occurrence of certain types of bugs (using the wrong sized conversion).
[]
diff --git a/include/linux/byteorder.h b/include/linux/byteorder.h
[]
@@ -0,0 +1,34 @@
+#ifndef LINUX_BYTEORDER_H_
+#define LINUX_BYTEORDER_H_
+
+#include <asm/byteorder.h>
+
+#define be_to_cpu(v) \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint8_t) , v, \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint16_t), be16_to_cpu(v), \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint32_t), be32_to_cpu(v), \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint64_t), be64_to_cpu(v), \
+	(void)0

probably better to use BUILD_BUG instead of these 0 returns

They aren't 0 returns.

$ echo 'int main(void) { int x = (void)0; return x; }' | gcc -x c -
<stdin>: In function ‘main’:
<stdin>:1:26: error: void value not ignored as it ought to be
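To try the size-dispatch idea outside the kernel, here is a minimal userspace sketch. It assumes glibc's `<endian.h>` (`be16toh()`, `be32toh()`, `be64toh()`, exposed via `_DEFAULT_SOURCE`) as stand-ins for the kernel's `be*_to_cpu()` helpers and a GCC/Clang compiler for `__builtin_choose_expr`; the macro name `be_to_cpu_sketch` and `struct fw_record` are hypothetical. As in the quoted macro, an operand of unsupported size selects `(void)0`, so using the result as a value fails to compile instead of yielding 0.

```c
#define _DEFAULT_SOURCE
#include <endian.h>
#include <stdint.h>

/*
 * Pick the conversion from the operand's size at compile time.
 * An unsupported size selects (void)0, so using the result as a
 * value is a compile error rather than a silent 0.
 */
#define be_to_cpu_sketch(v)						\
	__builtin_choose_expr(sizeof(v) == sizeof(uint8_t), (v),	\
	__builtin_choose_expr(sizeof(v) == sizeof(uint16_t), be16toh(v), \
	__builtin_choose_expr(sizeof(v) == sizeof(uint32_t), be32toh(v), \
	__builtin_choose_expr(sizeof(v) == sizeof(uint64_t), be64toh(v), \
	(void)0))))

/* Hypothetical firmware record: if a field is later widened, the
 * conversion below follows the new field type automatically. */
struct fw_record {
	uint16_t id;		/* big-endian on the wire */
	uint32_t offset;	/* big-endian on the wire */
	uint64_t count;		/* big-endian on the wire */
};

static uint64_t record_count(const struct fw_record *r)
{
	return be_to_cpu_sketch(r->count);
}
```

The design choice being argued over is visible here: `record_count()` never names a width, so resizing `count` cannot leave a stale `be32toh()` behind.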
Re: [PATCH 11/16] byteorder: provide a linux/byteorder.h with {be, le}_to_cpu() and cpu_to_{be, le}() macros
On Wed, May 28, 2014 at 5:05 PM, Cody P Schafer d...@codyps.com wrote:
On Wed, May 28, 2014 at 3:45 AM, David Laight david.lai...@aculab.com wrote:
...
I put this interface together because of an actual bug I wrote into the initial code of the hv_24x7 driver (resized a struct member without adjusting the be*_to_cpu() sizing). Having this auto sizing macro means I can avoid encoding the size of a struct field in multiple places.

To clarify, the point I'm making here is that this simply cuts out 1 more place we can screw up endianness conversion sizing.
Re: [PATCH 11/16] byteorder: provide a linux/byteorder.h with {be, le}_to_cpu() and cpu_to_{be, le}() macros
On Wed, May 28, 2014 at 6:00 PM, Joe Perches j...@perches.com wrote:
On Wed, 2014-05-28 at 17:11 -0500, Cody P Schafer wrote:
...
To clarify, the point I'm making here is that this simply cuts out 1 more place we can screw up endianness conversion sizing.

It does screw up other types when you do things like:

	u8 foo = some_function();
	cpu_to_be(foo + 1);

the return value is sizeof(int) not u8

Yep, that is a very good argument against the cpu_to_{be,le}() variants. It might make sense to remove them and just have the {be,le}_to_cpu() ones.
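Joe's objection comes from C's integer promotions: arithmetic on a `u8` operand yields an `int`, so any macro that dispatches on `sizeof` of its argument sees `sizeof(int)` bytes, not 1. A small standalone check; `dispatch_width()` and `WIDTH_OF()` are hypothetical helpers that only mirror the `sizeof` ladder of the proposed macros:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Mirrors the sizeof() ladder of the proposed cpu_to_be()/be_to_cpu():
 * returns the conversion width in bits the macro would pick, or 0
 * where the macro would produce (void)0.
 */
static unsigned dispatch_width(size_t size)
{
	switch (size) {
	case sizeof(uint8_t):  return 8;
	case sizeof(uint16_t): return 16;
	case sizeof(uint32_t): return 32;
	case sizeof(uint64_t): return 64;
	default:               return 0;
	}
}

/* Width that a sizeof-dispatching macro would choose for expr. */
#define WIDTH_OF(expr) dispatch_width(sizeof(expr))
```

With `u8 foo`, `WIDTH_OF(foo)` is 8 but `WIDTH_OF(foo + 1)` is the width of `int`; only an explicit cast such as `(uint8_t)(foo + 1)` brings it back to 8, which is exactly the kind of hidden requirement Joe is pointing at.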
[PATCH 00/16] perf: add support for parameterized events from sysfs (powerpc 24x7)
What this patchset does:

 - the first patch (override sysfs in tools/perf via SYSFS_PATH) was sent out previously, but needed a resend anyhow. Having it is useful for testing the later changes to tools/perf.
 - the second patch is a bugfix to the powerpc hv-24x7 code which was previously sent out, which is a good idea to have when testing these patches on POWER8 hardware.
 - document perf sysfs and the changes to add parameterized events
    - semi-notably: removes the growing list of specific POWER cpu events and begins documenting them generically, much like the docs for /sys/module/MODULENAME do for modules.
 - tools/perf changes to support parameterized events
 - export some parameterized events from the powerpc pmus hv_24x7 and hv_gpci

Description of event parameters from the documentation patch:

	Event parameters are a basic way for partial events to be specified in sysfs with per-event names given to the fields that need to be filled in when using a particular event.

	It is intended for supporting cases where the single 'cpu' parameter is insufficient. For example, POWER8 has events for physical sockets/cores/cpus that are accessible from within virtual machines. To keep using the single 'cpu' parameter we'd need to perform a mapping between Linux's cpus and the physical machine's cpus (in this case Linux is running under a hypervisor). This isn't possible because bindings between our cpus and physical cpus may not be fixed, and we probably won't have a cpu on each physical cpu.

Description of the sysfs contents when events are parameterized (copied from an included patch):

	Examples:

		domain=0x1,offset=0x8,starting_index=phys_cpu

	In the case of the last example, a value replacing phys_cpu would need to be provided by the user selecting the particular event. This is referred to as event parameterization. All non-numerical values indicate an event parameter.
Notes on how perf-list displays parameterized events (and how to use them, again culled from an included patch):

	PARAMETERIZED EVENTS

	Some pmu events listed by 'perf-list' will be displayed with '?' in them. For example:

		hv_gpci/dtbp_ptitc,phys_processor_idx=?/

	This means that when provided as an event, a value for phys_processor_idx must also be supplied. For example:

		perf stat -e 'hv_gpci/dtbp_ptitc,phys_processor_idx=0x2/' ...

Cody P Schafer (16):
  tools/perf: allow overriding sysfs and proc finding with env var
  powerpc/perf/hv-24x7: use kmem_cache instead of aligned stack allocations
  perf Documentation: sysfs events/ interfaces
  perf Documentation: remove duplicated docs for powerpc cpu specific events
  perf Documentation: add event parameters
  tools/perf: annotate list_head with type info
  tools/perf: support parsing parameterized events
  tools/perf: extend format_alias() to include event parameters
  tools/perf: document parameterized events and note symbolically formed events
  perf: provide sysfs_show for struct perf_pmu_events_attr
  byteorder: provide a linux/byteorder.h with {be,le}_to_cpu() and cpu_to_{be,le}() macros
  powerpc/perf/hv-24x7: parse catalog and populate sysfs with events
  powerpc/perf/hv-24x7: Documentation for new sysfs entries which expose descriptions
  perf: add PMU_EVENT_ATTR_STRING() helper
  powerpc/perf/{hv-gpci,hv-common}: generate requests with counters annotated
  powerpc/perf/hv-gpci: add the remaining gpci requests

 .../testing/sysfs-bus-event_source-devices-events | 617 ++--
 .../testing/sysfs-bus-event_source-devices-hv_24x7 |  22 +
 arch/powerpc/perf/hv-24x7-catalog.h                |  25 +
 arch/powerpc/perf/hv-24x7-domains.h                |  19 +
 arch/powerpc/perf/hv-24x7.c                        | 812 -
 arch/powerpc/perf/hv-24x7.h                        |  12 +-
 arch/powerpc/perf/hv-common.c                      |  10 +-
 arch/powerpc/perf/hv-gpci-requests.h               | 258 +++
 arch/powerpc/perf/hv-gpci.c                        |   8 +
 arch/powerpc/perf/hv-gpci.h                        |  37 +-
 arch/powerpc/perf/req-gen/_begin.h                 |  13 +
 arch/powerpc/perf/req-gen/_clear.h                 |   5 +
 arch/powerpc/perf/req-gen/_end.h                   |   4 +
 arch/powerpc/perf/req-gen/_request-begin.h         |  15 +
 arch/powerpc/perf/req-gen/_request-end.h           |   8 +
 arch/powerpc/perf/req-gen/perf.h                   | 155
 include/linux/byteorder.h                          |  34 +
 include/linux/perf_event.h                         |  10 +
 kernel/events/core.c                               |   8 +
 tools/lib/api/fs/fs.c                              |  43 +-
 tools/perf/Documentation/perf-list.txt             |  13 +
 tools/perf/Documentation/perf-record.txt
[PATCH 01/16] tools/perf: allow overriding sysfs and proc finding with env var
SYSFS_PATH and PROC_PATH environment variables now let the user override the detection of sysfs and proc locations for testing purposes.

CC: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Signed-off-by: Cody P Schafer d...@codyps.com
---
 tools/lib/api/fs/fs.c | 43 ++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
index 5b5eb78..c1b49c3 100644
--- a/tools/lib/api/fs/fs.c
+++ b/tools/lib/api/fs/fs.c
@@ -1,8 +1,10 @@
 /* TODO merge/factor in debugfs.c here */

+#include <ctype.h>
 #include <errno.h>
 #include <stdbool.h>
 #include <stdio.h>
+#include <stdlib.h>
 #include <string.h>
 #include <sys/vfs.h>

@@ -96,12 +98,51 @@ static bool fs__check_mounts(struct fs *fs)
 	return false;
 }

+static void mem_toupper(char *f, size_t len)
+{
+	while (len) {
+		*f = toupper(*f);
+		f++;
+		len--;
+	}
+}
+
+/*
+ * Check for "NAME_PATH" environment variable to override fs location (for
+ * testing). This matches the recommendation in Documentation/sysfs-rules.txt
+ * for SYSFS_PATH.
+ */
+static bool fs__env_override(struct fs *fs)
+{
+	char *override_path;
+	size_t name_len = strlen(fs->name);
+	/* name + "_PATH" + '\0' */
+	char upper_name[name_len + 5 + 1];
+	memcpy(upper_name, fs->name, name_len);
+	mem_toupper(upper_name, name_len);
+	strcpy(&upper_name[name_len], "_PATH");
+
+	override_path = getenv(upper_name);
+	if (!override_path)
+		return false;
+
+	fs->found = true;
+	strncpy(fs->path, override_path, sizeof(fs->path));
+	return true;
+}
+
 static const char *fs__get_mountpoint(struct fs *fs)
 {
+	if (fs__env_override(fs))
+		return fs->path;
+
 	if (fs__check_mounts(fs))
 		return fs->path;

-	return fs__read_mounts(fs) ? fs->path : NULL;
+	if (fs__read_mounts(fs))
+		return fs->path;
+
+	return NULL;
 }

 static const char *fs__mountpoint(int idx)
--
1.9.3
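The lookup in fs__env_override() boils down to upper-casing the filesystem name, appending `_PATH`, and consulting the environment. A standalone sketch of just that step, assuming POSIX `setenv()`/`unsetenv()` for trying it out; `env_override_path()` is a hypothetical helper, and unlike the patch it bounds the variable name and uses `snprintf()` so the copied path is always NUL-terminated:

```c
#define _POSIX_C_SOURCE 200112L
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "<NAME>_PATH" from a lowercase fs name (e.g. "sysfs" ->
 * "SYSFS_PATH") and, if that variable is set, copy its value into
 * out. Returns false if the name is too long or the variable is unset.
 */
static bool env_override_path(const char *fs_name, char *out, size_t out_len)
{
	char var[64];
	size_t i, name_len = strlen(fs_name);
	const char *val;

	/* sizeof("_PATH") already counts the terminating NUL. */
	if (name_len + sizeof("_PATH") > sizeof(var))
		return false;

	for (i = 0; i < name_len; i++)
		var[i] = (char)toupper((unsigned char)fs_name[i]);
	strcpy(&var[name_len], "_PATH");

	val = getenv(var);
	if (!val)
		return false;

	/* snprintf() always NUL-terminates, unlike strncpy(). */
	snprintf(out, out_len, "%s", val);
	return true;
}
```

With the patch applied, a test run would presumably look like `SYSFS_PATH=/tmp/fake-sysfs perf list`, where `/tmp/fake-sysfs` is a hypothetical directory laid out like a real sysfs tree.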
[PATCH 02/16] powerpc/perf/hv-24x7: use kmem_cache instead of aligned stack allocations
Ian pointed out that the use of __aligned(4096) caused rather large stack consumption in single_24x7_request(), so use the kmem_cache hv_page_cache (which we've already got set up for other allocations) instead of allocating locally.

CC: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Reported-by: Ian Munsie imun...@au1.ibm.com
Signed-off-by: Cody P Schafer d...@codyps.com
---
 arch/powerpc/perf/hv-24x7.c | 52 -
 1 file changed, 37 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index e0766b8..9a7a830 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -294,7 +294,7 @@ static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix,
 					 u16 lpar, u64 *res,
 					 bool success_expected)
 {
-	unsigned long ret;
+	unsigned long ret = -ENOMEM;

 	/*
 	 * request_buffer and result_buffer are not required to be 4k aligned,
@@ -304,7 +304,27 @@ static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix,
 	struct reqb {
 		struct hv_24x7_request_buffer buf;
 		struct hv_24x7_request req;
-	} __packed __aligned(4096) request_buffer = {
+	} __packed *request_buffer;
+	struct resb {
+		struct hv_24x7_data_result_buffer buf;
+		struct hv_24x7_result res;
+		struct hv_24x7_result_element elem;
+		__be64 result;
+	} __packed *result_buffer;
+
+	BUILD_BUG_ON(sizeof(*request_buffer) > 4096);
+	BUILD_BUG_ON(sizeof(*result_buffer) > 4096);
+
+	request_buffer = kmem_cache_alloc(hv_page_cache, GFP_USER);
+
+	if (!request_buffer)
+		goto out_reqb;
+
+	result_buffer = kmem_cache_zalloc(hv_page_cache, GFP_USER);
+	if (!result_buffer)
+		goto out_resb;
+
+	*request_buffer = (struct reqb) {
 		.buf = {
 			.interface_version = HV_24X7_IF_VERSION_CURRENT,
 			.num_requests = 1,
@@ -320,28 +340,30 @@ static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix,
 		}
 	};

-	struct resb {
-		struct hv_24x7_data_result_buffer buf;
-		struct hv_24x7_result res;
-		struct hv_24x7_result_element elem;
-		__be64 result;
-	} __packed __aligned(4096) result_buffer = {};
-
 	ret = plpar_hcall_norets(H_GET_24X7_DATA,
-			virt_to_phys(&request_buffer), sizeof(request_buffer),
-			virt_to_phys(&result_buffer),  sizeof(result_buffer));
+			virt_to_phys(request_buffer), sizeof(*request_buffer),
+			virt_to_phys(result_buffer),  sizeof(*result_buffer));

 	if (ret) {
 		if (success_expected)
 			pr_err_ratelimited("hcall failed: %d %#x %#x %d => 0x%lx (%ld) detail=0x%x failing ix=%x\n",
 					domain, offset, ix, lpar,
 					ret, ret,
-					result_buffer.buf.detailed_rc,
-					result_buffer.buf.failing_request_ix);
-		return ret;
+					result_buffer->buf.detailed_rc,
+					result_buffer->buf.failing_request_ix);
+		goto out_hcall;
 	}

-	*res = be64_to_cpu(result_buffer.result);
+	*res = be64_to_cpu(result_buffer->result);
+	kfree(result_buffer);
+	kfree(request_buffer);
+	return ret;
+
+out_hcall:
+	kfree(result_buffer);
+out_resb:
+	kfree(request_buffer);
+out_reqb:
 	return ret;
 }
--
1.9.3
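The patch replaces two page-aligned stack buffers with heap allocations and unwinds them with the usual kernel goto ladder. The same shape in plain userspace C, as a sketch only: `aligned_alloc()` stands in for `kmem_cache_alloc()`, `fake_hcall()` is a hypothetical stand-in for `plpar_hcall_norets(H_GET_24X7_DATA, ...)`, and both struct layouts are invented for illustration:

```c
#define _ISOC11_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 4096

/* Invented layouts; the real hv_24x7 buffers are more involved. */
struct req { uint32_t version; uint32_t num_requests; };
struct res { uint32_t detailed_rc; uint64_t result; };

/* Compile-time equivalent of BUILD_BUG_ON(sizeof(...) > 4096). */
_Static_assert(sizeof(struct req) <= BUF_SIZE, "request too large");
_Static_assert(sizeof(struct res) <= BUF_SIZE, "result too large");

/* Hypothetical hypervisor call: fills in a result and returns 0. */
static long fake_hcall(const struct req *rq, struct res *rs)
{
	rs->detailed_rc = 0;
	rs->result = (uint64_t)rq->num_requests * 100;
	return 0;
}

static long single_request(uint64_t *out)
{
	long ret = -ENOMEM;
	struct req *rq;
	struct res *rs;

	rq = aligned_alloc(BUF_SIZE, BUF_SIZE);
	if (!rq)
		goto out_req;

	rs = aligned_alloc(BUF_SIZE, BUF_SIZE);
	if (!rs)
		goto out_res;
	memset(rs, 0, BUF_SIZE);	/* like kmem_cache_zalloc() */

	*rq = (struct req){ .version = 1, .num_requests = 1 };

	ret = fake_hcall(rq, rs);
	if (ret)
		goto out_call;

	*out = rs->result;

	/* The success path falls through the same cleanup labels. */
out_call:
	free(rs);
out_res:
	free(rq);
out_req:
	return ret;
}
```

Unlike the patch, which repeats both frees before its early `return ret;`, this variant lets the success path fall through the labels, so each buffer is released in exactly one place.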
[PATCH 03/16] perf Documentation: sysfs events/ interfaces
Add documentation for the <event>, <event>.scale, and <event>.unit files in sysfs.

<event>.scale and <event>.unit were undocumented. <event> was previously documented only for specific powerpc pmu events.

CC: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Signed-off-by: Cody P Schafer d...@codyps.com
---
 .../testing/sysfs-bus-event_source-devices-events | 60 ++
 1 file changed, 60 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
index 7b40a3c..a5226f0 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
@@ -599,3 +599,63 @@ Description:	POWER-systems specific performance monitoring events
 		Further, multiple terms like 'event=0x' can be
 		specified and separated with comma. All available terms are
 		defined in the /sys/bus/event_source/devices/<dev>/format file.
+
+What:		/sys/bus/event_source/devices/<pmu>/events/<event>
+Date:		2014/02/24
+Contact:	Linux kernel mailing list linux-ker...@vger.kernel.org
+Description:	Per-pmu performance monitoring events specific to the running system
+
+		Each file (except for some of those with a '.' in them, '.unit'
+		and '.scale') in the 'events' directory describes a single
+		performance monitoring event supported by the <pmu>. The name
+		of the file is the name of the event.
+
+		File contents:
+
+			<term>[=<value>][,<term>[=<value>]]...
+
+		Where <term> is one of the terms listed under
+		/sys/bus/event_source/devices/<pmu>/format/ and <value> is
+		a number in base-16 format with a '0x' prefix (lowercase only).
+		If a <term> is specified alone (without an assigned value), it
+		is implied that 0x1 is assigned to that <term>.
+
+		Examples (each of these lines would be in a separate file):
+
+			event=0x2abc
+			event=0x423,inv,cmask=0x3
+			domain=0x1,offset=0x8,starting_index=0x
+
+		Each of the assignments indicates a value to be assigned to a
+		particular set of bits (as defined by the format file
+		corresponding to the <term>) in the perf_event structure passed
+		to the perf_open syscall.
+
+What:		/sys/bus/event_source/devices/<pmu>/events/<event>.unit
+Date:		2014/02/24
+Contact:	Linux kernel mailing list linux-ker...@vger.kernel.org
+Description:	Perf event units
+
+		A string specifying the English plural numerical unit that <event>
+		(once multiplied by <event>.scale) represents.
+
+		Example:
+
+			Joules
+
+What:		/sys/bus/event_source/devices/<pmu>/events/<event>.scale
+Date:		2014/02/24
+Contact:	Linux kernel mailing list linux-ker...@vger.kernel.org
+Description:	Perf event scaling factors
+
+		A string representing a floating point value expressed in
+		scientific notation to be multiplied by the event count
+		received from the kernel to match the unit specified in the
+		<event>.unit file.
+
+		Example:
+
+			2.3283064365386962890625e-10
+
+		This is provided to avoid performing floating point arithmetic
+		in the kernel.
--
1.9.3
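Since the kernel avoids floating point, a tool does the scaling itself: it parses `<event>.scale` as a floating-point string, multiplies it by the raw count, and labels the result with `<event>.unit`. A minimal sketch, with `scale_count()` as a hypothetical helper; the example scale above is exactly 2^-32, so a raw count of 2^32 scales to 1.0:

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Apply the contents of an <event>.scale file to a raw event count.
 * strtod() stops at the trailing newline sysfs files usually carry;
 * an empty or non-numeric string falls back to a scale of 1.0.
 */
static double scale_count(uint64_t raw, const char *scale_str)
{
	char *end;
	double scale = strtod(scale_str, &end);

	if (end == scale_str)
		scale = 1.0;
	return (double)raw * scale;
}
```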
[PATCH 04/16] perf Documentation: remove duplicated docs for powerpc cpu specific events
Listing specific events doesn't actually help us at all here because:

 - these events actually vary between different ppc processors; they aren't guaranteed to be present.
 - the documentation of the (generic) file contents is now superseded by the docs for arbitrary event file contents.

CC: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Signed-off-by: Cody P Schafer d...@codyps.com
---
 .../testing/sysfs-bus-event_source-devices-events | 573 -
 1 file changed, 573 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
index a5226f0..20979f8 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
@@ -27,579 +27,6 @@ Description:Generic performance monitoring events
 		basename.

-What: /sys/devices/cpu/events/PM_1PLUS_PPC_CMPL - /sys/devices/cpu/events/PM_BRU_FIN - /sys/devices/cpu/events/PM_BR_MPRED - /sys/devices/cpu/events/PM_CMPLU_STALL - /sys/devices/cpu/events/PM_CMPLU_STALL_BRU - /sys/devices/cpu/events/PM_CMPLU_STALL_DCACHE_MISS - /sys/devices/cpu/events/PM_CMPLU_STALL_DFU - /sys/devices/cpu/events/PM_CMPLU_STALL_DIV - /sys/devices/cpu/events/PM_CMPLU_STALL_ERAT_MISS - /sys/devices/cpu/events/PM_CMPLU_STALL_FXU - /sys/devices/cpu/events/PM_CMPLU_STALL_IFU - /sys/devices/cpu/events/PM_CMPLU_STALL_LSU - /sys/devices/cpu/events/PM_CMPLU_STALL_REJECT - /sys/devices/cpu/events/PM_CMPLU_STALL_SCALAR - /sys/devices/cpu/events/PM_CMPLU_STALL_SCALAR_LONG - /sys/devices/cpu/events/PM_CMPLU_STALL_STORE - /sys/devices/cpu/events/PM_CMPLU_STALL_THRD - /sys/devices/cpu/events/PM_CMPLU_STALL_VECTOR - /sys/devices/cpu/events/PM_CMPLU_STALL_VECTOR_LONG - /sys/devices/cpu/events/PM_CYC - /sys/devices/cpu/events/PM_GCT_NOSLOT_BR_MPRED - /sys/devices/cpu/events/PM_GCT_NOSLOT_BR_MPRED_IC_MISS - /sys/devices/cpu/events/PM_GCT_NOSLOT_CYC - /sys/devices/cpu/events/PM_GCT_NOSLOT_IC_MISS -
/sys/devices/cpu/events/PM_GRP_CMPL - /sys/devices/cpu/events/PM_INST_CMPL - /sys/devices/cpu/events/PM_LD_MISS_L1 - /sys/devices/cpu/events/PM_LD_REF_L1 - /sys/devices/cpu/events/PM_RUN_CYC - /sys/devices/cpu/events/PM_RUN_INST_CMPL - /sys/devices/cpu/events/PM_IC_DEMAND_L2_BR_ALL - /sys/devices/cpu/events/PM_GCT_UTIL_7_TO_10_SLOTS - /sys/devices/cpu/events/PM_PMC2_SAVED - /sys/devices/cpu/events/PM_VSU0_16FLOP - /sys/devices/cpu/events/PM_MRK_LSU_DERAT_MISS - /sys/devices/cpu/events/PM_MRK_ST_CMPL - /sys/devices/cpu/events/PM_NEST_PAIR3_ADD - /sys/devices/cpu/events/PM_L2_ST_DISP - /sys/devices/cpu/events/PM_L2_CASTOUT_MOD - /sys/devices/cpu/events/PM_ISEG - /sys/devices/cpu/events/PM_MRK_INST_TIMEO - /sys/devices/cpu/events/PM_L2_RCST_DISP_FAIL_ADDR - /sys/devices/cpu/events/PM_LSU1_DC_PREF_STREAM_CONFIRM - /sys/devices/cpu/events/PM_IERAT_WR_64K - /sys/devices/cpu/events/PM_MRK_DTLB_MISS_16M - /sys/devices/cpu/events/PM_IERAT_MISS - /sys/devices/cpu/events/PM_MRK_PTEG_FROM_LMEM - /sys/devices/cpu/events/PM_FLOP - /sys/devices/cpu/events/PM_THRD_PRIO_4_5_CYC - /sys/devices/cpu/events/PM_BR_PRED_TA - /sys/devices/cpu/events/PM_EXT_INT - /sys/devices/cpu/events/PM_VSU_FSQRT_FDIV - /sys/devices/cpu/events/PM_MRK_LD_MISS_EXPOSED_CYC - /sys/devices/cpu/events/PM_LSU1_LDF - /sys/devices/cpu/events/PM_IC_WRITE_ALL - /sys/devices/cpu/events/PM_LSU0_SRQ_STFWD - /sys/devices/cpu/events/PM_PTEG_FROM_RL2L3_MOD - /sys/devices/cpu/events/PM_MRK_DATA_FROM_L31_SHR - /sys/devices/cpu/events/PM_DATA_FROM_L21_MOD - /sys/devices/cpu/events/PM_VSU1_SCAL_DOUBLE_ISSUED - /sys/devices/cpu/events/PM_VSU0_8FLOP - /sys/devices/cpu/events/PM_POWER_EVENT1 - /sys/devices/cpu/events/PM_DISP_CLB_HELD_BAL - /sys/devices/cpu/events/PM_VSU1_2FLOP - /sys/devices/cpu/events/PM_LWSYNC_HELD - /sys/devices/cpu/events/PM_PTEG_FROM_DL2L3_SHR - /sys/devices/cpu/events/PM_INST_FROM_L21_MOD - /sys/devices/cpu/events/PM_IERAT_XLATE_WR_16MPLUS - /sys/devices/cpu/events/PM_IC_REQ_ALL
[PATCH 05/16] perf Documentation: add event parameters
Event parameters are a basic way for partial events to be specified in sysfs with per-event names given to the fields that need to be filled in when using a particular event.

It is intended for supporting cases where the single 'cpu' parameter is insufficient. For example, POWER8 has events for physical sockets/cores/cpus that are accessible from within virtual machines. To keep using the single 'cpu' parameter we'd need to perform a mapping between Linux's cpus and the physical machine's cpus (in this case Linux is running under a hypervisor). This isn't possible because bindings between our cpus and physical cpus may not be fixed, and we probably won't have a cpu on each physical cpu.

CC: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Signed-off-by: Cody P Schafer d...@codyps.com
---
 Documentation/ABI/testing/sysfs-bus-event_source-devices-events | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
index 20979f8..c1f9850 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
@@ -52,12 +52,18 @@ Description:Per-pmu performance monitoring events specific to the running syste
 			event=0x2abc
 			event=0x423,inv,cmask=0x3
 			domain=0x1,offset=0x8,starting_index=0x
+			domain=0x1,offset=0x8,starting_index=phys_cpu

 		Each of the assignments indicates a value to be assigned to a
 		particular set of bits (as defined by the format file
 		corresponding to the <term>) in the perf_event structure passed
 		to the perf_open syscall.

+		In the case of the last example, a value replacing phys_cpu
+		would need to be provided by the user selecting the particular
+		event. This is referred to as event parameterization. All
+		non-numerical values indicate an event parameter.
+
 What:		/sys/bus/event_source/devices/<pmu>/events/<event>.unit
 Date:		2014/02/24
 Contact:	Linux kernel mailing list linux-ker...@vger.kernel.org
--
1.9.3
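Under these rules a consumer can classify each term of an event file: a bare term means 0x1, a value that parses as a number is fixed, and any other value names an event parameter the user must supply. A sketch of that classification, assuming comma-separated input with no whitespace and accepting both `0x`-hex and plain decimal numbers (the perf example `bar=param2,foo=1,baz=param1` uses a decimal `1`); `parse_event_terms()`, `parse_number()` and `struct term` are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum term_kind { TERM_NUM, TERM_PARAM };

struct term {
	char name[32];
	enum term_kind kind;
	unsigned long long num;	/* valid for TERM_NUM */
	char param[32];		/* valid for TERM_PARAM */
};

/* Accept "0x<hex>" or plain decimal; anything else is not a number. */
static bool parse_number(const char *v, unsigned long long *out)
{
	bool hex = strncmp(v, "0x", 2) == 0;
	const char *digits = hex ? v + 2 : v;
	char *end;

	if (hex ? !isxdigit((unsigned char)*digits)
		: !isdigit((unsigned char)*digits))
		return false;
	*out = strtoull(digits, &end, hex ? 16 : 10);
	return *end == '\0';
}

/*
 * Split an events/<event> line ("term[=value][,term[=value]]...") into
 * out[]. Returns the number of terms, or -1 if there are more than max
 * terms or the line is too long.
 */
static int parse_event_terms(const char *line, struct term *out, int max)
{
	char buf[256];
	char *tok, *save = NULL;
	int n = 0;

	if (strlen(line) >= sizeof(buf))
		return -1;
	strcpy(buf, line);
	buf[strcspn(buf, "\n")] = '\0';	/* sysfs files end in '\n' */

	for (tok = strtok_r(buf, ",", &save); tok;
	     tok = strtok_r(NULL, ",", &save)) {
		char *eq = strchr(tok, '=');
		struct term *t;

		if (n == max)
			return -1;
		t = &out[n++];
		if (eq)
			*eq = '\0';
		snprintf(t->name, sizeof(t->name), "%s", tok);

		if (!eq) {
			t->kind = TERM_NUM;	/* bare term: implied 0x1 */
			t->num = 1;
		} else if (parse_number(eq + 1, &t->num)) {
			t->kind = TERM_NUM;
		} else {
			t->kind = TERM_PARAM;	/* non-numeric: a parameter */
			snprintf(t->param, sizeof(t->param), "%s", eq + 1);
		}
	}
	return n;
}
```

For `domain=0x1,offset=0x8,starting_index=phys_cpu` this yields two fixed terms and one parameter named `phys_cpu`, which is the information `perf list` needs to print `starting_index=?`.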
[PATCH 06/16] tools/perf: annotate list_head with type info
CC: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Signed-off-by: Cody P Schafer d...@codyps.com
---
 tools/perf/util/pmu.c | 4 ++--
 tools/perf/util/pmu.h | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 00a7dcb..906ae40 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -14,8 +14,8 @@ struct perf_pmu_alias {
 	char *name;
-	struct list_head terms;
-	struct list_head list;
+	struct list_head terms; /* HEAD struct parse_events_term -> list */
+	struct list_head list;  /* ELEM */
 	char unit[UNIT_MAX_LEN+1];
 	double scale;
 };
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 8b64125..4a85230 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -17,9 +17,9 @@ struct perf_pmu {
 	char *name;
 	__u32 type;
 	struct cpu_map *cpus;
-	struct list_head format;
-	struct list_head aliases;
-	struct list_head list;
+	struct list_head format;  /* HEAD struct perf_pmu_format -> list */
+	struct list_head aliases; /* HEAD struct perf_pmu_alias -> list */
+	struct list_head list;    /* ELEM */
 };

 struct perf_pmu *perf_pmu__find(const char *name);
--
1.9.3
[PATCH 07/16] tools/perf: support parsing parameterized events
Enable event specification like:

	pmu/event_name,param1=0x1,param2=0x4/

Assuming that

	/sys/bus/event_source/devices/pmu/events/event_name

contains something like

	bar=param2,foo=1,baz=param1

CC: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Signed-off-by: Cody P Schafer d...@codyps.com
---
 tools/perf/util/parse-events.h |  1 +
 tools/perf/util/pmu.c          | 55 ++
 2 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index f1cb4c4..1147e87 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -60,6 +60,7 @@ struct parse_events_term {
 	int type_val;
 	int type_term;
 	struct list_head list;
+	bool used;
 };
 
 struct parse_events_evlist {
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 906ae40..db53fac 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -504,27 +504,57 @@ static __u64 pmu_format_value(unsigned long *format, __u64 value)
 }
 
 /*
+ * Term is a string term, and might be a param-term. Try to look up its value
+ * in the remaining terms.
+ * - We have a term like base-or-format-term=param-term,
+ * - We need to find the value supplied for param-term (with param-term named
+ *   in a config string) later on in the term list.
+ */
+static int pmu_resolve_param_term(struct parse_events_term *term,
+				  struct list_head *head_terms,
+				  __u64 *value)
+{
+	struct parse_events_term *t;
+
+	list_for_each_entry(t, head_terms, list)
+		if (t->type_val == PARSE_EVENTS__TERM_TYPE_NUM) {
+			if (!strcmp(t->config, term->val.str)) {
+				t->used = true;
+				*value = t->val.num;
+				return 0;
+			}
+		}
+
+	return -1;
+}
+
+/*
  * Setup one of config[12] attr members based on the
  * user input data - term parameter.
  */
 static int pmu_config_term(struct list_head *formats,
 			   struct perf_event_attr *attr,
-			   struct parse_events_term *term)
+			   struct parse_events_term *term,
+			   struct list_head *head_terms)
 {
 	struct perf_pmu_format *format;
 	__u64 *vp;
+	__u64 val;
+
+	/*
+	 * If this is a parameter we've already used for parameterized-eval,
+	 * skip it in normal eval.
+	 */
+	if (term->used)
+		return 0;
 
 	/*
-	 * Support only for hardcoded and numnerial terms.
 	 * Hardcoded terms should be already in, so nothing
 	 * to be done for them.
 	 */
 	if (parse_events__is_hardcoded_term(term))
 		return 0;
 
-	if (term->type_val != PARSE_EVENTS__TERM_TYPE_NUM)
-		return -EINVAL;
-
 	format = pmu_find_format(formats, term->config);
 	if (!format)
 		return -EINVAL;
@@ -544,11 +574,16 @@ static int pmu_config_term(struct list_head *formats,
 	}
 
 	/*
-	 * XXX If we ever decide to go with string values for
-	 * non-hardcoded terms, here's the place to translate
-	 * them into value.
+	 * Either directly use a numeric term, or try to translate string terms
+	 * using event parameters.
 	 */
-	*vp |= pmu_format_value(format->bits, term->val.num);
+	if (term->type_val == PARSE_EVENTS__TERM_TYPE_NUM)
+		val = term->val.num;
+	else
+		if (pmu_resolve_param_term(term, head_terms, &val))
+			return -EINVAL;
+
+	*vp |= pmu_format_value(format->bits, val);
 	return 0;
 }
 
@@ -559,7 +594,7 @@ int perf_pmu__config_terms(struct list_head *formats,
 	struct parse_events_term *term;
 
 	list_for_each_entry(term, head_terms, list)
-		if (pmu_config_term(formats, attr, term))
+		if (pmu_config_term(formats, attr, term, head_terms))
 			return -EINVAL;
 
 	return 0;
-- 
1.9.3
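The resolution step in pmu_resolve_param_term() can be sketched in isolation. This is a simplified model, not the perf code: struct demo_term and resolve_param() are hypothetical, and a plain array stands in for the kernel-style term list. A string-valued term such as bar=param2 is resolved by finding the numeric term named param2 and marking it used, so the later per-term pass skips it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct parse_events_term. */
struct demo_term {
	const char *config;		/* left-hand side, e.g. "bar" */
	bool is_num;			/* numeric value vs. string value */
	unsigned long long num;		/* valid when is_num */
	const char *str;		/* valid when !is_num, e.g. "param2" */
	bool used;			/* consumed as a parameter value */
};

/*
 * Look up the numeric term whose name matches term->str, copy its value
 * and mark it used. Returns 0 on success, -1 if no such parameter exists.
 */
static int resolve_param(const struct demo_term *term, struct demo_term *terms,
			 size_t n, unsigned long long *value)
{
	for (size_t i = 0; i < n; i++) {
		if (terms[i].is_num && !strcmp(terms[i].config, term->str)) {
			terms[i].used = true;
			*value = terms[i].num;
			return 0;
		}
	}
	return -1;
}
```

With the alias expanded to bar=param2 and the user supplying param2=0x4, resolving bar yields 0x4 and flags the param2 term as consumed, which is what the term->used check in pmu_config_term() relies on.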
[PATCH 08/16] tools/perf: extend format_alias() to include event parameters
This causes `perf list pmu` to show parameters for parameterized events as follows:

  pmu/event_name,param1=?,param2=?/ [Kernel PMU event]

An example:

  hv_gpci/dispatch_timebase_by_processor_processor_time_in_timebase_cycles,phys_processor_idx=?/ [Kernel PMU event]

CC: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Signed-off-by: Cody P Schafer d...@codyps.com
---
 tools/perf/util/pmu.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index db53fac..7b8d067 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -741,10 +741,33 @@ void perf_pmu__set_format(unsigned long *bits, long from, long to)
 		set_bit(b, bits);
 }
 
+static int sub_non_neg(int a, int b)
+{
+	if (b > a)
+		return 0;
+	return a - b;
+}
+
 static char *format_alias(char *buf, int len, struct perf_pmu *pmu,
 			  struct perf_pmu_alias *alias)
 {
-	snprintf(buf, len, "%s/%s/", pmu->name, alias->name);
+	struct parse_events_term *term;
+	int used = snprintf(buf, len, "%s/%s", pmu->name, alias->name);
+
+	list_for_each_entry(term, &alias->terms, list)
+		if (term->type_val == PARSE_EVENTS__TERM_TYPE_STR)
+			used += snprintf(buf + used, sub_non_neg(len, used),
+					",%s=?", term->val.str);
+
+	if (sub_non_neg(len, used) > 0) {
+		buf[used] = '/';
+		used++;
+	}
+	if (sub_non_neg(len, used) > 0) {
+		buf[used] = '\0';
+		used++;
+	} else
+		buf[len - 1] = '\0';
 
 	return buf;
 }
@@ -795,6 +818,7 @@ void print_pmu_events(const char *event_glob, bool name_only)
 			if (is_cpu && !name_only)
 				aliases[j] = format_alias_or(buf, sizeof(buf),
 							      pmu, alias);
+			aliases[j] = strdup(aliases[j]);
 			j++;
 		}
 
-- 
1.9.3
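format_alias() appends one ",name=?" per string-valued term while never asking snprintf() to write past len. The sketch below reproduces that clamping with sub_non_neg() in userspace; it is an illustration, not the perf function. The params[] array is a hypothetical stand-in for walking alias->terms, and the sketch passes NULL rather than buf + used once the buffer is exhausted, so no out-of-range pointer is formed (C permits a NULL buffer when the size is 0).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Remaining space, clamped at zero (same helper as in the patch). */
static int sub_non_neg(int a, int b)
{
	if (b > a)
		return 0;
	return a - b;
}

/*
 * Hypothetical simplified format_alias(): params[] names the terms the user
 * must supply. Assumes len > 0. Output is truncated but always terminated.
 */
static char *format_alias(char *buf, int len, const char *pmu, const char *event,
			  const char *const *params, int nparams)
{
	int used = snprintf(buf, len, "%s/%s", pmu, event);

	for (int i = 0; i < nparams; i++) {
		int room = sub_non_neg(len, used);

		/* With room == 0 snprintf() writes nothing and just counts. */
		used += snprintf(room ? buf + used : NULL, room, ",%s=?", params[i]);
	}

	/* Append the closing '/' only if it fits together with the NUL. */
	if (sub_non_neg(len, used) > 1) {
		buf[used] = '/';
		buf[used + 1] = '\0';
	} else {
		buf[len - 1] = '\0';
	}
	return buf;
}
```

For example, formatting hv_gpci, dtbp_ptitc and the single parameter phys_processor_idx into a large buffer yields hv_gpci/dtbp_ptitc,phys_processor_idx=?/, while an 8-byte buffer is simply truncated to hv_gpci.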
[PATCH 09/16] tools/perf: document parameterized events and note symbolically formed events
CC: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Signed-off-by: Cody P Schafer d...@codyps.com
---
 tools/perf/Documentation/perf-list.txt   | 13 +
 tools/perf/Documentation/perf-record.txt |  5 +
 2 files changed, 18 insertions(+)

diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index 6fce6a6..626818b 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -89,6 +89,19 @@ raw encoding of 0x1A8 can be used:
 You should refer to the processor specific documentation for getting these
 details. Some of them are referenced in the SEE ALSO section below.
 
+PARAMETERIZED EVENTS
+--------------------
+
+Some pmu events listed by 'perf-list' will be displayed with '?' in them. For
+example:
+
+  hv_gpci/dtbp_ptitc,phys_processor_idx=?/
+
+This means that when provided as an event, a value for phys_processor_idx must
+also be supplied. For example:
+
+  perf stat -e 'hv_gpci/dtbp_ptitc,phys_processor_idx=0x2/' ...
+
 OPTIONS
 -------
 
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index c71b0f3..c005180 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -33,6 +33,11 @@ OPTIONS
 	- a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a
 	  hexadecimal event descriptor.
 
+	- a symbolically formed PMU event like 'pmu/value1=0x3,value2/' where
+	  'value1' and 'value2' are defined as formats in
+	  /sys/bus/event_source/devices/pmu/format/* OR are one of 'config',
+	  'config1', 'config2'.
+
 	- a hardware breakpoint event in the form of '\mem:addr[:access]'
 	  where addr is the address in memory you want to break in.
 	  Access is the memory access type (read, write, execute) it can
-- 
1.9.3
[PATCH 10/16] perf: provide sysfs_show for struct perf_pmu_events_attr
struct perf_pmu_events_attr is defined in include/linux/perf_event.h, but the
only show() routine for it lives in x86 code and contains x86-specific
handling. Make a generic one for those of us who only use event_str.

CC: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Signed-off-by: Cody P Schafer d...@codyps.com
---
 include/linux/perf_event.h | 3 +++
 kernel/events/core.c       | 8
 2 files changed, 11 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 3356abc..6c1d6dd 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -867,6 +867,9 @@ struct perf_pmu_events_attr {
 	const char *event_str;
 };
 
+ssize_t perf_event_sysfs_show(struct device *dev, struct device_attribute *attr,
+			      char *page);
+
 #define PMU_EVENT_ATTR(_name, _var, _id, _show)			\
 static struct perf_pmu_events_attr _var = {				\
 	.attr = __ATTR(_name, 0444, _show, NULL),			\
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f83a71a..6830e21 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7971,6 +7971,14 @@ void __init perf_event_init(void)
 			     != 1024);
 }
 
+ssize_t perf_event_sysfs_show(struct device *dev, struct device_attribute *attr,
+			      char *page)
+{
+	struct perf_pmu_events_attr *pmu_attr =
+		container_of(attr, struct perf_pmu_events_attr, attr);
+	return sprintf(page, "%s\n", pmu_attr->event_str);
+}
+
 static int __init perf_event_sysfs_init(void)
 {
 	struct pmu *pmu;
-- 
1.9.3
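perf_event_sysfs_show() works because struct perf_pmu_events_attr embeds the device_attribute that the sysfs core passes to show(), so container_of() can recover the wrapper. A minimal userspace sketch of that pattern, assuming simplified stand-in structs; struct demo_device_attribute, struct demo_events_attr and demo_events_show() are hypothetical, and snprintf() with an explicit size replaces the kernel's sprintf() into a page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct device_attribute. */
struct demo_device_attribute {
	const char *name;
};

/* Hypothetical stand-in for struct perf_pmu_events_attr. */
struct demo_events_attr {
	struct demo_device_attribute attr;	/* what the "sysfs core" hands us */
	unsigned long long id;
	const char *event_str;
};

/* Generic show: recover the wrapper from the embedded attr, print event_str. */
static int demo_events_show(struct demo_device_attribute *attr, char *page,
			    size_t size)
{
	struct demo_events_attr *ev =
		container_of(attr, struct demo_events_attr, attr);

	return snprintf(page, size, "%s\n", ev->event_str);
}
```

Since the callback only touches event_str, nothing in it is architecture-specific, which is the point of moving it out of the x86 code.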
[PATCH 11/16] byteorder: provide a linux/byteorder.h with {be, le}_to_cpu() and cpu_to_{be, le}() macros
Rather than manually specifying the size of the integer to be converted, key
off of the type size. This reduces duplicated size information and the
occurrence of certain types of bugs (using the wrong-sized conversion).

CC: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Signed-off-by: Cody P Schafer d...@codyps.com
---
 include/linux/byteorder.h | 34 ++
 1 file changed, 34 insertions(+)
 create mode 100644 include/linux/byteorder.h

diff --git a/include/linux/byteorder.h b/include/linux/byteorder.h
new file mode 100644
index 000..c7ab8da
--- /dev/null
+++ b/include/linux/byteorder.h
@@ -0,0 +1,34 @@
+#ifndef LINUX_BYTEORDER_H_
+#define LINUX_BYTEORDER_H_
+
+#include <asm/byteorder.h>
+
+#define be_to_cpu(v) \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint8_t) , v, \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint16_t), be16_to_cpu(v), \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint32_t), be32_to_cpu(v), \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint64_t), be64_to_cpu(v), \
+	(void)0))))
+
+#define le_to_cpu(v) \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint8_t) , v, \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint16_t), le16_to_cpu(v), \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint32_t), le32_to_cpu(v), \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint64_t), le64_to_cpu(v), \
+	(void)0))))
+
+#define cpu_to_le(v) \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint8_t) , v, \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint16_t), cpu_to_le16(v), \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint32_t), cpu_to_le32(v), \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint64_t), cpu_to_le64(v), \
+	(void)0))))
+
+#define cpu_to_be(v) \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint8_t) , v, \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint16_t), cpu_to_be16(v), \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint32_t), cpu_to_be32(v), \
+	__builtin_choose_expr(sizeof(v) == sizeof(uint64_t), cpu_to_be64(v), \
+	(void)0))))
+
+#endif
-- 
1.9.3
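These macros pick a conversion by sizeof(v) at compile time via GCC's __builtin_choose_expr(), falling back to (void)0 so that an unsupported size fails to compile when its result is used. A userspace sketch of the same idea follows, assuming GCC or Clang; it uses the compiler's __builtin_bswap16/32/64 rather than the kernel's be*_to_cpu() helpers, and bswap_by_size() and this be_to_cpu() are illustrative names, not the proposed header.

```c
#include <assert.h>
#include <stdint.h>

/* Swap byte order, choosing the width from sizeof(v) at compile time. */
#define bswap_by_size(v)						\
	__builtin_choose_expr(sizeof(v) == 1, (v),			\
	__builtin_choose_expr(sizeof(v) == 2, __builtin_bswap16(v),	\
	__builtin_choose_expr(sizeof(v) == 4, __builtin_bswap32(v),	\
	__builtin_choose_expr(sizeof(v) == 8, __builtin_bswap64(v),	\
	(void)0))))

/* Big-endian to host order: a swap on little-endian hosts, identity otherwise. */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define be_to_cpu(v) bswap_by_size(v)
#else
#define be_to_cpu(v) (v)
#endif
```

Only the selected branch is evaluated, and the result keeps the selected branch's type, so a uint16_t argument yields a uint16_t result without any caller-side size annotation.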
[PATCH 12/16] powerpc/perf/hv-24x7: parse catalog and populate sysfs with events
Retrieves and parses the 24x7 catalog on POWER systems that supply it (right
now, only POWER 8). Events are exposed via sysfs in the standard fashion, and
are all parameterized.

The catalog is (at the moment) only parsed on boot. It needs re-parsing when
some hypervisor events occur. At that point we'll also need to prevent old
events from continuing to function (a counter that is passed in via spare
space in the config values?).

CC: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Signed-off-by: Cody P Schafer d...@codyps.com
---
 arch/powerpc/perf/hv-24x7-catalog.h |  25 ++
 arch/powerpc/perf/hv-24x7-domains.h |  19 +
 arch/powerpc/perf/hv-24x7.c         | 760 +++-
 arch/powerpc/perf/hv-24x7.h         |  12 +-
 4 files changed, 804 insertions(+), 12 deletions(-)
 create mode 100644 arch/powerpc/perf/hv-24x7-domains.h

diff --git a/arch/powerpc/perf/hv-24x7-catalog.h b/arch/powerpc/perf/hv-24x7-catalog.h
index 21b19dd..69e2e1f 100644
--- a/arch/powerpc/perf/hv-24x7-catalog.h
+++ b/arch/powerpc/perf/hv-24x7-catalog.h
@@ -30,4 +30,29 @@ struct hv_24x7_catalog_page_0 {
 	__u8 reserved6[2];
 } __packed;
 
+struct hv_24x7_event_data {
+	__be16 length; /* in bytes, must be a multiple of 16 */
+	__u8 reserved1[2];
+	__u8 domain; /* Chip = 1, Core = 2 */
+	__u8 reserved2[1];
+	__be16 event_group_record_offs; /* in bytes, must be 8 byte aligned */
+	__be16 event_group_record_len; /* in bytes */
+
+	/* in bytes, offset from event_group_record */
+	__be16 event_counter_offs;
+
+	/* verified_state, unverified_state, caveat_state, broken_state, ... */
+	__be32 flags;
+
+	__be16 primary_group_ix;
+	__be16 group_count;
+	__be16 event_name_len;
+	__u8 remainder[];
+	/* __u8 event_name[event_name_len - 2]; */
+	/* __be16 event_description_len; */
+	/* __u8 event_desc[event_description_len - 2]; */
+	/* __be16 detailed_desc_len; */
+	/* __u8 detailed_desc[detailed_desc_len - 2]; */
+} __packed;
+
 #endif
diff --git a/arch/powerpc/perf/hv-24x7-domains.h b/arch/powerpc/perf/hv-24x7-domains.h
new file mode 100644
index 000..9c5c862
--- /dev/null
+++ b/arch/powerpc/perf/hv-24x7-domains.h
@@ -0,0 +1,19 @@
+
+/*
+ * DOMAIN(name, num, index_kind, is_physical)
+ *
+ * @name: an all caps token, suitable for use in generating an enum member and
+ *	appending to an event name in sysfs.
+ * @num: the number corresponding to the domain as given in documentation. We
+ *	assume the catalog domain and the hcall domain have the same numbering
+ *	(so far they do), but this may need to be changed in the future.
+ * @index_kind: a stringifiable token describing the meaning of the index within the
+ *	given domain. Must fit the parsing rules of the perf sysfs api.
+ * @is_physical: true if the domain is physical, false otherwise (if virtual).
+ */
+DOMAIN(PHYSICAL_CHIP, 0x01, chip, true)
+DOMAIN(PHYSICAL_CORE, 0x02, core, true)
+DOMAIN(VIRTUAL_PROCESSOR_HOME_CORE, 0x03, vcpu, false)
+DOMAIN(VIRTUAL_PROCESSOR_HOME_CHIP, 0x04, vcpu, false)
+DOMAIN(VIRTUAL_PROCESSOR_HOME_NODE, 0x05, vcpu, false)
+DOMAIN(VIRTUAL_PROCESSOR_REMOTE_NODE, 0x06, vcpu, false)
diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 9a7a830..c9b7c55 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1,3 +1,4 @@
+#define DEBUG 1
 /*
  * Hypervisor supplied 24x7 performance counter support
  *
@@ -12,9 +13,13 @@
 
 #define pr_fmt(fmt) "hv-24x7: " fmt
 
+#include <linux/byteorder.h>
 #include <linux/perf_event.h>
+#include <linux/rbtree.h>
 #include <linux/module.h>
 #include <linux/slab.h>
+#include <linux/vmalloc.h>
+
 #include <asm/firmware.h>
 #include <asm/hvcall.h>
 #include <asm/io.h>
@@ -23,6 +28,66 @@
 #include "hv-24x7-catalog.h"
 #include "hv-common.h"
 
+static const char *domain_to_index_string(unsigned domain)
+{
+	switch (domain) {
+#define DOMAIN(n, v, x, c)		\
+	case HV_PERF_DOMAIN_##n:	\
+		return #x;
+#include "hv-24x7-domains.h"
+#undef DOMAIN
+	default:
+		WARN(1, "unknown domain %d\n", domain);
+		return "UNKNOWN_DOMAIN_INDEX_STRING";
+	}
+}
+
+static const char *event_domain_suffix(unsigned domain)
+{
+	switch (domain) {
+#define DOMAIN(n, v, x, c)		\
+	case HV_PERF_DOMAIN_##n:	\
+		return "__" #n;
+#include "hv-24x7-domains.h"
+#undef DOMAIN
+	default:
+		WARN(1, "unknown domain %d\n", domain);
+		return "__UNKNOWN_DOMAIN_SUFFIX";
+	}
+}
+
+static bool domain_is_valid(unsigned domain)
+{
+	switch (domain) {
+#define DOMAIN(n, v, x, c)		\
+	case HV_PERF_DOMAIN_##n:	\
+		/* fall through */
+#include "hv-24x7-domains.h"
+#undef DOMAIN
+		return true;
+	default:
+		return false;
+	}
+}
+
+static bool is_physical_domain
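hv-24x7-domains.h is an X-macro table: each user defines DOMAIN(), includes the table, and undefines DOMAIN() again to generate one switch case per domain. The sketch below is a self-contained variant that replaces the re-included header with a DOMAIN_LIST() macro taking the per-row macro as an argument; the enum and the function bodies are assumptions modeled on the patch (WARN() is omitted), not the driver itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for re-#including hv-24x7-domains.h; same rows as the patch. */
#define DOMAIN_LIST(DOMAIN)						\
	DOMAIN(PHYSICAL_CHIP, 0x01, chip, true)				\
	DOMAIN(PHYSICAL_CORE, 0x02, core, true)				\
	DOMAIN(VIRTUAL_PROCESSOR_HOME_CORE, 0x03, vcpu, false)		\
	DOMAIN(VIRTUAL_PROCESSOR_HOME_CHIP, 0x04, vcpu, false)		\
	DOMAIN(VIRTUAL_PROCESSOR_HOME_NODE, 0x05, vcpu, false)		\
	DOMAIN(VIRTUAL_PROCESSOR_REMOTE_NODE, 0x06, vcpu, false)

/* Assumed enum naming, matching the HV_PERF_DOMAIN_##n cases above. */
#define AS_ENUM(n, v, x, c) HV_PERF_DOMAIN_##n = v,
enum hv_perf_domains { DOMAIN_LIST(AS_ENUM) };
#undef AS_ENUM

static const char *domain_to_index_string(unsigned domain)
{
	switch (domain) {
#define AS_CASE(n, v, x, c) case HV_PERF_DOMAIN_##n: return #x;
	DOMAIN_LIST(AS_CASE)
#undef AS_CASE
	default:
		return "UNKNOWN_DOMAIN_INDEX_STRING";
	}
}

static const char *event_domain_suffix(unsigned domain)
{
	switch (domain) {
#define AS_CASE(n, v, x, c) case HV_PERF_DOMAIN_##n: return "__" #n;
	DOMAIN_LIST(AS_CASE)
#undef AS_CASE
	default:
		return "__UNKNOWN_DOMAIN_SUFFIX";
	}
}

static bool is_physical_domain(unsigned domain)
{
	switch (domain) {
#define AS_CASE(n, v, x, c) case HV_PERF_DOMAIN_##n: return c;
	DOMAIN_LIST(AS_CASE)
#undef AS_CASE
	default:
		return false;
	}
}
```

Adding a domain means adding one row to the table; every generated switch picks it up, which is the benefit the header-based layout in the patch is after.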
[PATCH 13/16] powerpc/perf/hv-24x7: Documentation for new sysfs entries which expose descriptions
CC: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Signed-off-by: Cody P Schafer d...@codyps.com
---
 .../testing/sysfs-bus-event_source-devices-hv_24x7 | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
index e78ee79..5b501d7 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
@@ -21,3 +21,25 @@ Contact:	Cody P Schafer c...@linux.vnet.ibm.com
 Description:
 		Exposes the version field of the 24x7 catalog. This is also
 		extractable from the provided binary catalog sysfs entry.
+
+What:		/sys/bus/event_source/devices/hv_24x7/event_descs/event-name
+Date:		February 2014
+Contact:	Cody P Schafer c...@linux.vnet.ibm.com
+Description:
+		Provides the description of a particular event as provided by
+		the firmware. If firmware does not provide a description, no
+		file will be created.
+
+		Note that the event-name lacks the domain suffix appended for
+		events in the events/ dir.
+
+What:		/sys/bus/event_source/devices/hv_24x7/event_long_descs/event-name
+Date:		February 2014
+Contact:	Cody P Schafer c...@linux.vnet.ibm.com
+Description:
+		Provides the long description of a particular event as
+		provided by the firmware. If firmware does not provide a
+		description, no file will be created.
+
+		Note that the event-name lacks the domain suffix appended for
+		events in the events/ dir.
-- 
1.9.3
[PATCH 14/16] perf: add PMU_EVENT_ATTR_STRING() helper
Helper for constructing static struct perf_pmu_events_attr instances.

CC: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Signed-off-by: Cody P Schafer d...@codyps.com
---
 include/linux/perf_event.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6c1d6dd..1313171 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -876,6 +876,13 @@ static struct perf_pmu_events_attr _var = {			\
 	.id   = _id,						\
 };
 
+#define PMU_EVENT_ATTR_STRING(_name, _var, _value)			\
+static struct perf_pmu_events_attr _var = {				\
+	.attr		= __ATTR(_name, 0444, perf_event_sysfs_show, NULL),	\
+	.event_str	= _value,					\
+};
+
+
 #define PMU_FORMAT_ATTR(_name, _format)					\
 static ssize_t								\
 _name##_show(struct device *dev,					\
-- 
1.9.3
[PATCH 16/16] powerpc/perf/hv-gpci: add the remaining gpci requests
Add the remaining gpci requests that contain counters suitable for use by
perf. Omit those that don't contain any counters (but note their omission).

CC: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Signed-off-by: Cody P Schafer d...@codyps.com
---
 arch/powerpc/perf/hv-gpci-requests.h | 179 +++
 1 file changed, 179 insertions(+)

diff --git a/arch/powerpc/perf/hv-gpci-requests.h b/arch/powerpc/perf/hv-gpci-requests.h
index 0dfc4d9..af3b73c 100644
--- a/arch/powerpc/perf/hv-gpci-requests.h
+++ b/arch/powerpc/perf/hv-gpci-requests.h
@@ -65,6 +65,33 @@ REQUEST(__count(0,	8,	processor_time_in_timebase_cycles)
 )
 #include I(REQUEST_END)
 
+#define REQUEST_NAME entitled_capped_uncapped_donated_idle_timebase_by_partition
+#define REQUEST_NUM 0x20
+#define REQUEST_IDX_KIND sibling_part_id
+#include I(REQUEST_BEGIN)
+REQUEST(__field(0,	8,	partition_id)
+	__count(0x8,	8,	entitled_cycles)
+	__count(0x10,	8,	consumed_capped_cycles)
+	__count(0x18,	8,	consumed_uncapped_cycles)
+	__count(0x20,	8,	cycles_donated)
+	__count(0x28,	8,	purr_idle_cycles)
+)
+#include I(REQUEST_END)
+
+/*
+ * Not available for counter_info_version >= 0x8, use
+ * run_instruction_cycles_by_partition(0x100) instead.
+ */
+#define REQUEST_NAME run_instructions_run_cycles_by_partition
+#define REQUEST_NUM 0x30
+#define REQUEST_IDX_KIND sibling_part_id
+#include I(REQUEST_BEGIN)
+REQUEST(__field(0,	8,	partition_id)
+	__count(0x8,	8,	instructions_completed)
+	__count(0x10,	8,	cycles)
+)
+#include I(REQUEST_END)
+
 #define REQUEST_NAME system_performance_capabilities
 #define REQUEST_NUM 0x40
 #define REQUEST_IDX_KIND M1
@@ -75,5 +102,157 @@ REQUEST(__field(0,	1,	perf_collect_privileged)
 )
 #include I(REQUEST_END)
 
+#define REQUEST_NAME processor_bus_utilization_abc_links
+#define REQUEST_NUM 0x50
+#define REQUEST_IDX_KIND hw_chip_id
+#include I(REQUEST_BEGIN)
+REQUEST(__field(0,	4,	hw_chip_id)
+	__array(0x4,	0xC,	reserved1)
+	__count(0x10,	8,	total_link_cycles)
+	__count(0x18,	8,	idle_cycles_for_a_link)
+	__count(0x20,	8,	idle_cycles_for_b_link)
+	__count(0x28,	8,	idle_cycles_for_c_link)
+	__array(0x30,	0x20,	reserved2)
+)
+#include I(REQUEST_END)
+
+#define REQUEST_NAME processor_bus_utilization_wxyz_links
+#define REQUEST_NUM 0x60
+#define REQUEST_IDX_KIND hw_chip_id
+#include I(REQUEST_BEGIN)
+REQUEST(__field(0,	4,	hw_chip_id)
+	__array(0x4,	0xC,	reserved1)
+	__count(0x10,	8,	total_link_cycles)
+	__count(0x18,	8,	idle_cycles_for_w_link)
+	__count(0x20,	8,	idle_cycles_for_x_link)
+	__count(0x28,	8,	idle_cycles_for_y_link)
+	__count(0x30,	8,	idle_cycles_for_z_link)
+	__array(0x38,	0x28,	reserved2)
+)
+#include I(REQUEST_END)
+
+#define REQUEST_NAME processor_bus_utilization_gx_links
+#define REQUEST_NUM 0x70
+#define REQUEST_IDX_KIND hw_chip_id
+#include I(REQUEST_BEGIN)
+REQUEST(__field(0,	4,	hw_chip_id)
+	__array(0x4,	0xC,	reserved1)
+	__count(0x10,	8,	gx0_in_address_cycles)
+	__count(0x18,	8,	gx0_in_data_cycles)
+	__count(0x20,	8,	gx0_in_retries)
+	__count(0x28,	8,	gx0_in_bus_cycles)
+	__count(0x30,	8,	gx0_in_cycles_total)
+	__count(0x38,	8,	gx0_out_address_cycles)
+	__count(0x40,	8,	gx0_out_data_cycles)
+	__count(0x48,	8,	gx0_out_retries)
+	__count(0x50,	8,	gx0_out_bus_cycles)
+	__count(0x58,	8,	gx0_out_cycles_total)
+	__count(0x60,	8,	gx1_in_address_cycles)
+	__count(0x68,	8,	gx1_in_data_cycles)
+	__count(0x70,	8,	gx1_in_retries)
+	__count(0x78,	8,	gx1_in_bus_cycles)
+	__count(0x80,	8,	gx1_in_cycles_total)
+	__count(0x88,	8,	gx1_out_address_cycles)
+	__count(0x90,	8,	gx1_out_data_cycles)
+	__count(0x98,	8,	gx1_out_retries)
+	__count(0xA0,	8,	gx1_out_bus_cycles)
+	__count(0xA8,	8,	gx1_out_cycles_total)
+)
+#include I(REQUEST_END)
+
+#define REQUEST_NAME processor_bus_utilization_mc_links
+#define REQUEST_NUM 0x80
+#define REQUEST_IDX_KIND hw_chip_id
+#include I(REQUEST_BEGIN)
+REQUEST(__field(0,	4,	hw_chip_id)
+	__array(0x4,	0xC,	reserved1)
+	__count(0x10,	8,	mc0_frames)
+	__count(0x18,	8,	mc0_reads)
+	__count(0x20,	8,	mc0_write)
+	__count(0x28,	8,	mc0_total_cycles)
+	__count(0x30,	8,	mc1_frames)
+	__count(0x38,	8,	mc1_reads)
+	__count(0x40,	8,	mc1_writes)
+	__count(0x48,	8,	mc1_total_cycles)
+)
+#include I(REQUEST_END)
+
+/* Processor_config (0x90) skipped, no counters */
+/* Current_processor_frequency
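Each __count(offset, bytes, name) entry says where a counter lives in the hypervisor's result buffer. Assuming the buffer is big-endian (the request parameters in hv-common.c are built with cpu_to_be32()), a counter can be extracted with a small helper like the hypothetical read_be_field() below; the offset 0x10 for total_link_cycles comes from the processor_bus_utilization_abc_links layout in this patch. This is an illustration, not code from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* From the abc_links layout: __count(0x10, 8, total_link_cycles). */
#define TOTAL_LINK_CYCLES_OFFS 0x10

/*
 * Hypothetical helper: decode an unsigned big-endian field of 1..8 bytes
 * starting at byte offset 'offset' of a gpci result buffer.
 */
static uint64_t read_be_field(const uint8_t *buf, size_t offset, size_t bytes)
{
	uint64_t v = 0;

	assert(bytes >= 1 && bytes <= 8);
	for (size_t i = 0; i < bytes; i++)
		v = (v << 8) | buf[offset + i];	/* most significant byte first */
	return v;
}
```

Reading byte by byte avoids both unaligned loads and any dependence on host byte order, at the cost of a short loop per counter.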
[PATCH 15/16] powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated
This adds (in req-gen/) a framework for defining gpci counter requests. It
uses macro magic similar to ftrace.

Also convert the existing hv-gpci request structures and enum values to use
the new framework (and adjust old users of the structs and enum values to
cope with changes in naming).

In exchange for this macro disaster, we get autogenerated event listing for
GPCI in sysfs, build time field offset checking, and zero duplication of
information about GPCI requests.

CC: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
Signed-off-by: Cody P Schafer d...@codyps.com
---
 arch/powerpc/perf/hv-common.c              |  10 +-
 arch/powerpc/perf/hv-gpci-requests.h       |  79 +++
 arch/powerpc/perf/hv-gpci.c                |   8 ++
 arch/powerpc/perf/hv-gpci.h                |  37 +++
 arch/powerpc/perf/req-gen/_begin.h         |  13 +++
 arch/powerpc/perf/req-gen/_clear.h         |   5 +
 arch/powerpc/perf/req-gen/_end.h           |   4 +
 arch/powerpc/perf/req-gen/_request-begin.h |  15 +++
 arch/powerpc/perf/req-gen/_request-end.h   |   8 ++
 arch/powerpc/perf/req-gen/perf.h           | 155 +
 10 files changed, 304 insertions(+), 30 deletions(-)
 create mode 100644 arch/powerpc/perf/hv-gpci-requests.h
 create mode 100644 arch/powerpc/perf/req-gen/_begin.h
 create mode 100644 arch/powerpc/perf/req-gen/_clear.h
 create mode 100644 arch/powerpc/perf/req-gen/_end.h
 create mode 100644 arch/powerpc/perf/req-gen/_request-begin.h
 create mode 100644 arch/powerpc/perf/req-gen/_request-end.h
 create mode 100644 arch/powerpc/perf/req-gen/perf.h

diff --git a/arch/powerpc/perf/hv-common.c b/arch/powerpc/perf/hv-common.c
index 47e02b3..7dce8f10 100644
--- a/arch/powerpc/perf/hv-common.c
+++ b/arch/powerpc/perf/hv-common.c
@@ -9,13 +9,13 @@ unsigned long hv_perf_caps_get(struct hv_perf_caps *caps)
 	unsigned long r;
 	struct p {
 		struct hv_get_perf_counter_info_params params;
-		struct cv_system_performance_capabilities caps;
+		struct hv_gpci_system_performance_capabilities caps;
 	} __packed __aligned(sizeof(uint64_t));
 
 	struct p arg = {
 		.params = {
 			.counter_request = cpu_to_be32(
-				CIR_SYSTEM_PERFORMANCE_CAPABILITIES),
+				HV_GPCI_system_performance_capabilities),
 			.starting_index = cpu_to_be32(-1),
 			.counter_info_version_in = 0,
 		}
@@ -31,9 +31,9 @@ unsigned long hv_perf_caps_get(struct hv_perf_caps *caps)
 
 	caps->version = arg.params.counter_info_version_out;
 	caps->collect_privileged = !!arg.caps.perf_collect_privileged;
-	caps->ga = !!(arg.caps.capability_mask & CV_CM_GA);
-	caps->expanded = !!(arg.caps.capability_mask & CV_CM_EXPANDED);
-	caps->lab = !!(arg.caps.capability_mask & CV_CM_LAB);
+	caps->ga = !!(arg.caps.capability_mask & HV_GPCI_CM_GA);
+	caps->expanded = !!(arg.caps.capability_mask & HV_GPCI_CM_EXPANDED);
+	caps->lab = !!(arg.caps.capability_mask & HV_GPCI_CM_LAB);
 
 	return r;
 }
diff --git a/arch/powerpc/perf/hv-gpci-requests.h b/arch/powerpc/perf/hv-gpci-requests.h
new file mode 100644
index 000..0dfc4d9
--- /dev/null
+++ b/arch/powerpc/perf/hv-gpci-requests.h
@@ -0,0 +1,79 @@
+
+#include "req-gen/_begin.h"
+
+/*
+ * Based on the document getPerfCountInfo v1.07
+ */
+
+/* this needs to be -1 encoded in hex suitable for parsing by tools/perf. */
+#define M1 0x
+
+/*
+ * #define REQUEST_NAME counter_request_name
+ * #define REQUEST_NUM r_num
+ * #define REQUEST_IDX_KIND starting_index_kind
+ * #include I(REQUEST_BEGIN)
+ * REQUEST(
+ *	__field(...)
+ *	__field(...)
+ *	__array(...)
+ *	__count(...)
+ * )
+ * #include I(REQUEST_END)
+ *
+ * - starting_index_kind is one of:
+ *   M1: must be -1
+ *   chip_id: hardware chip id or -1 for current hw chip
+ *   phys_processor_idx:
+ *
+ * __count(offset, bytes, name):
+ *	a counter that should be exposed via perf
+ * __field(offset, bytes, name)
+ *	a normal field
+ * __array(offset, bytes, name)
+ *	an array of bytes
+ *
+ *
+ *	@bytes for __count, and __field _must_ be a numeral token
+ *	in decimal, not an expression and not in hex.
+ *
+ *
+ * TODO:
+ *	- expose secondary index (if any counter ever uses it, only 0xA0
+ *	  appears to use it right now, and it doesn't have any counters)
+ *	- embed versioning info
+ *	- include counter descriptions
+ */
+#define REQUEST_NAME dispatch_timebase_by_processor
+#define REQUEST_NUM 0x10
+#define REQUEST_IDX_KIND phys_processor_idx
+#include I(REQUEST_BEGIN)
+REQUEST(__count(0,	8,	processor_time_in_timebase_cycles)
+	__field(0x8,	4,	hw_processor_id)
+	__field(0xC,	2,	owning_part_id)
+	__field(0xE,	1,	processor_state)
+	__field(0xF,	1,	version
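The build-time field offset checking mentioned in the changelog can be illustrated with a single X-macro field list that both generates a packed struct and emits one _Static_assert per field. This is a hypothetical C11 sketch, not the req-gen/ headers: DTBP_FIELDS, struct dtbp_result and the chosen C types are assumptions loosely based on the first four dispatch_timebase_by_processor fields, and __attribute__((packed)) assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-source field list: (offset, C type, name). */
#define DTBP_FIELDS(F)						\
	F(0x0, uint64_t, processor_time_in_timebase_cycles)	\
	F(0x8, uint32_t, hw_processor_id)			\
	F(0xC, uint16_t, owning_part_id)			\
	F(0xE, uint8_t,  processor_state)

/* Generate the structure from the list... */
#define AS_MEMBER(off, type, name) type name;
struct dtbp_result {
	DTBP_FIELDS(AS_MEMBER)
} __attribute__((packed));
#undef AS_MEMBER

/* ...and check every declared offset at compile time. */
#define CHECK_OFFSET(off, type, name)					\
	_Static_assert(offsetof(struct dtbp_result, name) == (off),	\
		       "unexpected offset for " #name);
DTBP_FIELDS(CHECK_OFFSET)
#undef CHECK_OFFSET
```

If someone inserts or resizes a field without updating the offsets, the build fails with a message naming the field, rather than the mismatch surfacing as silently wrong counter values.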
Re: [PATCH v4 09/11] powerpc/perf: add support for the hv 24x7 interface
On 05/22/2014 01:19 AM, Ian Munsie wrote:
> Hi Cody,
>
> I just tried building this with gcc 4.5, which failed with the following
> warning (treated as an error):
>
> cc1: warnings being treated as errors
> arch/powerpc/perf/hv-24x7.c: In function 'single_24x7_request':
> arch/powerpc/perf/hv-24x7.c:346:1: error: the frame size of 8192 bytes is larger than 2048 bytes
> make[3]: *** [arch/powerpc/perf/hv-24x7.o] Error 1
> make[2]: *** [arch/powerpc/perf] Error 2
>
> My .config has CONFIG_FRAME_WARN=2048 (default on 64bit), but the
> alignment constraints in this function may require 8K on the stack -
> possibly a bit large?

Yep, it is a bit large. In other places in hv-24x7 that use similar
firmware interfaces (with similar alignment requirements), I've used a
kmem_cache (hv_page_cache). Testing out a patch that uses that here as well.

> Notably for some reason this warning no longer seems to trigger on gcc
> 4.8 (or at least somewhere between 4.5-4.8), though the assembly does
> still show it aligning the buffers.

That's a bit concerning (and might be why I didn't pick it up, using gcc
4.9.0 over here). Looking at the gcc docs, it seems to indicate that
alloca() and VLAs aren't counted for -Wframe-larger-than. Perhaps gcc
decided to move locally defined structures with alignment requirements
into that same bucket? (while size of the structures is statically
determinable, the stack consumption due to alignment is [to some degree]
variable).
[PATCH] powerpc/perf/hv-24x7: use kmem_cache instead of aligned stack allocations
Ian pointed out that the use of __aligned(4096) caused rather large stack
consumption in single_24x7_request(), so use the kmem_cache hv_page_cache
(which we've already got set up for other allocations) instead of allocating
locally.

Reported-by: Ian Munsie imun...@au1.ibm.com
Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com
---
 arch/powerpc/perf/hv-24x7.c | 52 -
 1 file changed, 37 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index e0766b8..9a7a830 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -294,7 +294,7 @@ static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix,
 					 u16 lpar, u64 *res,
 					 bool success_expected)
 {
-	unsigned long ret;
+	unsigned long ret = -ENOMEM;
 
 	/*
 	 * request_buffer and result_buffer are not required to be 4k aligned,
@@ -304,7 +304,27 @@ static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix,
 	struct reqb {
 		struct hv_24x7_request_buffer buf;
 		struct hv_24x7_request req;
-	} __packed __aligned(4096) request_buffer = {
+	} __packed *request_buffer;
+	struct resb {
+		struct hv_24x7_data_result_buffer buf;
+		struct hv_24x7_result res;
+		struct hv_24x7_result_element elem;
+		__be64 result;
+	} __packed *result_buffer;
+
+	BUILD_BUG_ON(sizeof(*request_buffer) > 4096);
+	BUILD_BUG_ON(sizeof(*result_buffer) > 4096);
+
+	request_buffer = kmem_cache_alloc(hv_page_cache, GFP_USER);
+
+	if (!request_buffer)
+		goto out_reqb;
+
+	result_buffer = kmem_cache_zalloc(hv_page_cache, GFP_USER);
+	if (!result_buffer)
+		goto out_resb;
+
+	*request_buffer = (struct reqb) {
 		.buf = {
 			.interface_version = HV_24X7_IF_VERSION_CURRENT,
 			.num_requests = 1,
@@ -320,28 +340,30 @@ static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix,
 		}
 	};
 
-	struct resb {
-		struct hv_24x7_data_result_buffer buf;
-		struct hv_24x7_result res;
-		struct hv_24x7_result_element elem;
-		__be64 result;
-	} __packed __aligned(4096) result_buffer = {};
-
 	ret = plpar_hcall_norets(H_GET_24X7_DATA,
-			virt_to_phys(&request_buffer), sizeof(request_buffer),
-			virt_to_phys(&result_buffer),  sizeof(result_buffer));
+			virt_to_phys(request_buffer), sizeof(*request_buffer),
+			virt_to_phys(result_buffer),  sizeof(*result_buffer));
 
 	if (ret) {
 		if (success_expected)
 			pr_err_ratelimited("hcall failed: %d %#x %#x %d => "
 				"0x%lx (%ld) detail=0x%x failing ix=%x\n",
 					domain, offset, ix, lpar,
 					ret, ret,
-					result_buffer.buf.detailed_rc,
-					result_buffer.buf.failing_request_ix);
-		return ret;
+					result_buffer->buf.detailed_rc,
+					result_buffer->buf.failing_request_ix);
+		goto out_hcall;
 	}
 
-	*res = be64_to_cpu(result_buffer.result);
+	*res = be64_to_cpu(result_buffer->result);
+	kfree(result_buffer);
+	kfree(request_buffer);
+	return ret;
+
+out_hcall:
+	kfree(result_buffer);
+out_resb:
+	kfree(request_buffer);
+out_reqb:
 	return ret;
 }
-- 
1.9.3
Re: [PATCH] powerpc/perf/hv-24x7: use kmem_cache instead of aligned stack allocations
On 05/22/2014 03:38 PM, Stephen Rothwell wrote:
> Hi Cody,
>
> On Thu, 22 May 2014 15:29:08 -0700 Cody P Schafer c...@linux.vnet.ibm.com wrote:
>>
>> -	*res = be64_to_cpu(result_buffer.result);
>> +	*res = be64_to_cpu(result_buffer->result);
>> +	kfree(result_buffer);
>> +	kfree(request_buffer);
>> +	return ret;
>
> Why not just fall through here by removing the above 3 lines?

No reason except me not noticing it.

>> +
>> +out_hcall:
>> +	kfree(result_buffer);
>> +out_resb:
>> +	kfree(request_buffer);
>> +out_reqb:
>>  	return ret;
>>  }
[PATCH v2] powerpc/perf/hv-24x7: use kmem_cache instead of aligned stack allocations
Ian pointed out the use of __aligned(4096) caused rather large stack consumption in single_24x7_request(), so use the kmem_cache hv_page_cache (which we've already got set up for other allocations) instead of allocating locally. Reported-by: Ian Munsie imun...@au1.ibm.com Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- In v2: - remove duplicate exit path arch/powerpc/perf/hv-24x7.c | 48 +++-- 1 file changed, 33 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c index e0766b8..998863b 100644 --- a/arch/powerpc/perf/hv-24x7.c +++ b/arch/powerpc/perf/hv-24x7.c @@ -294,7 +294,7 @@ static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix, u16 lpar, u64 *res, bool success_expected) { - unsigned long ret; + unsigned long ret = -ENOMEM; /* * request_buffer and result_buffer are not required to be 4k aligned, @@ -304,7 +304,27 @@ static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix, struct reqb { struct hv_24x7_request_buffer buf; struct hv_24x7_request req; - } __packed __aligned(4096) request_buffer = { + } __packed *request_buffer; + struct resb { + struct hv_24x7_data_result_buffer buf; + struct hv_24x7_result res; + struct hv_24x7_result_element elem; + __be64 result; + } __packed *result_buffer; + + BUILD_BUG_ON(sizeof(*request_buffer) > 4096); + BUILD_BUG_ON(sizeof(*result_buffer) > 4096); + + request_buffer = kmem_cache_alloc(hv_page_cache, GFP_USER); + + if (!request_buffer) + goto out_reqb; + + result_buffer = kmem_cache_zalloc(hv_page_cache, GFP_USER); + if (!result_buffer) + goto out_resb; + + *request_buffer = (struct reqb) { .buf = { .interface_version = HV_24X7_IF_VERSION_CURRENT, .num_requests = 1, @@ -320,28 +340,26 @@ static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix, } }; - struct resb { - struct hv_24x7_data_result_buffer buf; - struct hv_24x7_result res; - struct hv_24x7_result_element elem; - __be64 result; - } __packed __aligned(4096) result_buffer = {}; - ret = plpar_hcall_norets(H_GET_24X7_DATA, - virt_to_phys(request_buffer), sizeof(request_buffer), - virt_to_phys(result_buffer), sizeof(result_buffer)); + virt_to_phys(request_buffer), sizeof(*request_buffer), + virt_to_phys(result_buffer), sizeof(*result_buffer)); if (ret) { if (success_expected) pr_err_ratelimited("hcall failed: %d %#x %#x %d = 0x%lx (%ld) detail=0x%x failing ix=%x\n", domain, offset, ix, lpar, ret, ret, - result_buffer.buf.detailed_rc, - result_buffer.buf.failing_request_ix); - return ret; + result_buffer->buf.detailed_rc, + result_buffer->buf.failing_request_ix); + goto out_hcall; } - *res = be64_to_cpu(result_buffer.result); + *res = be64_to_cpu(result_buffer->result); +out_hcall: + kfree(result_buffer); +out_resb: + kfree(request_buffer); +out_reqb: return ret; } -- 1.9.3
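The error-path shape this patch adopts (allocate both page-sized buffers on the heap, then unwind through labels that free only what was actually allocated) can be tried outside the kernel. The sketch below is a userspace analogue under stated assumptions: C11 `aligned_alloc()` stands in for `kmem_cache_alloc(hv_page_cache, GFP_USER)`, a hypothetical `fake_hcall()` stands in for `plpar_hcall_norets(H_GET_24X7_DATA, ...)`, and the two structs are placeholders, not the real hypervisor buffer layouts. Each buffer is released with the routine matching its allocator (in kernel code, `kmem_cache_free()` is the usual counterpart of `kmem_cache_alloc()`).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_4K 4096

/* Placeholder layouts (assumption); the real ones are hv_24x7_request_buffer etc. */
struct reqb { uint32_t num_requests; uint32_t index; };
struct resb { uint32_t detailed_rc; uint64_t result; };

/* Compile-time equivalent of BUILD_BUG_ON(sizeof(...) > 4096). */
_Static_assert(sizeof(struct reqb) <= PAGE_4K, "request must fit in one page");
_Static_assert(sizeof(struct resb) <= PAGE_4K, "result must fit in one page");

/* Hypothetical stand-in for the hypervisor call: returns index * 2. */
static long fake_hcall(const struct reqb *req, struct resb *res)
{
	res->result = (uint64_t)req->index * 2;
	return 0;
}

long single_request(uint32_t index, uint64_t *out)
{
	long ret = -1; /* plays the role of -ENOMEM */
	struct reqb *req;
	struct resb *res;

	/* Page-aligned heap buffers instead of __aligned(4096) stack objects. */
	req = aligned_alloc(PAGE_4K, PAGE_4K);
	if (!req)
		goto out_reqb;

	res = aligned_alloc(PAGE_4K, PAGE_4K);
	if (!res)
		goto out_resb;
	memset(res, 0, PAGE_4K); /* zeroed, like kmem_cache_zalloc() */

	*req = (struct reqb){ .num_requests = 1, .index = index };

	ret = fake_hcall(req, res);
	if (ret)
		goto out_hcall;

	*out = res->result;

	/* Labels run in reverse allocation order; each frees one buffer. */
out_hcall:
	free(res);
out_resb:
	free(req);
out_reqb:
	return ret;
}
```

Keeping the success path outside any if/else block, as Cody argues in the follow-up, means every exit funnels through the same labels regardless of where it fails.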
Re: [PATCH v2] powerpc/perf/hv-24x7: use kmem_cache instead of aligned stack allocations
On 05/22/2014 04:49 PM, Stephen Rothwell wrote: Hi Cody, On Thu, 22 May 2014 15:44:25 -0700 Cody P Schafer c...@linux.vnet.ibm.com wrote: if (ret) { if (success_expected) pr_err_ratelimited("hcall failed: %d %#x %#x %d = 0x%lx (%ld) detail=0x%x failing ix=%x\n", domain, offset, ix, lpar, ret, ret, - result_buffer.buf.detailed_rc, - result_buffer.buf.failing_request_ix); - return ret; + result_buffer->buf.detailed_rc, + result_buffer->buf.failing_request_ix); + goto out_hcall; } - *res = be64_to_cpu(result_buffer.result); + *res = be64_to_cpu(result_buffer->result); not a biggie, but this last bit could be (remove the goto out_hcall and the label and then) } else { *res = be64_to_cpu(result_buffer->result); } I've got a slight preference toward keeping it as is, which lets all of the non-error path code stay outside of if/else blocks (and the error handling is kept ever so slightly more consistent). +out_hcall: + kfree(result_buffer); +out_resb: + kfree(request_buffer); +out_reqb: return ret; } otherwise looks good to me.
Re: [PATCH] powerpc/pseries: relocate config DTL so KConfig nests properly
On 05/12/2014 11:23 PM, Michael Neuling wrote: powerpc/pseries: relocate config DTL so KConfig nests properly I don't know what that means. Can you describe it in more detail? So the "config DTL" refers to the configuration entry. The "nests properly" refers to the indentation that 'make menuconfig' shows for a config option that depends on the config option immediately preceding it. In this case, moving config DTL up so it is below config PPC_SPLPAR means that menuconfig will show config DTL nicely indented right below config PPC_SPLPAR when PPC_SPLPAR is enabled. By contrast, right now if I enable PPC_SPLPAR in menuconfig, all I can immediately tell is that something showed up further down the list where I wasn't looking, and I end up having to toggle the option a few times to figure out what showed up, or look at the Kconfig to find out that config DTL depends on config PPC_SPLPAR. Essentially, this enables menuconfig to provide a visual hint about the dependencies between options. Mikey On Mon, 2014-05-12 at 20:09 -0700, Cody P Schafer wrote: Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- arch/powerpc/platforms/pseries/Kconfig | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig index 2cb8b77..e00dd4d 100644 --- a/arch/powerpc/platforms/pseries/Kconfig +++ b/arch/powerpc/platforms/pseries/Kconfig @@ -33,6 +33,16 @@ config PPC_SPLPAR processors, that is, which share physical processors between two or more partitions. +config DTL + bool "Dispatch Trace Log" + depends on PPC_SPLPAR && DEBUG_FS + help + SPLPAR machines can log hypervisor preempt & dispatch events to a + kernel buffer. Saying Y here will enable logging these events, + which are accessible through a debugfs file. + + Say N if you are unsure. + config PSERIES_MSI bool depends on PCI_MSI && PPC_PSERIES && EEH @@ -122,13 +132,3 @@ config HV_PERF_CTRS systems. 24x7 is available on Power 8 systems. If unsure, select Y. - -config DTL - bool "Dispatch Trace Log" - depends on PPC_SPLPAR && DEBUG_FS - help - SPLPAR machines can log hypervisor preempt & dispatch events to a - kernel buffer. Saying Y here will enable logging these events, - which are accessible through a debugfs file. - - Say N if you are unsure.
[PATCH] powerpc/pseries: relocate config DTL so KConfig nests properly
Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- arch/powerpc/platforms/pseries/Kconfig | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig index 2cb8b77..e00dd4d 100644 --- a/arch/powerpc/platforms/pseries/Kconfig +++ b/arch/powerpc/platforms/pseries/Kconfig @@ -33,6 +33,16 @@ config PPC_SPLPAR processors, that is, which share physical processors between two or more partitions. +config DTL + bool "Dispatch Trace Log" + depends on PPC_SPLPAR && DEBUG_FS + help + SPLPAR machines can log hypervisor preempt & dispatch events to a + kernel buffer. Saying Y here will enable logging these events, + which are accessible through a debugfs file. + + Say N if you are unsure. + config PSERIES_MSI bool depends on PCI_MSI && PPC_PSERIES && EEH @@ -122,13 +132,3 @@ config HV_PERF_CTRS systems. 24x7 is available on Power 8 systems. If unsure, select Y. - -config DTL - bool "Dispatch Trace Log" - depends on PPC_SPLPAR && DEBUG_FS - help - SPLPAR machines can log hypervisor preempt & dispatch events to a - kernel buffer. Saying Y here will enable logging these events, - which are accessible through a debugfs file. - - Say N if you are unsure. -- 1.9.3
Re: [PATCH v2 6/6] powerpc/perf/hv-24x7: catalog version number is be64, not be32
On 04/27/2014 09:47 PM, Benjamin Herrenschmidt wrote: On Tue, 2014-04-15 at 10:10 -0700, Cody P Schafer wrote: The catalog version number was changed from a be32 (with preceding 32 bits of padding) to a be64; update the code to treat it as a be64. Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com -- Have you tested this? It doesn't build for me: arch/powerpc/perf/hv-24x7.c: In function 'catalog_read': arch/powerpc/perf/hv-24x7.c:223:3: error: format '%d' expects argument of type 'int', but argument 2 has type 'uint64_t' [-Werror=format] cc1: all warnings being treated as errors I have, and I wasn't initially sure how I managed to miss that warning-as-error. On examination: my config (for some reason) has CONFIG_PPC_DISABLE_WERROR=y set (probably because it's a variation of a distro config). I must have been piping the warnings to a file and forgotten to check the file. I'll fix that up in my tree. Thanks.
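The build failure Ben quotes is the usual `-Wformat` complaint when a 64-bit value is passed to `%d`. A minimal userspace sketch of two portable fixes follows: an explicit cast to `unsigned long long` with `%llu` (the kernel code in this series uses the same cast idea with `%lld`), or the C99 `PRIu64` macro from `<inttypes.h>`. The helpers `format_version()` and `format_version_pri()` are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Cast so the argument type matches %llu exactly on every ABI. */
int format_version(char *buf, size_t len, uint64_t version)
{
	return snprintf(buf, len, "%llu\n", (unsigned long long)version);
}

/* Same output using the fixed-width format macro instead of a cast. */
int format_version_pri(char *buf, size_t len, uint64_t version)
{
	return snprintf(buf, len, "%" PRIu64 "\n", version);
}
```

The cast form matches what the kernel tree already does in `PAGE_0_ATTR()`; `PRIu64` is the more common idiom in userspace code.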
[PATCH v2 0/6] powerpc/perf/hv_{gpci,24x7}: fixes
- 24x7 and gpci probing now uses pr_debug() and doesn't pad to 80 characters - Catalog access is fixed for LE kernels - remove a C99 feature that sparse doesn't like - 1 device attr made static Cody P Schafer (6): powerpc/perf/hv_24x7: probe errors changed to pr_debug(), padding fixed powerpc/perf/hv_gpci: probe failures use pr_debug(), and padding reduced powerpc/perf/hv-gpci: make device attr static powerpc/perf/hv-24x7: use (unsigned long) not (u32) values when calling plpar_hcall_norets() powerpc/perf/hv-24x7: remove [static 4096], sparse chokes on it powerpc/perf/hv-24x7: catalog version number is be64, not be32 arch/powerpc/perf/hv-24x7.c | 30 +- arch/powerpc/perf/hv-gpci.c | 6 +++--- 2 files changed, 24 insertions(+), 12 deletions(-) -- 1.9.2
[PATCH v2 3/6] powerpc/perf/hv-gpci: make device attr static
Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- arch/powerpc/perf/hv-gpci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c index 8fee1dc..c9d399a 100644 --- a/arch/powerpc/perf/hv-gpci.c +++ b/arch/powerpc/perf/hv-gpci.c @@ -78,7 +78,7 @@ static ssize_t kernel_version_show(struct device *dev, return sprintf(page, "0x%x\n", COUNTER_INFO_VERSION_CURRENT); } -DEVICE_ATTR_RO(kernel_version); +static DEVICE_ATTR_RO(kernel_version); HV_CAPS_ATTR(version, "0x%x\n"); HV_CAPS_ATTR(ga, "%d\n"); HV_CAPS_ATTR(expanded, "%d\n"); -- 1.9.2
[PATCH v2 5/6] powerpc/perf/hv-24x7: remove [static 4096], sparse chokes on it
Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- arch/powerpc/perf/hv-24x7.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c index 3e8f60a..95a67f8 100644 --- a/arch/powerpc/perf/hv-24x7.c +++ b/arch/powerpc/perf/hv-24x7.c @@ -170,7 +170,7 @@ static unsigned long h_get_24x7_catalog_page_(unsigned long phys_4096, index); } -static unsigned long h_get_24x7_catalog_page(char page[static 4096], +static unsigned long h_get_24x7_catalog_page(char page[], u32 version, u32 index) { return h_get_24x7_catalog_page_(virt_to_phys(page), -- 1.9.2
[PATCH v2 4/6] powerpc/perf/hv-24x7: use (unsigned long) not (u32) values when calling plpar_hcall_norets()
Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- arch/powerpc/perf/hv-24x7.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c index f5bca73..3e8f60a 100644 --- a/arch/powerpc/perf/hv-24x7.c +++ b/arch/powerpc/perf/hv-24x7.c @@ -155,16 +155,28 @@ static ssize_t read_offset_data(void *dest, size_t dest_len, return copy_len; } -static unsigned long h_get_24x7_catalog_page(char page[static 4096], -u32 version, u32 index) +static unsigned long h_get_24x7_catalog_page_(unsigned long phys_4096, + unsigned long version, + unsigned long index) { - WARN_ON(!IS_ALIGNED((unsigned long)page, 4096)); + pr_devel("h_get_24x7_catalog_page(0x%lx, %lu, %lu)", + phys_4096, + version, + index); + WARN_ON(!IS_ALIGNED(phys_4096, 4096)); return plpar_hcall_norets(H_GET_24X7_CATALOG_PAGE, - virt_to_phys(page), + phys_4096, version, index); } +static unsigned long h_get_24x7_catalog_page(char page[static 4096], +u32 version, u32 index) +{ + return h_get_24x7_catalog_page_(virt_to_phys(page), + version, index); +} + static ssize_t catalog_read(struct file *filp, struct kobject *kobj, struct bin_attribute *bin_attr, char *buf, loff_t offset, size_t count) -- 1.9.2
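The new wrapper checks alignment on the physical address it actually hands to the hypervisor. As a rough userspace illustration of that test, here is a hypothetical `is_aligned()` that computes what the kernel's `IS_ALIGNED()` macro computes for a power-of-two alignment; it is a sketch, not the kernel macro itself.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical mirror of IS_ALIGNED(x, a) for a power-of-two 'a':
 * x is aligned when every bit below 'a' is zero.
 */
static bool is_aligned(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0); /* 'a' must be a power of two */
	return (x & (a - 1)) == 0;
}
```

Taking `unsigned long` throughout mirrors the patch's point that hcall arguments are passed as full register-width values rather than `u32`.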
[PATCH v2 1/6] powerpc/perf/hv_24x7: probe errors changed to pr_debug(), padding fixed
fixup for "powerpc/perf: Add support for the hv 24x7 interface". Makes the "not enabled" message less awful (and hides it in most cases). Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- arch/powerpc/perf/hv-24x7.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c index 297c9105..f5bca73 100644 --- a/arch/powerpc/perf/hv-24x7.c +++ b/arch/powerpc/perf/hv-24x7.c @@ -485,13 +485,13 @@ static int hv_24x7_init(void) struct hv_perf_caps caps; if (!firmware_has_feature(FW_FEATURE_LPAR)) { - pr_info("not a virtualized system, not enabling\n"); + pr_debug("not a virtualized system, not enabling\n"); return -ENODEV; } hret = hv_perf_caps_get(&caps); if (hret) { - pr_info("could not obtain capabilities, error 0x%80lx, not enabling\n", + pr_debug("could not obtain capabilities, not enabling, rc=%ld\n", hret); return -ENODEV; } -- 1.9.2
[PATCH v2 2/6] powerpc/perf/hv_gpci: probe failures use pr_debug(), and padding reduced
fixup for "powerpc/perf: Add support for the hv gpci (get performance counter info) interface". Makes the "not enabled" message less awful (and hidden unless debugging). Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- arch/powerpc/perf/hv-gpci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c index 278ba7b..8fee1dc 100644 --- a/arch/powerpc/perf/hv-gpci.c +++ b/arch/powerpc/perf/hv-gpci.c @@ -273,13 +273,13 @@ static int hv_gpci_init(void) struct hv_perf_caps caps; if (!firmware_has_feature(FW_FEATURE_LPAR)) { - pr_info("not a virtualized system, not enabling\n"); + pr_debug("not a virtualized system, not enabling\n"); return -ENODEV; } hret = hv_perf_caps_get(&caps); if (hret) { - pr_info("could not obtain capabilities, error 0x%80lx, not enabling\n", + pr_debug("could not obtain capabilities, not enabling, rc=%ld\n", hret); return -ENODEV; } -- 1.9.2
[PATCH v2 6/6] powerpc/perf/hv-24x7: catalog version number is be64, not be32
The catalog version number was changed from a be32 (with preceding 32 bits of padding) to a be64; update the code to treat it as a be64. Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- arch/powerpc/perf/hv-24x7.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c index 95a67f8..9d4badc 100644 --- a/arch/powerpc/perf/hv-24x7.c +++ b/arch/powerpc/perf/hv-24x7.c @@ -171,7 +171,7 @@ static unsigned long h_get_24x7_catalog_page_(unsigned long phys_4096, } static unsigned long h_get_24x7_catalog_page(char page[], -u32 version, u32 index) +u64 version, u32 index) { return h_get_24x7_catalog_page_(virt_to_phys(page), version, index); @@ -185,7 +185,7 @@ static ssize_t catalog_read(struct file *filp, struct kobject *kobj, ssize_t ret = 0; size_t catalog_len = 0, catalog_page_len = 0, page_count = 0; loff_t page_offset = 0; - uint32_t catalog_version_num = 0; + uint64_t catalog_version_num = 0; void *page = kmem_cache_alloc(hv_page_cache, GFP_USER); struct hv_24x7_catalog_page_0 *page_0 = page; if (!page) @@ -197,7 +197,7 @@ static ssize_t catalog_read(struct file *filp, struct kobject *kobj, goto e_free; } - catalog_version_num = be32_to_cpu(page_0->version); + catalog_version_num = be64_to_cpu(page_0->version); catalog_page_len = be32_to_cpu(page_0->length); catalog_len = catalog_page_len * 4096; @@ -255,7 +255,7 @@ e_free: \ static DEVICE_ATTR_RO(_name) PAGE_0_ATTR(catalog_version, "%lld\n", - (unsigned long long)be32_to_cpu(page_0->version)); + (unsigned long long)be64_to_cpu(page_0->version)); PAGE_0_ATTR(catalog_len, "%lld\n", (unsigned long long)be32_to_cpu(page_0->length) * 4096); static BIN_ATTR_RO(catalog, 0/* real length varies */); -- 1.9.2
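To see why the field width matters, the sketch below decodes a catalog version from raw big-endian bytes both ways. It assumes, as the changelog describes, that the old layout placed 32 bits of padding ahead of a 32-bit version, so a 32-bit read of the first half of the now 64-bit field yields only the high word. `load_be32()` and `load_be64()` are hypothetical helpers playing the role of `be32_to_cpu()`/`be64_to_cpu()` applied to a loaded value.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble big-endian bytes into a host integer, independent of host byte order. */
static uint32_t load_be32(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* The high word comes first in a big-endian 64-bit field. */
static uint64_t load_be64(const unsigned char *p)
{
	return (uint64_t)load_be32(p) << 32 | load_be32(p + 4);
}
```

Once firmware uses the full width, a 32-bit read returns only half of the field.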
[PATCH 1/2] fixup: powerpc/perf: Add support for the hv 24x7 interface
Make the "not enabled" message less awful. Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- arch/powerpc/perf/hv-24x7.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c index 297c9105..3246ea2 100644 --- a/arch/powerpc/perf/hv-24x7.c +++ b/arch/powerpc/perf/hv-24x7.c @@ -491,7 +491,7 @@ static int hv_24x7_init(void) hret = hv_perf_caps_get(&caps); if (hret) { - pr_info("could not obtain capabilities, error 0x%80lx, not enabling\n", + pr_info("could not obtain capabilities, not enabling (%ld)\n", hret); return -ENODEV; } -- 1.9.1
[PATCH 2/2] fixup: powerpc/perf: Add support for the hv gpci (get performance counter info) interface
Make the "not enabled" message less awful. Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- arch/powerpc/perf/hv-gpci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c index 278ba7b..f6c471d 100644 --- a/arch/powerpc/perf/hv-gpci.c +++ b/arch/powerpc/perf/hv-gpci.c @@ -279,7 +279,7 @@ static int hv_gpci_init(void) hret = hv_perf_caps_get(&caps); if (hret) { - pr_info("could not obtain capabilities, error 0x%80lx, not enabling\n", + pr_info("could not obtain capabilities, not enabling (%ld)\n", hret); return -ENODEV; } -- 1.9.1
[PATCH 0/2] powerpc/perf: fixup 2 patches from the 24x7 series
mpe: these are fixups for 2 patches already in your merge tree (and in benh's next branch). f3e622941a7cec587c00c0d17ea31514457c63c8 powerpc/perf: Add support for the hv 24x7 interface edd354ea4a6774bf9f380b0acf30e699070f4e8a powerpc/perf: Add support for the hv gpci (get performance counter info) interface. The only change is to a pr_info() printed when the interface is not detected. Anton: I'm hesitant to switch these to pr_debug(), as they are the only way for users who expect these PMUs to exist to tell why the kernel decided they didn't have them. As a result, I've kept them as pr_info() instead of converting to pr_debug(). Cody P Schafer (2): fixup: powerpc/perf: Add support for the hv 24x7 interface fixup: powerpc/perf: Add support for the hv gpci (get performance counter info) interface arch/powerpc/perf/hv-24x7.c | 2 +- arch/powerpc/perf/hv-gpci.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) -- 1.9.1
Re: [PATCH v4 09/11] powerpc/perf: add support for the hv 24x7 interface
On 03/25/2014 03:43 AM, Anton Blanchard wrote: Hi Cody, hv-24x7: could not obtain capabilities, error 0x fffe, not enabling hv-gpci: could not obtain capabilities, error 0x fffe, not enabling + pr_info("could not obtain capabilities, error 0x%80lx, not enabling\n", That's a lot of padding :) I think this should also be a pr_debug, considering this is not relevant to most ppc64 boxes. I'm fine with that. It should probably be 0x%08lx not 0x%80lx, not sure when I screwed that up.
Re: [PATCH v4 09/11] powerpc/perf: add support for the hv 24x7 interface
On 03/25/2014 03:43 AM, Anton Blanchard wrote: Hi Cody, hv-24x7: could not obtain capabilities, error 0x fffe, not enabling hv-gpci: could not obtain capabilities, error 0x fffe, not enabling + pr_info("could not obtain capabilities, error 0x%80lx, not enabling\n", That's a lot of padding :) I think this should also be a pr_debug, considering this is not relevant to most ppc64 boxes. Yep, s/info/debug/ makes sense. The format should have been %08lx not %80lx, not sure when I screwed that up.
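The padding Anton noticed comes from the field width. In `%80lx` the `80` is a minimum width of 80 characters padded with spaces, whereas in `%08lx` the leading `0` is the zero-padding flag and `8` is the width. A quick userspace check (the helper names and the value `0x2a` are arbitrary):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Minimum width 80, space-padded: the accidental format. */
int render_width80(char *buf, size_t len, unsigned long v)
{
	return snprintf(buf, len, "0x%80lx", v);
}

/* Minimum width 8, zero-padded: the intended format. */
int render_zero8(char *buf, size_t len, unsigned long v)
{
	return snprintf(buf, len, "0x%08lx", v);
}
```

The first form prints "0x" followed by 78 spaces and the digits, which explains the wide gap in the boot messages quoted above.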
[PATCH v4 02/11] perf: add PMU_FORMAT_RANGE() helper for use by sw-like pmus
Add PMU_FORMAT_RANGE() and PMU_FORMAT_RANGE_RESERVED() (for reserved areas) which generate functions to extract the relevant bits from event->attr.config{,1,2} for use by sw-like pmus where the 'config{,1,2}' values don't map directly to hardware registers. Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- include/linux/perf_event.h | 23 +++ 1 file changed, 23 insertions(+) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index e56b07f..5c12009 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -871,4 +871,27 @@ _name##_show(struct device *dev, \ \ static struct device_attribute format_attr_##_name = __ATTR_RO(_name) +#define format_max(name) FORMAT_MAX_(name)() +#define FORMAT_MAX_(name) format_##name##_max + +#define format_get(name, event) FORMAT_GET_(name)(event) +#define FORMAT_GET_(name) format_get_##name + +#define PMU_FORMAT_RANGE(name, attr_var, bit_start, bit_end) \ +PMU_FORMAT_RANGE_RESERVED(name, attr_var, bit_start, bit_end) \ +PMU_FORMAT_ATTR(name, #attr_var ":" #bit_start "-" #bit_end) + +#define PMU_FORMAT_RANGE_RESERVED(name, attr_var, bit_start, bit_end) \ +static u64 FORMAT_MAX_(name)(void) \ +{ \ + BUILD_BUG_ON((bit_start > bit_end) \ + || (bit_end >= (sizeof(1ull) * 8)));\ + return (((1ull << (bit_end - bit_start)) - 1) << 1) + 1;\ +} \ +static u64 FORMAT_GET_(name)(struct perf_event *event) \ +{ \ + return (event->attr.attr_var >> (bit_start)) \ + & format_max(name); \ +} + #endif /* _LINUX_PERF_EVENT_H */ -- 1.9.0
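The arithmetic generated by `PMU_FORMAT_RANGE_RESERVED()` can be checked in isolation. The sketch below restates `format_max()` and `format_get()` as plain functions over a `uint64_t` config word. The names `field_max()`/`field_get()` and the explicit `start`/`end` parameters are hypothetical, since the macros bake the bit range in at compile time. The max computation builds `end - start + 1` one-bits without ever shifting by 64, which would be undefined behavior for a full 0-63 range.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with (end - start + 1) low bits set; safe for the full 0..63 span. */
static uint64_t field_max(unsigned start, unsigned end)
{
	/* Runtime version of the macro's BUILD_BUG_ON() range check. */
	assert(start <= end && end < 64);
	return (((1ull << (end - start)) - 1) << 1) + 1;
}

/* Extract bits [start, end] of a perf_event_attr-style config value. */
static uint64_t field_get(uint64_t config, unsigned start, unsigned end)
{
	return (config >> start) & field_max(start, end);
}
```

For example, a format string of `config:4-11` corresponds to `field_get(config, 4, 11)`, which yields an 8-bit value.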
[PATCH v4 00/11] powerpc: Add support for Power Hypervisor supplied performance counters
These patches add basic pmus for 2 powerpc hypervisor interfaces to obtain performance counters: gpci (get performance counter info) and 24x7. The counters supplied by these interfaces are continually counting and never need to be (and cannot be) disabled or enabled. They additionally do not generate any interrupts. This makes them in some regards similar to software counters, and as a result their implementation shares some common code (which an initial patch exposes) with the sw counters. These 2 PMUs end up providing access to some cpu, core, and chip level counters not exposed via other interfaces, and additionally allow monitoring the performance of other lpars (guests) on the same host system. Because they provide access to core and chip level counters, this pair of PMUs could be thought of as powerpc's counterpart to x86's uncore events. GPCI is an interface that already exists on some power6 and power7 machines (depending on the fw version), but it is rather inflexible, and adding counters to it is code intensive. The 24x7 interfaces are currently designed to co-exist with the gpci interface while replacing most of gpci's functionality on newer systems. Right now, the 24x7 code I've submitted uses the gpci calls to check if it has permission to access certain classes of counters.
-- Since v3: - PMU_FORMAT_RANGE*() - add BUILD_BUG_ON() for invalid bit indexes - rename event_get_##name(ev) to format_get(name, ev) [Michael Ellerman] - similarly, rename event_get_##name##_max() to format_max(name) [Michael Ellerman] - fix format_max() [Michael Ellerman] Since v2: - "sysfs: create bin_attributes under the requested group" is now in git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git driver-core-next with commit-id: aabaf4c2050d21d39fe11eec889c508e84d6a328 - Split hv-24x7.h catalog definition into hv-24x7-catalog.h - Remove unused 24x7 and gpci interface structures and enums (Michael Ellerman) - Update docs to point to an external source for the full catalog docs - Extend some of the patch changelogs (Peter Z) - Remove hrtimer usage and just extern the event_idx helper (now renamed) (Peter Z) - s/PMU_RANGE_ATTR/PMU_FORMAT_RANGE/ (and similar RESERVED rename) (Michael Ellerman) - hv_24x7: small clarifications in read_offset_data()'s comment - hv_gpci: remove h_gpci_event_read() and h_gpci_event_del(), call _stop and _update() directly (Michael Ellerman) - Kconfig relocation, dependency changes, and rewording (Scott Wood and Michael Ellerman) Since v1: - add a few attributes to hv_gpci and hv_24x7 that expose some info about the interfaces - so the attributes show up in the right place, fix bin_attr creation in sysfs groups.
- move hv_gpci.h and hv_24x7.h interface headers into arch/powerpc/perf - fix bit ordering in hv_gpci.h - split out hv_perf_caps_get() and use it to probe for the interface before registering - ensure proper alignment of hypervisor args - add a few missing counter requests to hv_gpci.h - s/CIR_xxx/CIR_XXX/ in hv_gpci.h - s/modules_init/device_initcall/ - Don't set event->cpu, use the user provided one - remove the union of gpci events, just give the user 1024 bytes to play with - clarify some comments (the list of fw versions is now labeled) - provide an event_24x7_request() that wraps single_24x7_request() - probably some other small fixes I'm forgetting. Cody P Schafer (11): sysfs: create bin_attributes under the requested group perf: add PMU_FORMAT_RANGE() helper for use by sw-like pmus perf: provide a common perf_event_nop_0() for use with .event_idx powerpc: add hvcalls for 24x7 and gpci (get performance counter info) powerpc/perf: add hv_gpci interface header powerpc/perf: add 24x7 interface headers powerpc/perf: add a shared interface to get gpci version and capabilities powerpc/perf: add support for the hv gpci (get performance counter info) interface powerpc/perf: add support for the hv 24x7 interface powerpc/perf: add kconfig option for hypervisor provided counters powerpc/perf/hv_{gpci,24x7}: add documentation of device attributes .../testing/sysfs-bus-event_source-devices-hv_24x7 | 23 + .../testing/sysfs-bus-event_source-devices-hv_gpci | 43 ++ arch/powerpc/include/asm/hvcall.h | 5 + arch/powerpc/perf/Makefile | 2 + arch/powerpc/perf/hv-24x7-catalog.h| 33 ++ arch/powerpc/perf/hv-24x7.c| 493 + arch/powerpc/perf/hv-24x7.h| 109 + arch/powerpc/perf/hv-common.c | 39 ++ arch/powerpc/perf/hv-common.h | 17 + arch/powerpc/perf/hv-gpci.c| 277 arch/powerpc/perf/hv-gpci.h| 73 +++ arch/powerpc/platforms/pseries/Kconfig | 12 + fs/sysfs/group.c
[PATCH v4 03/11] perf: provide a common perf_event_nop_0() for use with .event_idx
Rather than having every pmu that needs a function that just returns 0 for .event_idx define its own copy, reuse the one in kernel/events/core.c. Rename from perf_swevent_event_idx() because we're no longer using it for just software events. Naming is based on the perf_pmu_nop_*() functions. Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- include/linux/perf_event.h | 1 + kernel/events/core.c | 10 +- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 5c12009..23da668 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -560,6 +560,7 @@ extern void perf_pmu_migrate_context(struct pmu *pmu, extern u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running); +extern int perf_event_nop_0(struct perf_event *event); struct perf_sample_data { u64 type; diff --git a/kernel/events/core.c b/kernel/events/core.c index fa0b2d4..16bf7c2 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5816,7 +5816,7 @@ static int perf_swevent_init(struct perf_event *event) return 0; } -static int perf_swevent_event_idx(struct perf_event *event) +int perf_event_nop_0(struct perf_event *event) { return 0; } @@ -5831,7 +5831,7 @@ static struct pmu perf_swevent = { .stop = perf_swevent_stop, .read = perf_swevent_read, - .event_idx = perf_swevent_event_idx, + .event_idx = perf_event_nop_0, }; #ifdef CONFIG_EVENT_TRACING @@ -5950,7 +5950,7 @@ static struct pmu perf_tracepoint = { .stop = perf_swevent_stop, .read = perf_swevent_read, - .event_idx = perf_swevent_event_idx, + .event_idx = perf_event_nop_0, }; static inline void perf_tp_register(void) @@ -6177,7 +6177,7 @@ static struct pmu perf_cpu_clock = { .stop = cpu_clock_event_stop, .read = cpu_clock_event_read, - .event_idx = perf_swevent_event_idx, + .event_idx = perf_event_nop_0, }; /* @@ -6257,7 +6257,7 @@ static struct pmu perf_task_clock = { .stop = task_clock_event_stop, .read = task_clock_event_read, - .event_idx = perf_swevent_event_idx, + .event_idx = perf_event_nop_0, }; static void perf_pmu_nop_void(struct pmu *pmu) -- 1.9.0
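The change amounts to exporting one "always return 0" function and pointing several ops tables at it instead of keeping per-PMU copies. A minimal userspace sketch of that pattern, using a pared-down, hypothetical `struct pmu_ops` (not the kernel's `struct pmu`, which has many more hooks):

```c
#include <assert.h>
#include <stddef.h>

struct event; /* opaque, standing in for struct perf_event */

/* Hypothetical subset of an ops table. */
struct pmu_ops {
	const char *name;
	int (*event_idx)(struct event *event);
};

/* One shared callback, in the spirit of perf_event_nop_0(). */
int event_nop_0(struct event *event)
{
	(void)event; /* unused: the answer is always 0 */
	return 0;
}

/* Two PMUs share the same function instead of defining identical copies. */
const struct pmu_ops swevent_ops = { .name = "software", .event_idx = event_nop_0 };
const struct pmu_ops hv_24x7_ops = { .name = "hv_24x7", .event_idx = event_nop_0 };
```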
[PATCH v4 04/11] powerpc: add hvcalls for 24x7 and gpci (get performance counter info)
Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- arch/powerpc/include/asm/hvcall.h | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index d8b600b..5dbbb29 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -274,6 +274,11 @@ /* Platform specific hcalls, used by KVM */ #define H_RTAS 0xf000 +/* Platform specific hcalls, provided by PHYP */ +#define H_GET_24X7_CATALOG_PAGE 0xF078 +#define H_GET_24X7_DATA 0xF07C +#define H_GET_PERF_COUNTER_INFO 0xF080 + #ifndef __ASSEMBLY__ /** -- 1.9.0
[PATCH v4 01/11] sysfs: create bin_attributes under the requested group
bin_attributes created/updated in create_files() (such as those listed via (struct device).attribute_groups) were not placed under the specified group, and instead appeared in the base kobj directory. Fix this by making bin_attributes use creation code similar to that of normal attributes. A quick grep shows that no one is using bin_attrs in a named attribute group yet, so we can do this without breaking anything in userspace. Note that I do not add is_visible() support to bin_attributes, though that could be done as well. Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com Signed-off-by: Greg Kroah-Hartman gre...@linuxfoundation.org --- Currently in: git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git driver-core-next with commit-id: aabaf4c2050d21d39fe11eec889c508e84d6a328 --- fs/sysfs/group.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c index 6b57938..aa04068 100644 --- a/fs/sysfs/group.c +++ b/fs/sysfs/group.c @@ -70,8 +70,11 @@ static int create_files(struct kernfs_node *parent, struct kobject *kobj, if (grp->bin_attrs) { for (bin_attr = grp->bin_attrs; *bin_attr; bin_attr++) { if (update) - sysfs_remove_bin_file(kobj, *bin_attr); - error = sysfs_create_bin_file(kobj, *bin_attr); + kernfs_remove_by_name(parent, + (*bin_attr)->attr.name); + error = sysfs_add_file_mode_ns(parent, + &(*bin_attr)->attr, true, + (*bin_attr)->attr.mode, NULL); if (error) break; } -- 1.9.0
[PATCH v4 05/11] powerpc/perf: add hv_gpci interface header
H_GetPerformanceCounterInfo (refered to as hv_gpci or just gpci from here on) is an interface to retrieve specific performance counters and other data from the hypervisor. All outputs have a fixed format. This header only describes the portions of the interface that we plan on using in linux at this time. Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- arch/powerpc/perf/hv-gpci.h | 73 + 1 file changed, 73 insertions(+) create mode 100644 arch/powerpc/perf/hv-gpci.h diff --git a/arch/powerpc/perf/hv-gpci.h b/arch/powerpc/perf/hv-gpci.h new file mode 100644 index 000..b25f460 --- /dev/null +++ b/arch/powerpc/perf/hv-gpci.h @@ -0,0 +1,73 @@ +#ifndef LINUX_POWERPC_PERF_HV_GPCI_H_ +#define LINUX_POWERPC_PERF_HV_GPCI_H_ + +#include linux/types.h + +/* From the document H_GetPerformanceCounterInfo Interface v1.07 */ + +/* H_GET_PERF_COUNTER_INFO argument */ +struct hv_get_perf_counter_info_params { + __be32 counter_request; /* I */ + __be32 starting_index; /* IO */ + __be16 secondary_index; /* IO */ + __be16 returned_values; /* O */ + __be32 detail_rc; /* O, only needed when called via *_norets() */ + + /* +* O, size each of counter_value element in bytes, only set for version +* = 0x3 +*/ + __be16 cv_element_size; + + /* I, 0 (zero) for versions 0x3 */ + __u8 counter_info_version_in; + + /* O, 0 (zero) if version 0x3. Must be set to 0 when making hcall */ + __u8 counter_info_version_out; + __u8 reserved[0xC]; + __u8 counter_value[]; +} __packed; + +/* + * counter info version = fw version/reference (spec version) + * + * 8 = power8 (1.07) + * [7 is skipped by spec 1.07] + * 6 = TLBIE (1.07) + * 5 = v7r7m0.phyp (1.05) + * [4 skipped] + * 3 = v7r6m0.phyp (?) + * [1,2 skipped] + * 0 = v7r{2,3,4}m0.phyp (?) + */ +#define COUNTER_INFO_VERSION_CURRENT 0x8 + +/* + * These determine the counter_value[] layout and the meaning of starting_index + * and secondary_index. + * + * Unless otherwise noted, @secondary_index is unused and ignored. 
+ */ +enum counter_info_requests { + + /* GENERAL */ + + /* @starting_index: must be -1 (to refer to the current partition) +*/ + CIR_SYSTEM_PERFORMANCE_CAPABILITIES = 0X40, +}; + +struct cv_system_performance_capabilities { + /* If != 0, allowed to collect data from other partitions */ + __u8 perf_collect_privileged; + + /* These following are only valid if counter_info_version >= 0x3 */ +#define CV_CM_GA (1 << 7) +#define CV_CM_EXPANDED (1 << 6) +#define CV_CM_LAB (1 << 5) + /* remaining bits are reserved */ + __u8 capability_mask; + __u8 reserved[0xE]; +} __packed; + +#endif -- 1.9.0
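The field comments above imply a fixed packed layout: a 32-byte parameter header followed by counter_value[], and a 16-byte capabilities block. The sketch below checks those sizes from user space. It assumes GCC/Clang's `packed` attribute, and its mirror structs (hypothetical names) substitute host-order `uint*_t` for the kernel's `__be*` types, so only sizes and offsets are meaningful here, not byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space mirror of struct hv_get_perf_counter_info_params.
 * Host-order integers stand in for __be*; only the layout is checked. */
struct gpci_params_mirror {
	uint32_t counter_request;          /* I */
	uint32_t starting_index;           /* IO */
	uint16_t secondary_index;          /* IO */
	uint16_t returned_values;          /* O */
	uint32_t detail_rc;                /* O */
	uint16_t cv_element_size;          /* O, only set for version >= 0x3 */
	uint8_t  counter_info_version_in;  /* I */
	uint8_t  counter_info_version_out; /* O */
	uint8_t  reserved[0xC];
	uint8_t  counter_value[];          /* flexible array: adds no size */
} __attribute__((packed));

/* Hypothetical mirror of struct cv_system_performance_capabilities. */
struct cv_caps_mirror {
	uint8_t perf_collect_privileged;
	uint8_t capability_mask;
	uint8_t reserved[0xE];
} __attribute__((packed));

/* 4+4+2+2+4+2+1+1+12 = 32 bytes before counter_value[] begins. */
static_assert(sizeof(struct gpci_params_mirror) == 32, "params header is 32 bytes");
static_assert(offsetof(struct gpci_params_mirror, counter_value) == 32,
	      "counter data starts right after the header");
static_assert(sizeof(struct cv_caps_mirror) == 16, "caps block is 16 bytes");
```

With a 32-byte header, the `GPCI_MAX_DATA_BYTES` definition later in the series leaves 1024 - 32 = 992 bytes for counter data in a 1 KiB request.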
[PATCH v4 06/11] powerpc/perf: add 24x7 interface headers
24x7 (also called hv_24x7 or H_24X7) is an interface to obtain performance counters from the hypervisor. These counters do not have a fixed format/position and are instead documented in a 24x7 Catalog, which is provided by the hypervisor (that interface is also documented partially in the included hv-24x7-catalog.h and fully at https://raw.githubusercontent.com/jmesmon/catalog-24x7/master/hv-24x7-catalog.h ). The 24x7 data access is simply a copy operation into a 4-dimensional array of 64-bit counters (from hypervisor to kernel memory). There is no interrupt triggered on overflow; these counters are completely disjoint from the typical POWER PMU. This method of obtaining performance counters from the hypervisor is intended to partially replace the gpci interface. Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- arch/powerpc/perf/hv-24x7-catalog.h | 33 +++ arch/powerpc/perf/hv-24x7.h | 109 2 files changed, 142 insertions(+) create mode 100644 arch/powerpc/perf/hv-24x7-catalog.h create mode 100644 arch/powerpc/perf/hv-24x7.h diff --git a/arch/powerpc/perf/hv-24x7-catalog.h b/arch/powerpc/perf/hv-24x7-catalog.h new file mode 100644 index 000..21b19dd --- /dev/null +++ b/arch/powerpc/perf/hv-24x7-catalog.h @@ -0,0 +1,33 @@ +#ifndef LINUX_POWERPC_PERF_HV_24X7_CATALOG_H_ +#define LINUX_POWERPC_PERF_HV_24X7_CATALOG_H_ + +#include <linux/types.h> + +/* From document 24x7 Event and Group Catalog Formats Proposal v0.15 */ + +struct hv_24x7_catalog_page_0 { +#define HV_24X7_CATALOG_MAGIC 0x32347837 /* 24x7 in ASCII */ + __be32 magic; + __be32 length; /* In 4096 byte pages */ + __be64 version; /* XXX: arbitrary? what's the meaning/usage/purpose?
*/ + __u8 build_time_stamp[16]; /* MMDDHHMMSS\0\0 */ + __u8 reserved2[32]; + __be16 schema_data_offs; /* in 4096 byte pages */ + __be16 schema_data_len; /* in 4096 byte pages */ + __be16 schema_entry_count; + __u8 reserved3[2]; + __be16 event_data_offs; + __be16 event_data_len; + __be16 event_entry_count; + __u8 reserved4[2]; + __be16 group_data_offs; /* in 4096 byte pages */ + __be16 group_data_len; /* in 4096 byte pages */ + __be16 group_entry_count; + __u8 reserved5[2]; + __be16 formula_data_offs; /* in 4096 byte pages */ + __be16 formula_data_len; /* in 4096 byte pages */ + __be16 formula_entry_count; + __u8 reserved6[2]; +} __packed; + +#endif diff --git a/arch/powerpc/perf/hv-24x7.h b/arch/powerpc/perf/hv-24x7.h new file mode 100644 index 000..720ebce --- /dev/null +++ b/arch/powerpc/perf/hv-24x7.h @@ -0,0 +1,109 @@ +#ifndef LINUX_POWERPC_PERF_HV_24X7_H_ +#define LINUX_POWERPC_PERF_HV_24X7_H_ + +#include <linux/types.h> + +struct hv_24x7_request { + /* PHYSICAL domains require enabling via phyp/hmc. */ +#define HV_24X7_PERF_DOMAIN_PHYSICAL_CHIP 0x01 +#define HV_24X7_PERF_DOMAIN_PHYSICAL_CORE 0x02 +#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_CORE 0x03 +#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_CHIP 0x04 +#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_NODE 0x05 +#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_REMOTE_NODE 0x06 + __u8 performance_domain; + __u8 reserved[0x1]; + + /* bytes to read starting at @data_offset. must be a multiple of 8 */ + __be16 data_size; + + /* +* byte offset within the perf domain to read from. must be 8 byte +* aligned +*/ + __be32 data_offset; + + /* +* only valid for VIRTUAL_PROCESSOR domains, ignored for others. +* -1 means current partition only +* Enabling via phyp/hmc required for non--1 values. 0 forbidden +* unless requestor is 0.
+*/ + __be16 starting_lpar_ix; + + /* +* Ignored when @starting_lpar_ix == -1 +* Ignored when @performance_domain is not VIRTUAL_PROCESSOR_* +* -1 means infinite or all +*/ + __be16 max_num_lpars; + + /* chip, core, or virtual processor based on @performance_domain */ + __be16 starting_ix; + __be16 max_ix; +} __packed; + +struct hv_24x7_request_buffer { + /* 0 - ? */ + /* 1 - ? */ +#define HV_24X7_IF_VERSION_CURRENT 0x01 + __u8 interface_version; + __u8 num_requests; + __u8 reserved[0xE]; + struct hv_24x7_request requests[]; +} __packed; + +struct hv_24x7_result_element { + __be16 lpar_ix; + + /* +* represents the core, chip, or virtual processor based on the +* request's @performance_domain +*/ + __be16 domain_ix; + + /* -1 if @performance_domain does not refer to a virtual processor */ + __be32 lpar_cfg_instance_id; + + /* size = @result_element_data_size of containing result. */ + __u8 element_data[]; +} __packed; + +struct hv_24x7_result { + __u8 result_ix; + + /* +* 0 = not all
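The 16-byte request and the 16-byte buffer header are what give the "255 requests per hcall" figure noted in the hv-24x7.c TODO comment: (4096 - 16) / 16 = 255, which also matches the `__u8 num_requests` limit. The sketch below checks that arithmetic and decodes the catalog magic from big-endian bytes. It assumes the GCC/Clang `packed` attribute. All names are hypothetical, and host-order `uint*_t` mirrors replace `__be*`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct hv_24x7_request (host-order fields). */
struct req_mirror {
	uint8_t  performance_domain;
	uint8_t  reserved[0x1];
	uint16_t data_size;
	uint32_t data_offset;
	uint16_t starting_lpar_ix;
	uint16_t max_num_lpars;
	uint16_t starting_ix;
	uint16_t max_ix;
} __attribute__((packed));

/* Hypothetical mirror of struct hv_24x7_request_buffer. */
struct req_buf_mirror {
	uint8_t interface_version;
	uint8_t num_requests;
	uint8_t reserved[0xE];
	struct req_mirror requests[];
} __attribute__((packed));

static_assert(sizeof(struct req_mirror) == 16, "one request is 16 bytes");
static_assert(sizeof(struct req_buf_mirror) == 16, "buffer header is 16 bytes");

/* Number of whole requests that fit after the header in buf_size bytes. */
static size_t max_requests(size_t buf_size)
{
	return (buf_size - sizeof(struct req_buf_mirror)) / sizeof(struct req_mirror);
}

#define CATALOG_MAGIC_24X7 0x32347837u /* same value as HV_24X7_CATALOG_MAGIC */

/* Catalog fields are big-endian (__be32), so assemble the bytes explicitly. */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Nonzero if catalog page 0 starts with the ASCII bytes "24x7". */
static int catalog_magic_ok(const uint8_t *page0)
{
	return load_be32(page0) == CATALOG_MAGIC_24X7;
}
```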
[PATCH v4 07/11] powerpc/perf: add a shared interface to get gpci version and capabilities
This exposes a simple way to grab the firmware-provided collect_privileged, ga, expanded, and lab capability bits. All of these bits come in from the same gpci request, so we've exposed all of them. Only the collect_privileged bit is really used by the hv-gpci/hv-24x7 code; the other bits are simply exposed in sysfs to inform the user. Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- arch/powerpc/perf/hv-common.c | 39 +++ arch/powerpc/perf/hv-common.h | 17 + 2 files changed, 56 insertions(+) create mode 100644 arch/powerpc/perf/hv-common.c create mode 100644 arch/powerpc/perf/hv-common.h diff --git a/arch/powerpc/perf/hv-common.c b/arch/powerpc/perf/hv-common.c new file mode 100644 index 000..47e02b3 --- /dev/null +++ b/arch/powerpc/perf/hv-common.c @@ -0,0 +1,39 @@ +#include <asm/io.h> +#include <asm/hvcall.h> + +#include "hv-gpci.h" +#include "hv-common.h" + +unsigned long hv_perf_caps_get(struct hv_perf_caps *caps) +{ + unsigned long r; + struct p { + struct hv_get_perf_counter_info_params params; + struct cv_system_performance_capabilities caps; + } __packed __aligned(sizeof(uint64_t)); + + struct p arg = { + .params = { + .counter_request = cpu_to_be32( + CIR_SYSTEM_PERFORMANCE_CAPABILITIES), + .starting_index = cpu_to_be32(-1), + .counter_info_version_in = 0, + } + }; + + r = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO, + virt_to_phys(&arg), sizeof(arg)); + + if (r) + return r; + + pr_devel("capability_mask: 0x%x\n", arg.caps.capability_mask); + + caps->version = arg.params.counter_info_version_out; + caps->collect_privileged = !!arg.caps.perf_collect_privileged; + caps->ga = !!(arg.caps.capability_mask & CV_CM_GA); + caps->expanded = !!(arg.caps.capability_mask & CV_CM_EXPANDED); + caps->lab = !!(arg.caps.capability_mask & CV_CM_LAB); + + return r; +} diff --git a/arch/powerpc/perf/hv-common.h b/arch/powerpc/perf/hv-common.h new file mode 100644 index 000..7e615bd --- /dev/null +++ b/arch/powerpc/perf/hv-common.h @@ -0,0 +1,17 @@ +#ifndef LINUX_POWERPC_PERF_HV_COMMON_H_
+#define LINUX_POWERPC_PERF_HV_COMMON_H_ + +#include <linux/types.h> + +struct hv_perf_caps { + u16 version; + u16 collect_privileged:1, + ga:1, + expanded:1, + lab:1, + unused:12; +}; + +unsigned long hv_perf_caps_get(struct hv_perf_caps *caps); + +#endif -- 1.9.0
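The decoding step in hv_perf_caps_get() is just masking capability_mask with the CV_CM_* bits. The sketch below pulls that step out so it can run without the hcall. It is a user-space sketch: decode_caps() and struct perf_caps are hypothetical names, and plain bools replace the kernel struct's bitfields. As in the patch, the mask bits are decoded unconditionally, although the header says they are only meaningful when counter_info_version >= 0x3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability bits as defined in hv-gpci.h. */
#define CV_CM_GA       (1 << 7)
#define CV_CM_EXPANDED (1 << 6)
#define CV_CM_LAB      (1 << 5)

/* Hypothetical stand-in for struct hv_perf_caps (bools instead of bitfields). */
struct perf_caps {
	uint16_t version;
	bool collect_privileged;
	bool ga;
	bool expanded;
	bool lab;
};

/* Mirror of the decoding done after a successful
 * CIR_SYSTEM_PERFORMANCE_CAPABILITIES request. */
static struct perf_caps decode_caps(uint8_t version_out,
				    uint8_t perf_collect_privileged,
				    uint8_t capability_mask)
{
	struct perf_caps caps = {
		.version = version_out,
		.collect_privileged = perf_collect_privileged != 0,
		.ga = (capability_mask & CV_CM_GA) != 0,
		.expanded = (capability_mask & CV_CM_EXPANDED) != 0,
		.lab = (capability_mask & CV_CM_LAB) != 0,
	};
	return caps;
}
```

For example, a mask of 0xA0 (bits 7 and 5 set) reports GA and LAB but not EXPANDED.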
[PATCH v4 08/11] powerpc/perf: add support for the hv gpci (get performance counter info) interface
This provides a basic link between perf and hv_gpci. Notably, it does not yet support transactions and does not list any events (they can still be manually composed). Example usage via perf tool: perf stat -e 'hv_gpci/counter_info_version=3,offset=0,length=8,secondary_index=0,starting_index=0x,request=0x10/' -r 0 -C 0 -x ' ' sleep 0.1 Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- arch/powerpc/perf/hv-gpci.c | 277 1 file changed, 277 insertions(+) create mode 100644 arch/powerpc/perf/hv-gpci.c diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c new file mode 100644 index 000..cc308fc --- /dev/null +++ b/arch/powerpc/perf/hv-gpci.c @@ -0,0 +1,277 @@ +/* + * Hypervisor supplied gpci (get performance counter info) performance + * counter support + * + * Author: Cody P Schafer c...@linux.vnet.ibm.com + * Copyright 2014 IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ +#define pr_fmt(fmt) "hv-gpci: " fmt + +#include <linux/init.h> +#include <linux/perf_event.h> +#include <asm/firmware.h> +#include <asm/hvcall.h> +#include <asm/io.h> + +#include "hv-gpci.h" +#include "hv-common.h" + +PMU_FORMAT_RANGE(request, config, 0, 31); /* u32 */ +PMU_FORMAT_RANGE(starting_index, config, 32, 63); /* u32 */ +PMU_FORMAT_RANGE(secondary_index, config1, 0, 15); /* u16 */ +PMU_FORMAT_RANGE(counter_info_version, config1, 16, 23); /* u8 */ +PMU_FORMAT_RANGE(length, config1, 24, 31); /* u8, bytes of data (1-8) */ +PMU_FORMAT_RANGE(offset, config1, 32, 63); /* u32, byte offset */ + +static struct attribute *format_attrs[] = { + &format_attr_request.attr, + &format_attr_starting_index.attr, + &format_attr_secondary_index.attr, + &format_attr_counter_info_version.attr, + + &format_attr_offset.attr, + &format_attr_length.attr, + NULL, +}; + +static struct attribute_group format_group = { + .name = "format", + .attrs = format_attrs, +}; + +#define HV_CAPS_ATTR(_name, _format) \ +static ssize_t _name##_show(struct device *dev,\ + struct device_attribute *attr, \ + char *page) \ +{ \ + struct hv_perf_caps caps; \ + unsigned long hret = hv_perf_caps_get(&caps); \ + if (hret) \ + return -EIO;\ + \ + return sprintf(page, _format, caps._name); \ +} \ +static struct device_attribute hv_caps_attr_##_name = __ATTR_RO(_name) + +static ssize_t kernel_version_show(struct device *dev, + struct device_attribute *attr, + char *page) +{ + return sprintf(page, "0x%x\n", COUNTER_INFO_VERSION_CURRENT); +} + +DEVICE_ATTR_RO(kernel_version); +HV_CAPS_ATTR(version, "0x%x\n"); +HV_CAPS_ATTR(ga, "%d\n"); +HV_CAPS_ATTR(expanded, "%d\n"); +HV_CAPS_ATTR(lab, "%d\n"); +HV_CAPS_ATTR(collect_privileged, "%d\n"); + +static struct attribute *interface_attrs[] = { + &dev_attr_kernel_version.attr, + &hv_caps_attr_version.attr, + &hv_caps_attr_ga.attr, + &hv_caps_attr_expanded.attr, + &hv_caps_attr_lab.attr, + &hv_caps_attr_collect_privileged.attr, + NULL, +}; + +static struct attribute_group interface_group = { + .name =
"interface", + .attrs = interface_attrs, +}; + +static const struct attribute_group *attr_groups[] = { + &format_group, + &interface_group, + NULL, +}; + +#define GPCI_MAX_DATA_BYTES \ + (1024 - sizeof(struct hv_get_perf_counter_info_params)) + +static unsigned long single_gpci_request(u32 req, u32 starting_index, + u16 secondary_index, u8 version_in, u32 offset, u8 length, + u64 *value) +{ + unsigned long ret; + size_t i; + u64 count; + + struct { + struct hv_get_perf_counter_info_params params; + uint8_t bytes[GPCI_MAX_DATA_BYTES]; + } __packed __aligned(sizeof(uint64_t)) arg = { + .params = { + .counter_request = cpu_to_be32(req), + .starting_index = cpu_to_be32(starting_index), + .secondary_index = cpu_to_be16(secondary_index), + .counter_info_version_in = version_in, + } + }; + + ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO, + virt_to_phys(&arg
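The format attributes split the gpci parameters across `config` and `config1`. The sketch below shows how a user-space tool could pack the terms from the example perf command line into the two raw words. It uses starting_index=0 because the example's actual value is elided in the archive. put_field(), struct gpci_event and gpci_encode() are hypothetical names. The bit ranges are taken from the PMU_FORMAT_RANGE() lines above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store 'value' in bits [start, end] of 'word'. */
static uint64_t put_field(uint64_t word, unsigned start, unsigned end, uint64_t value)
{
	unsigned bits = end - start + 1;
	uint64_t mask = bits < 64 ? (1ULL << bits) - 1 : ~0ULL;

	return (word & ~(mask << start)) | ((value & mask) << start);
}

/* Hypothetical: one member per hv_gpci format term. */
struct gpci_event {
	uint32_t request, starting_index, offset;
	uint16_t secondary_index;
	uint8_t counter_info_version, length;
};

/* config:  request 0-31, starting_index 32-63
 * config1: secondary_index 0-15, counter_info_version 16-23,
 *          length 24-31, offset 32-63 */
static void gpci_encode(const struct gpci_event *e, uint64_t *config, uint64_t *config1)
{
	*config = put_field(0, 0, 31, e->request);
	*config = put_field(*config, 32, 63, e->starting_index);

	*config1 = put_field(0, 0, 15, e->secondary_index);
	*config1 = put_field(*config1, 16, 23, e->counter_info_version);
	*config1 = put_field(*config1, 24, 31, e->length);
	*config1 = put_field(*config1, 32, 63, e->offset);
}
```

With request=0x10, counter_info_version=3 and length=8, this gives config = 0x10 and config1 = 0x08030000.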
[PATCH v4 09/11] powerpc/perf: add support for the hv 24x7 interface
This provides a basic interface between hv_24x7 and perf. Similar to the one provided for gpci, it lacks transaction support and does not list any events. Example usage via perf tool: perf stat -e 'hv_24x7/domain=2,offset=8,starting_index=0,lpar=0x/' -r 0 -C 0 -x ' ' sleep 0.1 Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- arch/powerpc/perf/hv-24x7.c | 493 1 file changed, 493 insertions(+) create mode 100644 arch/powerpc/perf/hv-24x7.c diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c new file mode 100644 index 000..81d68b6 --- /dev/null +++ b/arch/powerpc/perf/hv-24x7.c @@ -0,0 +1,493 @@ +/* + * Hypervisor supplied 24x7 performance counter support + * + * Author: Cody P Schafer c...@linux.vnet.ibm.com + * Copyright 2014 IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#define pr_fmt(fmt) "hv-24x7: " fmt + +#include <linux/perf_event.h> +#include <linux/module.h> +#include <linux/slab.h> +#include <asm/firmware.h> +#include <asm/hvcall.h> +#include <asm/io.h> + +#include "hv-24x7.h" +#include "hv-24x7-catalog.h" +#include "hv-common.h" + +/* + * TODO: Merging events: + * - Think of the hcall as an interface to a 4d array of counters: + * - x = domains + * - y = indexes in the domain (core, chip, vcpu, node, etc) + * - z = offset into the counter space + * - w = lpars (guest vms, logical partitions) + * - A single request is: x,y,y_last,z,z_last,w,w_last + * - this means we can retrieve a rectangle of counters in y,z for a single x.
+ * + * - Things to consider (ignoring w): + * - input cost_per_request = 16 + * - output cost_per_result(ys,zs) = 8 + 8 * ys + ys * zs + * - limited number of requests per hcall (must fit into 4K bytes) + * - 4k = 16 [buffer header] - 16 [request size] * request_count + * - 255 requests per hcall + * - sometimes it will be more efficient to read extra data and discard + */ + +PMU_FORMAT_RANGE(domain, config, 0, 3); /* u3 0-6, one of HV_24X7_PERF_DOMAIN */ +PMU_FORMAT_RANGE(starting_index, config, 16, 31); /* u16 */ +PMU_FORMAT_RANGE(offset, config, 32, 63); /* u32, see data_offset */ +PMU_FORMAT_RANGE(lpar, config1, 0, 15); /* u16 */ + +PMU_FORMAT_RANGE_RESERVED(reserved1, config, 4, 15); +PMU_FORMAT_RANGE_RESERVED(reserved2, config1, 16, 63); +PMU_FORMAT_RANGE_RESERVED(reserved3, config2, 0, 63); + +static struct attribute *format_attrs[] = { + &format_attr_domain.attr, + &format_attr_offset.attr, + &format_attr_starting_index.attr, + &format_attr_lpar.attr, + NULL, +}; + +static struct attribute_group format_group = { + .name = "format", + .attrs = format_attrs, +}; + +/* + * read_offset_data - copy data from one buffer to another while treating the + *source buffer as a small view on the total available + *source data. + * + * @dest: buffer to copy into + * @dest_len: length of @dest in bytes + * @requested_offset: the offset within the source data we want. Must be > 0 + * @src: buffer to copy data from + * @src_len: length of @src in bytes + * @source_offset: the offset in the source data that (src,src_len) refers to. + * Must be > 0 + * + * returns the number of bytes copied. + * + * The following ascii art shows the various buffer positioning we need to + * handle, assigns some arbitrary variables to points on the buffer, and then + * shows how we fiddle with those values to get things we care about (copy + * start in src and copy len) + * + * s = @src buffer + * d = @dest buffer + * '.' areas in d are written to.
+ * + * u + * x wv z + * d |.| + * s |--| + * + * u + * x w z v + * d |--| + * s |--| + * + * x wu,z,v + * d || + * s |--| + * + * x,wu,v,z + * d |..| + * s |--| + * + * xu + * wvz + * d || + * s |--| + * + * x z w v + * d|--| + * s |--| + * + * x = source_offset + * w = requested_offset + * z = source_offset + src_len + * v = requested_offset + dest_len + * + * w_offset_in_s = w - x = requested_offset - source_offset + * z_offset_in_s = z - x = src_len + * v_offset_in_s = v - x = requested_offset + dest_len - source_offset + */ +static ssize_t read_offset_data(void *dest, size_t dest_len, + loff_t requested_offset, void *src, + size_t src_len, loff_t source_offset) +{ + size_t w_offset_in_s = requested_offset
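Every case in the read_offset_data() ASCII art reduces to one rule: copy the overlap of the two byte ranges into the start of dest. The sketch below implements that rule in user space with the comment's names (w, z, v and u, all measured relative to x). It computes v - x as requested_offset + dest_len - source_offset, which follows from the definitions of v and x. window_copy() is a hypothetical name, and POSIX off_t/ssize_t stand in for the kernel's loff_t. The sketch also rejects requested_offset < source_offset, an assumption matching the drawn cases, where w never precedes x.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* src holds bytes [source_offset, source_offset + src_len) of the full data;
 * dest wants bytes [requested_offset, requested_offset + dest_len).
 * Copies the overlap into the start of dest and returns its length,
 * or -1 for offsets the sketch does not handle. */
static ssize_t window_copy(void *dest, size_t dest_len, off_t requested_offset,
			   const void *src, size_t src_len, off_t source_offset)
{
	if (requested_offset < 0 || source_offset < 0 ||
	    requested_offset < source_offset)
		return -1;

	size_t w = (size_t)(requested_offset - source_offset); /* w - x */
	size_t z = src_len;                                     /* z - x */
	size_t v = w + dest_len;                                /* v - x */
	size_t u = z < v ? z : v;                               /* end of overlap */

	if (z <= w)
		return 0; /* requested range starts at or past the end of src */

	memcpy(dest, (const uint8_t *)src + w, u - w);
	return (ssize_t)(u - w);
}
```

In the usage below, src covers bytes 100..107. A request for 104..107 copies 4 bytes, a request for 106..109 copies only the 2 overlapping bytes, and a request starting at 108 copies nothing.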
[PATCH v4 10/11] powerpc/perf: add kconfig option for hypervisor provided counters
Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- arch/powerpc/perf/Makefile | 2 ++ arch/powerpc/platforms/pseries/Kconfig | 12 2 files changed, 14 insertions(+) diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile index 60d71ee..f9c083a 100644 --- a/arch/powerpc/perf/Makefile +++ b/arch/powerpc/perf/Makefile @@ -11,5 +11,7 @@ obj32-$(CONFIG_PPC_PERF_CTRS) += mpc7450-pmu.o obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o +obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o + obj-$(CONFIG_PPC64)+= $(obj64-y) obj-$(CONFIG_PPC32)+= $(obj32-y) diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig index 80b1d57..2cb8b77 100644 --- a/arch/powerpc/platforms/pseries/Kconfig +++ b/arch/powerpc/platforms/pseries/Kconfig @@ -111,6 +111,18 @@ config CMM will be reused for other LPARs. The interface allows firmware to balance memory across many LPARs. +config HV_PERF_CTRS + bool "Hypervisor supplied PMU events (24x7 & GPCI)" + default y + depends on PERF_EVENTS && PPC_PSERIES + help + Enable access to hypervisor supplied counters in perf. Currently, + this enables code that uses the hcall GetPerfCounterInfo and 24x7 + interfaces to retrieve counters. GPCI exists on Power 6 and later + systems. 24x7 is available on Power 8 systems. + + If unsure, select Y. + config DTL bool "Dispatch Trace Log" depends on PPC_SPLPAR && DEBUG_FS -- 1.9.0
[PATCH v4 11/11] powerpc/perf/hv_{gpci, 24x7}: add documentation of device attributes
gpci and 24x7 expose some device specific attributes. Add some documentation for them. Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- .../testing/sysfs-bus-event_source-devices-hv_24x7 | 23 .../testing/sysfs-bus-event_source-devices-hv_gpci | 43 ++ 2 files changed, 66 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 new file mode 100644 index 000..e78ee79 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 @@ -0,0 +1,23 @@ +What: /sys/bus/event_source/devices/hv_24x7/interface/catalog +Date: February 2014 +Contact: Cody P Schafer c...@linux.vnet.ibm.com +Description: + Provides access to the binary 24x7 catalog provided by the + hypervisor on POWER7 and 8 systems. This catalog lists events + available from the powerpc hv_24x7 pmu. Its format is + documented here: + https://raw.githubusercontent.com/jmesmon/catalog-24x7/master/hv-24x7-catalog.h + +What: /sys/bus/event_source/devices/hv_24x7/interface/catalog_length +Date: February 2014 +Contact: Cody P Schafer c...@linux.vnet.ibm.com +Description: + A number equal to the length in bytes of the catalog. This is + also extractable from the provided binary catalog sysfs entry. + +What: /sys/bus/event_source/devices/hv_24x7/interface/catalog_version +Date: February 2014 +Contact: Cody P Schafer c...@linux.vnet.ibm.com +Description: + Exposes the version field of the 24x7 catalog. This is also + extractable from the provided binary catalog sysfs entry.
diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci new file mode 100644 index 000..3fa58c2 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci @@ -0,0 +1,43 @@ +What: /sys/bus/event_source/devices/hv_gpci/interface/collect_privileged +Date: February 2014 +Contact: Cody P Schafer c...@linux.vnet.ibm.com +Description: + '0' if the hypervisor is configured to forbid access to event + counters being accumulated by other guests and to physical + domain event counters. + '1' if that access is allowed. + +What: /sys/bus/event_source/devices/hv_gpci/interface/ga +Date: February 2014 +Contact: Cody P Schafer c...@linux.vnet.ibm.com +Description: + 0 or 1. Indicates whether we have access to GA events (listed + in arch/powerpc/perf/hv-gpci.h). + +What: /sys/bus/event_source/devices/hv_gpci/interface/expanded +Date: February 2014 +Contact: Cody P Schafer c...@linux.vnet.ibm.com +Description: + 0 or 1. Indicates whether we have access to EXPANDED events (listed + in arch/powerpc/perf/hv-gpci.h). + +What: /sys/bus/event_source/devices/hv_gpci/interface/lab +Date: February 2014 +Contact: Cody P Schafer c...@linux.vnet.ibm.com +Description: + 0 or 1. Indicates whether we have access to LAB events (listed + in arch/powerpc/perf/hv-gpci.h). + +What: /sys/bus/event_source/devices/hv_gpci/interface/version +Date: February 2014 +Contact: Cody P Schafer c...@linux.vnet.ibm.com +Description: + A number indicating the version of the gpci interface that the + hypervisor reports supporting. + +What: /sys/bus/event_source/devices/hv_gpci/interface/kernel_version +Date: February 2014 +Contact: Cody P Schafer c...@linux.vnet.ibm.com +Description: + A number indicating the latest version of the gpci interface + that the kernel is aware of. -- 1.9.0
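The interface/ attributes documented above are single-line text files. According to the HV_CAPS_ATTR() uses in the hv-gpci patch, version and kernel_version are printed as "0x%x\n" and the flag files as "%d\n". The sketch below is a user-space reader for either form. read_sysfs_ulong() is a hypothetical helper, and the real paths exist only on systems where the hv_gpci PMU has registered. Comparing version with kernel_version is one way to see whether the hypervisor reports a newer interface than the kernel knows about.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read one numeric sysfs attribute, e.g.
 * /sys/bus/event_source/devices/hv_gpci/interface/version ("0x8\n")
 * or .../interface/ga ("1\n"). strtoul() with base 0 accepts both the
 * "0x" hex form and plain decimal. Returns 0 on success, -1 on error. */
static int read_sysfs_ulong(const char *path, unsigned long *out)
{
	char buf[64];
	char *end;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	*out = strtoul(buf, &end, 0);
	if (end == buf)
		return -1; /* no digits at the start of the file */
	return 0;
}
```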
Re: [PATCH v3 02/11] perf: add PMU_FORMAT_RANGE() helper for use by sw-like pmus
On 03/04/2014 12:09 AM, Cody P Schafer wrote: On 03/03/2014 09:19 PM, Michael Ellerman wrote: On Thu, 2014-27-02 at 21:04:55 UTC, Cody P Schafer wrote: Add PMU_FORMAT_RANGE() and PMU_FORMAT_RANGE_RESERVED() (for reserved areas) which generate functions to extract the relevant bits from event->attr.config{,1,2} for use by sw-like pmus where the 'config{,1,2}' values don't map directly to hardware registers. Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- include/linux/perf_event.h | 17 + 1 file changed, 17 insertions(+) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index e56b07f..3da5081 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -871,4 +871,21 @@ _name##_show(struct device *dev,\ \ static struct device_attribute format_attr_##_name = __ATTR_RO(_name) +#define PMU_FORMAT_RANGE(name, attr_var, bit_start, bit_end)\ +PMU_FORMAT_ATTR(name, #attr_var ":" #bit_start "-" #bit_end);\ +PMU_FORMAT_RANGE_RESERVED(name, attr_var, bit_start, bit_end) I really think these should have "event" in the name. Someone looking at the code is going to see event_get_foo() and wonder where that is defined. Grep won't find a definition, tags won't find a definition, the least you can do is have the macro name give some hint. That is a good point (grep-ability). Let me think about this. There is also the possibility that I could adjust the event_get_*() naming to something else. format_get_*()? event_get_format_*()? (these names keep growing...) I've gone with a format_get(name, event) style macro (making it more grep-able) in v4. Feel free to direct further discussion to the v4 posting.
[PATCH 1/2] perf Documentation: sysfs events/ interfaces
Add documentation for the <event>, <event>.scale, and <event>.unit files in sysfs. <event>.scale and <event>.unit were undocumented. <event> was previously documented only for specific powerpc pmu events. I've added a restriction that event names cannot contain '.' characters so we can avoid breaking the API when we (inevitably) add more '<event>.' files. Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- .../testing/sysfs-bus-event_source-devices-events | 59 ++ 1 file changed, 59 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events index 3c1cc24..5393e1ed6 100644 --- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events +++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events @@ -82,3 +82,62 @@ Description: POWER-systems specific performance monitoring events Further, multiple terms like 'event=0x' can be specified and separated with comma. All available terms are defined in the /sys/bus/event_source/devices/<dev>/format file. + +What: /sys/bus/event_source/devices/<pmu>/events/<event> +Date: 2014/02/24 +Contact: Linux kernel mailing list linux-ker...@vger.kernel.org +Description: Per-pmu performance monitoring events specific to the running system + + Each file (with a name not containing a '.') in the 'events' + directory describes a single performance monitoring event + supported by the <pmu>. The name of the file is the name of the event. + + File contents: + + <term>[=<value>][,<term>[=<value>]]... + + Where <term> is one of the terms listed under + /sys/bus/event_source/devices/<pmu>/format/ and <value> is + a number in base-16 format with a '0x' prefix (lowercase only). + If a <term> is specified alone (without an assigned value), it + is implied that 0x1 is assigned to that term.
+ + Examples (each of these lines would be in a separate file): + + event=0x2abc + event=0x423,inv,cmask=0x3 + domain=0x1,offset=0x8,starting_index=0x + + Each of the assignments indicates a value to be assigned to a + particular set of bits (as defined by the format file + corresponding to the <term>) in the perf_event_attr structure passed + to the perf_event_open syscall. + +What: /sys/bus/event_source/devices/<pmu>/events/<event>.unit +Date: 2014/02/24 +Contact: Linux kernel mailing list linux-ker...@vger.kernel.org +Description: Perf event units + + A string specifying the English plural numerical unit that <event> + (once multiplied by <event>.scale) represents. + + Example: + + Joules + +What: /sys/bus/event_source/devices/<pmu>/events/<event>.scale +Date: 2014/02/24 +Contact: Linux kernel mailing list linux-ker...@vger.kernel.org +Description: Perf event scaling factors + + A string representing a floating point value expressed in + scientific notation to be multiplied by the event count + received from the kernel to match the unit specified in the + <event>.unit file. + + Example: + + 2.3283064365386962890625e-10 + + This is provided to avoid performing floating point arithmetic + in the kernel. -- 1.9.0
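The `<event>` file grammar is simple enough to parse in a few lines. The sketch below is a user-space parser. It assumes term names of at most 31 characters and values written in hex, as the text requires. A bare term gets the documented implied value 0x1. parse_event_terms(), struct term and scale_count() are hypothetical names, not perf tool APIs. The scale example uses the documented value, which is exactly 2^-32, so a raw count of 2^32 scales to exactly 1.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct term {
	char name[32];
	unsigned long long value;
};

/* Parse "<term>[=<value>][,<term>[=<value>]]..." into at most max terms.
 * strtoull(..., 16) accepts the documented optional "0x" prefix.
 * Returns the number of terms parsed, or -1 on malformed input. */
static int parse_event_terms(const char *s, struct term *terms, int max)
{
	int n = 0;

	while (*s) {
		if (n == max)
			return -1;
		size_t len = strcspn(s, "=,");
		if (len == 0 || len >= sizeof(terms[n].name))
			return -1;
		memcpy(terms[n].name, s, len);
		terms[n].name[len] = '\0';
		s += len;

		terms[n].value = 1; /* implied value for a bare term */
		if (*s == '=') {
			char *end;
			terms[n].value = strtoull(s + 1, &end, 16);
			if (end == s + 1)
				return -1; /* '=' not followed by a number */
			s = end;
		}
		n++;

		if (*s == ',')
			s++;
		else if (*s != '\0')
			return -1; /* junk after a value */
	}
	return n;
}

/* Apply an <event>.scale string to a raw count. */
static double scale_count(unsigned long long count, const char *scale)
{
	return (double)count * strtod(scale, NULL);
}
```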
[PATCH 2/2] perf Documentation: remove duplicated docs for powerpc cpu specific events
Listing specific events doesn't actually help us at all here because: - these events actually vary between different ppc processors, so they aren't guaranteed to be present. - the documentation of the file contents is now duplicated by the docs for arbitrary event file contents. Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- .../testing/sysfs-bus-event_source-devices-events | 57 -- 1 file changed, 57 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events index 5393e1ed6..50c30a6 100644 --- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events +++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events @@ -26,63 +26,6 @@ Description: Generic performance monitoring events raw code for the perf event identified by the file's basename. - -What: /sys/devices/cpu/events/PM_1PLUS_PPC_CMPL - /sys/devices/cpu/events/PM_BRU_FIN - /sys/devices/cpu/events/PM_BR_MPRED - /sys/devices/cpu/events/PM_CMPLU_STALL - /sys/devices/cpu/events/PM_CMPLU_STALL_BRU - /sys/devices/cpu/events/PM_CMPLU_STALL_DCACHE_MISS - /sys/devices/cpu/events/PM_CMPLU_STALL_DFU - /sys/devices/cpu/events/PM_CMPLU_STALL_DIV - /sys/devices/cpu/events/PM_CMPLU_STALL_ERAT_MISS - /sys/devices/cpu/events/PM_CMPLU_STALL_FXU - /sys/devices/cpu/events/PM_CMPLU_STALL_IFU - /sys/devices/cpu/events/PM_CMPLU_STALL_LSU - /sys/devices/cpu/events/PM_CMPLU_STALL_REJECT - /sys/devices/cpu/events/PM_CMPLU_STALL_SCALAR - /sys/devices/cpu/events/PM_CMPLU_STALL_SCALAR_LONG - /sys/devices/cpu/events/PM_CMPLU_STALL_STORE - /sys/devices/cpu/events/PM_CMPLU_STALL_THRD - /sys/devices/cpu/events/PM_CMPLU_STALL_VECTOR - /sys/devices/cpu/events/PM_CMPLU_STALL_VECTOR_LONG - /sys/devices/cpu/events/PM_CYC - /sys/devices/cpu/events/PM_GCT_NOSLOT_BR_MPRED - /sys/devices/cpu/events/PM_GCT_NOSLOT_BR_MPRED_IC_MISS - /sys/devices/cpu/events/PM_GCT_NOSLOT_CYC - /sys/devices/cpu/events/PM_GCT_NOSLOT_IC_MISS -
/sys/devices/cpu/events/PM_GRP_CMPL - /sys/devices/cpu/events/PM_INST_CMPL - /sys/devices/cpu/events/PM_LD_MISS_L1 - /sys/devices/cpu/events/PM_LD_REF_L1 - /sys/devices/cpu/events/PM_RUN_CYC - /sys/devices/cpu/events/PM_RUN_INST_CMPL - -Date: 2013/01/08 - -Contact: Linux kernel mailing list linux-ker...@vger.kernel.org - Linux Powerpc mailing list linuxppc-...@ozlabs.org - -Description: POWER-systems specific performance monitoring events - - A collection of performance monitoring events that may be - supported by the POWER CPU. These events can be monitored - using the 'perf(1)' tool. - - These events may not be supported by other CPUs. - - The contents of each file would look like: - - event=0x - - where 'N' is a hex digit and the number '0x' shows the - raw code for the perf event identified by the file's - basename. - - Further, multiple terms like 'event=0x' can be specified - and separated with comma. All available terms are defined in - the /sys/bus/event_source/devices/<dev>/format file. - What: /sys/bus/event_source/devices/<pmu>/events/<event> Date: 2014/02/24 Contact: Linux kernel mailing list linux-ker...@vger.kernel.org -- 1.9.0
[PATCH 0/2] perf: add documentation for sysfs interfaces
Documents <pmu>/events/<event>{,.scale,.unit} and then removes the redundant POWER docs. Slightly restricts event names to avoid API funkiness when we add new <event>.? files ('.' forbidden in event names). The contact is currently lkml; it would be very useful to have a perf development list to put here instead (acme, feel like making one?). -- Cody P Schafer (2): perf Documentation: sysfs events/ interfaces perf Documentation: remove duplicated docs for powerpc cpu specific events .../testing/sysfs-bus-event_source-devices-events | 92 +++--- 1 file changed, 47 insertions(+), 45 deletions(-) -- 1.9.0
Re: [PATCH v3 02/11] perf: add PMU_FORMAT_RANGE() helper for use by sw-like pmus
On 03/03/2014 09:19 PM, Michael Ellerman wrote: On Thu, 2014-27-02 at 21:04:55 UTC, Cody P Schafer wrote: Add PMU_FORMAT_RANGE() and PMU_FORMAT_RANGE_RESERVED() (for reserved areas) which generate functions to extract the relevant bits from event->attr.config{,1,2} for use by sw-like pmus where the 'config{,1,2}' values don't map directly to hardware registers. Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- include/linux/perf_event.h | 17 + 1 file changed, 17 insertions(+) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index e56b07f..3da5081 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -871,4 +871,21 @@ _name##_show(struct device *dev, \ \ static struct device_attribute format_attr_##_name = __ATTR_RO(_name) +#define PMU_FORMAT_RANGE(name, attr_var, bit_start, bit_end) \ +PMU_FORMAT_ATTR(name, #attr_var ":" #bit_start "-" #bit_end); \ +PMU_FORMAT_RANGE_RESERVED(name, attr_var, bit_start, bit_end) I really think these should have "event" in the name. Someone looking at the code is going to see event_get_foo() and wonder where that is defined. Grep won't find a definition, tags won't find a definition, the least you can do is have the macro name give some hint. That is a good point (grep-ability). Let me think about this. There is also the possibility that I could adjust the event_get_*() naming to something else. format_get_*()? event_get_format_*()? (these names keep growing...) +#define PMU_FORMAT_RANGE_RESERVED(name, attr_var, bit_start, bit_end) \ It doesn't generate a format attribute. This was done with the idea that the term "format" didn't just refer to the attribute exposed in sysfs; it referred to some subset of bits extractable from attr.config{,1,2}. That is also the reasoning behind the above naming.
+static u64 event_get_##name##_max(void) \ +{ \ + int bits = (bit_end) - (bit_start) + 1; \ + return ((0x1ULL (bits - 1ULL)) - 1ULL) | \ + (0xFULL (bits - 4ULL));\ What's wrong with: (0x1ULL ((bit_end) - (bit_start) + 1)) - 1ULL; Overflowing the when bit_end = 63 and bit_start = 0 results in max(0, 63) = 0. That said, the current implementation is wrong when (bits 4). Here's one that actually works (without overflowing): return (((1ull (bit_end - bit_start)) - 1) 1) + 1; And an examination of the problematic case: #if 0 typedef unsigned long long ull; ull a = bits - 1; /* 63 */ ull b = 1 a; /* 0x8000 */ ull c = b - 1;/* 0x7fff */ ull d = b 1; /* 0xfffe */ ull e = d + 1;/* 0x */ return e; #endif Small number of valid inputs, so I also tested it for all of them using unsigned bits = (bit_end) - (bit_start) + 1; return (bits (sizeof(0ULL) * CHAR_BIT)) ? ((1ULL bits) - 1ULL) : ~0ULL; As the baseline correct one. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 03/11] perf: provide a common perf_event_nop_0() for use with .event_idx
On 03/03/2014 09:19 PM, Michael Ellerman wrote:
> On Thu, 2014-27-02 at 21:04:56 UTC, Cody P Schafer wrote:
>> Rather than having every pmu that needs a function that just returns 0 for .event_idx define its own copy, reuse the one in kernel/events/core.c.
>>
>> Rename from perf_swevent_event_idx() because we're no longer using it for just software events. Naming is based on the perf_pmu_nop_*() functions.
>
> You could just use perf_pmu_nop_int() directly.

No, .event_idx needs something that takes a (struct perf_event *); perf_pmu_nop_int() takes a (struct pmu *).
Re: [PATCH v2 02/11] perf core: export swevent hrtimer helpers
On 02/26/2014 12:29 AM, Peter Zijlstra wrote:
> On Tue, Feb 25, 2014 at 01:38:31PM -0800, Cody P Schafer wrote:
>> On 02/25/2014 02:20 AM, Peter Zijlstra wrote:
>>> On Tue, Feb 25, 2014 at 02:33:26PM +1100, Michael Ellerman wrote:
>>>> On Fri, 2014-14-02 at 22:02:06 UTC, Cody P Schafer wrote:
>>>>> Export the swevent hrtimer helpers currently only used in events/core.c to allow the addition of architecture specific sw-like pmus.
>>>>
>>>> Peter, Ingo, can we get your ACK on this please?
>>>
>>> How are they used? I saw some usage in patch 9 or so; but it's not explained anywhere.
>>>
>>> All patches have non-existent Changelogs and the few comments that are there are pretty hardware specific.
>>>
>>> So please do tell; what do you need this for?
>>
>> From this patch's change log:
>>
>>     Export the swevent hrtimer helpers currently only used in events/core.c to allow the addition of architecture specific sw-like pmus.
>>
>> The key part here is "architecture specific sw-like pmus", where the announcement explains why these pmus are "sw-like":
>
> I don't read announcements for crucial patch details; announcements are lost and therefore unimportant.

And I'll be sure to elaborate further in the changelog next time (if I don't drop this change entirely). This is the first comment I've got on this particular patch.

>>     The counters supplied by these interfaces are continually counting and never need to be (and cannot be) disabled or enabled. They additionally do not generate any interrupts. This makes them in some regards similar to software counters, and as a result their implementation shares some common code (which an initial patch exposes) with the sw counters.
>>
>> Essentially, these pmus just provide access to a big array of counters which don't generate interrupts, and are all 64bit (and assumed to never overflow). Rather than duplicate the code that we already have for managing timing when reading from counters that don't have interrupts (the functions that are exposed by this patch), I've reused it.
>
> So note that all the software counters generate interrupts in their own measuring domain. The hrtimer ones measure time and generate time based interrupts, the event based ones generate 'interrupts' on their events.
>
> What you have here is a hw pmu without interrupt capability. That's fine, they don't get to generate interrupts. We have plenty of those already.
>
> But what you propose to do is add interrupts in another domain entirely. That's not fine. Don't do that.

Ok, so it looks like I misunderstood the need for an interrupt. The intention in using the swevent_hrtimer code was to enable setting up the events as frequency sampled. After taking another look at the gpci and 24x7 pmus, I'm forbidding sampling events anyhow in event init, so the timer code isn't even taken advantage of. I'll drop this patch in the next set.

> You also try and conceal this information; so you suck.
[PATCH v3 01/11] sysfs: create bin_attributes under the requested group
bin_attributes created/updated in create_files() (such as those listed via (struct device).attribute_groups) were not placed under the specified group, and instead appeared in the base kobj directory.

Fix this by making bin_attributes use creation code similar to that used for normal attributes.

A quick grep shows that no one is using bin_attrs in a named attribute group yet, so we can do this without breaking anything in userspace.

Note that I do not add is_visible() support to bin_attributes, though that could be done as well.

Signed-off-by: Cody P Schafer <c...@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
No need to merge, already in driver-core-next as aabaf4c2050d21d39fe11eec889c508e84d6a328, included for reference/testing/verification only.
---
 fs/sysfs/group.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
index 6b57938..aa04068 100644
--- a/fs/sysfs/group.c
+++ b/fs/sysfs/group.c
@@ -70,8 +70,11 @@ static int create_files(struct kernfs_node *parent, struct kobject *kobj,
 	if (grp->bin_attrs) {
 		for (bin_attr = grp->bin_attrs; *bin_attr; bin_attr++) {
 			if (update)
-				sysfs_remove_bin_file(kobj, *bin_attr);
-			error = sysfs_create_bin_file(kobj, *bin_attr);
+				kernfs_remove_by_name(parent,
+						(*bin_attr)->attr.name);
+			error = sysfs_add_file_mode_ns(parent,
+					&(*bin_attr)->attr, true,
+					(*bin_attr)->attr.mode, NULL);
 			if (error)
 				break;
 		}
[PATCH v3 00/11] powerpc: Add support for Power Hypervisor supplied performance counters
These patches add basic pmus for 2 powerpc hypervisor interfaces to obtain performance counters: gpci ("get performance counter info") and 24x7.

The counters supplied by these interfaces are continually counting and never need to be (and cannot be) disabled or enabled. They additionally do not generate any interrupts. This makes them in some regards similar to software counters, and as a result their implementation shares some common code (which an initial patch exposes) with the sw counters.

These 2 PMUs end up providing access to some cpu, core, and chip level counters not exposed via other interfaces, and additionally allow monitoring the performance of other lpars (guests) on the same host system. Because they provide access to core and chip level counters, this pair of PMUs could be thought of as powerpc's counterpart to x86's uncore events.

GPCI is an interface that already exists on some power6 and power7 machines (depending on the fw version), but it is rather inflexible and code intensive to add additional counters to. The 24x7 interfaces are currently designed to co-exist with the gpci interface while replacing most of gpci's functionality on newer systems. Right now, the 24x7 code I've submitted uses the gpci calls to check if it has permission to access certain classes of counters.
--
Since v2:
 - "sysfs: create bin_attributes under the requested group" is now in git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git driver-core-next with commit-id: aabaf4c2050d21d39fe11eec889c508e84d6a328
 - Split hv-24x7.h catalog definition into hv-24x7-catalog.h
 - Remove unused 24x7 and gpci interface structures and enums (Michael Ellerman)
 - Update docs to point to an external source for the full catalog docs
 - Extend some of the patch changelogs (Peter Z)
 - Remove hrtimer usage and just extern the event_idx helper (now renamed) (Peter Z)
 - s/PMU_RANGE_ATTR/PMU_FORMAT_RANGE/ (and similar RESERVED rename) (Michael Ellerman)
 - hv_24x7: small clarifications in read_offset_data()'s comment
 - hv_gpci: remove h_gpci_event_read() and h_gpci_event_del(), call _stop() and _update() directly (Michael Ellerman)
 - Kconfig relocation, dependency changes, and rewording (Scott Wood and Michael Ellerman)

Since v1:
 - add a few attributes to hv_gpci and hv_24x7 that expose some info about the interfaces
   - so the attributes show up in the right place, fix bin_attr creation in sysfs groups.
 - move hv_gpci.h and hv_24x7.h interface headers into arch/powerpc/perf
 - fix bit ordering in hv_gpci.h
 - split out hv_perf_caps_get() and use it to probe for the interface before registering
 - ensure proper alignment of hypervisor args
 - add a few missing counter requests to hv_gpci.h
 - s/CIR_xxx/CIR_XXX/ in hv_gpci.h
 - s/modules_init/device_initcall/
 - Don't set event->cpu, use the user provided one
 - remove the union of gpci events, just give the user 1024 bytes to play with
 - clarify some comments (the list of fw versions is now labeled)
 - provide an event_24x7_request() that wraps single_24x7_request()
 - probably some other small fixes I'm forgetting.
Cody P Schafer (11):
  sysfs: create bin_attributes under the requested group
  perf: add PMU_FORMAT_RANGE() helper for use by sw-like pmus
  perf: provide a common perf_event_nop_0() for use with .event_idx
  powerpc: add hvcalls for 24x7 and gpci (get performance counter info)
  powerpc/perf: add hv_gpci interface header
  powerpc/perf: add 24x7 interface headers
  powerpc/perf: add a shared interface to get gpci version and capabilities
  powerpc/perf: add support for the hv gpci (get performance counter info) interface
  powerpc/perf: add support for the hv 24x7 interface
  powerpc/perf: add kconfig option for hypervisor provided counters
  powerpc/perf/hv_{gpci,24x7}: add documentation of device attributes

 .../testing/sysfs-bus-event_source-devices-hv_24x7 |  23 +
 .../testing/sysfs-bus-event_source-devices-hv_gpci |  43 ++
 arch/powerpc/include/asm/hvcall.h                  |   5 +
 arch/powerpc/perf/Makefile                         |   2 +
 arch/powerpc/perf/hv-24x7-catalog.h                |  33 ++
 arch/powerpc/perf/hv-24x7.c                        | 492 +
 arch/powerpc/perf/hv-24x7.h                        | 109 +
 arch/powerpc/perf/hv-common.c                      |  39 ++
 arch/powerpc/perf/hv-common.h                      |  17 +
 arch/powerpc/perf/hv-gpci.c                        | 277
 arch/powerpc/perf/hv-gpci.h                        |  73 +++
 arch/powerpc/platforms/pseries/Kconfig             |  12 +
 fs/sysfs/group.c                                   |   7 +-
 include/linux/perf_event.h                         |  18 +
 kernel/events/core.c                               |  10 +-
 15 files changed, 1153 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
 create mode 100644 Documentation/ABI
[PATCH v3 04/11] powerpc: add hvcalls for 24x7 and gpci (get performance counter info)
Signed-off-by: Cody P Schafer <c...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/hvcall.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index d8b600b..5dbbb29 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -274,6 +274,11 @@
 /* Platform specific hcalls, used by KVM */
 #define H_RTAS			0xf000
 
+/* Platform specific hcalls, provided by PHYP */
+#define H_GET_24X7_CATALOG_PAGE	0xF078
+#define H_GET_24X7_DATA		0xF07C
+#define H_GET_PERF_COUNTER_INFO	0xF080
+
 #ifndef __ASSEMBLY__
 
 /**
[PATCH v3 03/11] perf: provide a common perf_event_nop_0() for use with .event_idx
Rather than having every pmu that needs a function that just returns 0 for .event_idx define its own copy, reuse the one in kernel/events/core.c.

Rename from perf_swevent_event_idx() because we're no longer using it for just software events. Naming is based on the perf_pmu_nop_*() functions.

Signed-off-by: Cody P Schafer <c...@linux.vnet.ibm.com>
---
 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 10 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 3da5081..24a7b45 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -560,6 +560,7 @@ extern void perf_pmu_migrate_context(struct pmu *pmu,
 extern u64 perf_event_read_value(struct perf_event *event,
 				 u64 *enabled, u64 *running);
 
+extern int perf_event_nop_0(struct perf_event *event);
 
 struct perf_sample_data {
 	u64				type;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 56003c6..2938a77 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5816,7 +5816,7 @@ static int perf_swevent_init(struct perf_event *event)
 	return 0;
 }
 
-static int perf_swevent_event_idx(struct perf_event *event)
+int perf_event_nop_0(struct perf_event *event)
 {
 	return 0;
 }
@@ -5831,7 +5831,7 @@ static struct pmu perf_swevent = {
 	.stop		= perf_swevent_stop,
 	.read		= perf_swevent_read,
 
-	.event_idx	= perf_swevent_event_idx,
+	.event_idx	= perf_event_nop_0,
 };
 
 #ifdef CONFIG_EVENT_TRACING
@@ -5950,7 +5950,7 @@ static struct pmu perf_tracepoint = {
 	.stop		= perf_swevent_stop,
 	.read		= perf_swevent_read,
 
-	.event_idx	= perf_swevent_event_idx,
+	.event_idx	= perf_event_nop_0,
 };
 
 static inline void perf_tp_register(void)
@@ -6177,7 +6177,7 @@ static struct pmu perf_cpu_clock = {
 	.stop		= cpu_clock_event_stop,
 	.read		= cpu_clock_event_read,
 
-	.event_idx	= perf_swevent_event_idx,
+	.event_idx	= perf_event_nop_0,
 };
 
 /*
@@ -6257,7 +6257,7 @@ static struct pmu perf_task_clock = {
 	.stop		= task_clock_event_stop,
 	.read		= task_clock_event_read,
 
-	.event_idx	= perf_swevent_event_idx,
+	.event_idx	= perf_event_nop_0,
 };
 
 static void perf_pmu_nop_void(struct pmu *pmu)
[PATCH v3 02/11] perf: add PMU_FORMAT_RANGE() helper for use by sw-like pmus
Add PMU_FORMAT_RANGE() and PMU_FORMAT_RANGE_RESERVED() (for reserved areas) which generate functions to extract the relevant bits from event->attr.config{,1,2} for use by sw-like pmus where the 'config{,1,2}' values don't map directly to hardware registers.

Signed-off-by: Cody P Schafer <c...@linux.vnet.ibm.com>
---
 include/linux/perf_event.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e56b07f..3da5081 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -871,4 +871,21 @@ _name##_show(struct device *dev,				\
 									\
 static struct device_attribute format_attr_##_name = __ATTR_RO(_name)
 
+#define PMU_FORMAT_RANGE(name, attr_var, bit_start, bit_end)		\
+PMU_FORMAT_ATTR(name, #attr_var ":" #bit_start "-" #bit_end);		\
+PMU_FORMAT_RANGE_RESERVED(name, attr_var, bit_start, bit_end)
+
+#define PMU_FORMAT_RANGE_RESERVED(name, attr_var, bit_start, bit_end)	\
+static u64 event_get_##name##_max(void)					\
+{									\
+	int bits = (bit_end) - (bit_start) + 1;				\
+	return ((0x1ULL << (bits - 1ULL)) - 1ULL) |			\
+		(0xFULL << (bits - 4ULL));				\
+}									\
+static u64 event_get_##name(struct perf_event *event)			\
+{									\
+	return (event->attr.attr_var >> (bit_start)) &			\
+		event_get_##name##_max();				\
+}
+
 #endif /* _LINUX_PERF_EVENT_H */
[PATCH v3 05/11] powerpc/perf: add hv_gpci interface header
H_GetPerformanceCounterInfo (referred to as hv_gpci or just gpci from here on) is an interface to retrieve specific performance counters and other data from the hypervisor. All outputs have a fixed format.

This header only describes the portions of the interface that we plan on using in linux at this time.

Signed-off-by: Cody P Schafer <c...@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-gpci.h | 73 +
 1 file changed, 73 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-gpci.h

diff --git a/arch/powerpc/perf/hv-gpci.h b/arch/powerpc/perf/hv-gpci.h
new file mode 100644
index 000..b25f460
--- /dev/null
+++ b/arch/powerpc/perf/hv-gpci.h
@@ -0,0 +1,73 @@
+#ifndef LINUX_POWERPC_PERF_HV_GPCI_H_
+#define LINUX_POWERPC_PERF_HV_GPCI_H_
+
+#include <linux/types.h>
+
+/* From the document "H_GetPerformanceCounterInfo Interface" v1.07 */
+
+/* H_GET_PERF_COUNTER_INFO argument */
+struct hv_get_perf_counter_info_params {
+	__be32 counter_request; /* I */
+	__be32 starting_index;  /* IO */
+	__be16 secondary_index; /* IO */
+	__be16 returned_values; /* O */
+	__be32 detail_rc; /* O, only needed when called via *_norets() */
+
+	/*
+	 * O, size of each counter_value element in bytes, only set for version
+	 * >= 0x3
+	 */
+	__be16 cv_element_size;
+
+	/* I, 0 (zero) for versions < 0x3 */
+	__u8 counter_info_version_in;
+
+	/* O, 0 (zero) if version < 0x3. Must be set to 0 when making hcall */
+	__u8 counter_info_version_out;
+	__u8 reserved[0xC];
+	__u8 counter_value[];
+} __packed;
+
+/*
+ * counter info version = fw version/reference (spec version)
+ *
+ * 8 = power8 (1.07)
+ * [7 is skipped by spec 1.07]
+ * 6 = TLBIE (1.07)
+ * 5 = v7r7m0.phyp (1.05)
+ * [4 skipped]
+ * 3 = v7r6m0.phyp (?)
+ * [1,2 skipped]
+ * 0 = v7r{2,3,4}m0.phyp (?)
+ */
+#define COUNTER_INFO_VERSION_CURRENT 0x8
+
+/*
+ * These determine the counter_value[] layout and the meaning of starting_index
+ * and secondary_index.
+ *
+ * Unless otherwise noted, @secondary_index is unused and ignored.
+ */
+enum counter_info_requests {
+
+	/* GENERAL */
+
+	/* @starting_index: must be -1 (to refer to the current partition)
+	 */
+	CIR_SYSTEM_PERFORMANCE_CAPABILITIES = 0X40,
+};
+
+struct cv_system_performance_capabilities {
+	/* If != 0, allowed to collect data from other partitions */
+	__u8 perf_collect_privileged;
+
+	/* These following are only valid if counter_info_version >= 0x3 */
+#define CV_CM_GA       (1 << 7)
+#define CV_CM_EXPANDED (1 << 6)
+#define CV_CM_LAB      (1 << 5)
+	/* remaining bits are reserved */
+	__u8 capability_mask;
+	__u8 reserved[0xE];
+} __packed;
+
+#endif
[PATCH v3 06/11] powerpc/perf: add 24x7 interface headers
24x7 (also called hv_24x7 or H_24X7) is an interface to obtain performance counters from the hypervisor. These counters do not have a fixed format/position and are instead documented in a "24x7 Catalog", which is provided by the hypervisor (that interface is also documented partially in the included hv-24x7-catalog.h and fully at https://raw.githubusercontent.com/jmesmon/catalog-24x7/master/hv-24x7-catalog.h ).

The 24x7 data access is simply a copy operation into a 4-dimensional array of 64-bit counters (from hypervisor to kernel memory). No interrupt is triggered on overflow; these counters are completely disjoint from the typical power pmu.

This method of obtaining performance counters from the hypervisor is intended to partially replace the gpci interface.

Signed-off-by: Cody P Schafer <c...@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-24x7-catalog.h |  33 +++
 arch/powerpc/perf/hv-24x7.h         | 109
 2 files changed, 142 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-24x7-catalog.h
 create mode 100644 arch/powerpc/perf/hv-24x7.h

diff --git a/arch/powerpc/perf/hv-24x7-catalog.h b/arch/powerpc/perf/hv-24x7-catalog.h
new file mode 100644
index 000..21b19dd
--- /dev/null
+++ b/arch/powerpc/perf/hv-24x7-catalog.h
@@ -0,0 +1,33 @@
+#ifndef LINUX_POWERPC_PERF_HV_24X7_CATALOG_H_
+#define LINUX_POWERPC_PERF_HV_24X7_CATALOG_H_
+
+#include <linux/types.h>
+
+/* From document "24x7 Event and Group Catalog Formats Proposal" v0.15 */
+
+struct hv_24x7_catalog_page_0 {
+#define HV_24X7_CATALOG_MAGIC 0x32347837	/* "24x7" in ASCII */
+	__be32 magic;
+	__be32 length; /* In 4096 byte pages */
+	__be64 version; /* XXX: arbitrary? what's the meaning/usage/purpose? */
+	__u8 build_time_stamp[16]; /* MMDDHHMMSS\0\0 */
+	__u8 reserved2[32];
+	__be16 schema_data_offs; /* in 4096 byte pages */
+	__be16 schema_data_len; /* in 4096 byte pages */
+	__be16 schema_entry_count;
+	__u8 reserved3[2];
+	__be16 event_data_offs;
+	__be16 event_data_len;
+	__be16 event_entry_count;
+	__u8 reserved4[2];
+	__be16 group_data_offs; /* in 4096 byte pages */
+	__be16 group_data_len; /* in 4096 byte pages */
+	__be16 group_entry_count;
+	__u8 reserved5[2];
+	__be16 formula_data_offs; /* in 4096 byte pages */
+	__be16 formula_data_len; /* in 4096 byte pages */
+	__be16 formula_entry_count;
+	__u8 reserved6[2];
+} __packed;
+
+#endif
diff --git a/arch/powerpc/perf/hv-24x7.h b/arch/powerpc/perf/hv-24x7.h
new file mode 100644
index 000..720ebce
--- /dev/null
+++ b/arch/powerpc/perf/hv-24x7.h
@@ -0,0 +1,109 @@
+#ifndef LINUX_POWERPC_PERF_HV_24X7_H_
+#define LINUX_POWERPC_PERF_HV_24X7_H_
+
+#include <linux/types.h>
+
+struct hv_24x7_request {
+	/* PHYSICAL domains require enabling via phyp/hmc. */
+#define HV_24X7_PERF_DOMAIN_PHYSICAL_CHIP 0x01
+#define HV_24X7_PERF_DOMAIN_PHYSICAL_CORE 0x02
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_CORE   0x03
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_CHIP   0x04
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_NODE   0x05
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_REMOTE_NODE 0x06
+	__u8 performance_domain;
+	__u8 reserved[0x1];
+
+	/* bytes to read starting at @data_offset. must be a multiple of 8 */
+	__be16 data_size;
+
+	/*
+	 * byte offset within the perf domain to read from. must be 8 byte
+	 * aligned
+	 */
+	__be32 data_offset;
+
+	/*
+	 * only valid for VIRTUAL_PROCESSOR domains, ignored for others.
+	 * -1 means "current partition only"
+	 * Enabling via phyp/hmc required for non-"-1" values. 0 forbidden
+	 * unless requestor is 0.
+	 */
+	__be16 starting_lpar_ix;
+
+	/*
+	 * Ignored when @starting_lpar_ix == -1
+	 * Ignored when @performance_domain is not VIRTUAL_PROCESSOR_*
+	 * -1 means "infinite" or "all"
+	 */
+	__be16 max_num_lpars;
+
+	/* chip, core, or virtual processor based on @performance_domain */
+	__be16 starting_ix;
+	__be16 max_ix;
+} __packed;
+
+struct hv_24x7_request_buffer {
+	/* 0 - ? */
+	/* 1 - ? */
+#define HV_24X7_IF_VERSION_CURRENT 0x01
+	__u8 interface_version;
+	__u8 num_requests;
+	__u8 reserved[0xE];
+	struct hv_24x7_request requests[];
+} __packed;
+
+struct hv_24x7_result_element {
+	__be16 lpar_ix;
+
+	/*
+	 * represents the core, chip, or virtual processor based on the
+	 * request's @performance_domain
+	 */
+	__be16 domain_ix;
+
+	/* -1 if @performance_domain does not refer to a virtual processor */
+	__be32 lpar_cfg_instance_id;
+
+	/* size = @result_element_data_size of containing result. */
+	__u8 element_data[];
+} __packed;
+
+struct hv_24x7_result {
+	__u8 result_ix;
+
+	/*
+	 * 0 = not all
[PATCH v3 07/11] powerpc/perf: add a shared interface to get gpci version and capabilities
This exposes a simple way to grab the firmware-provided collect_privileged, ga, expanded, and lab "capability" bits. All of these bits come from the same gpci request, so we've exposed all of them. Only the collect_privileged bit is really used by the hv-gpci/hv-24x7 code; the other bits are simply exposed in sysfs to inform the user.

Signed-off-by: Cody P Schafer <c...@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-common.c | 39 +++
 arch/powerpc/perf/hv-common.h | 17 +
 2 files changed, 56 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-common.c
 create mode 100644 arch/powerpc/perf/hv-common.h

diff --git a/arch/powerpc/perf/hv-common.c b/arch/powerpc/perf/hv-common.c
new file mode 100644
index 000..47e02b3
--- /dev/null
+++ b/arch/powerpc/perf/hv-common.c
@@ -0,0 +1,39 @@
+#include <asm/io.h>
+#include <asm/hvcall.h>
+
+#include "hv-gpci.h"
+#include "hv-common.h"
+
+unsigned long hv_perf_caps_get(struct hv_perf_caps *caps)
+{
+	unsigned long r;
+	struct p {
+		struct hv_get_perf_counter_info_params params;
+		struct cv_system_performance_capabilities caps;
+	} __packed __aligned(sizeof(uint64_t));
+
+	struct p arg = {
+		.params = {
+			.counter_request = cpu_to_be32(
+					CIR_SYSTEM_PERFORMANCE_CAPABILITIES),
+			.starting_index = cpu_to_be32(-1),
+			.counter_info_version_in = 0,
+		}
+	};
+
+	r = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
+			       virt_to_phys(&arg), sizeof(arg));
+
+	if (r)
+		return r;
+
+	pr_devel("capability_mask: 0x%x\n", arg.caps.capability_mask);
+
+	caps->version = arg.params.counter_info_version_out;
+	caps->collect_privileged = !!arg.caps.perf_collect_privileged;
+	caps->ga = !!(arg.caps.capability_mask & CV_CM_GA);
+	caps->expanded = !!(arg.caps.capability_mask & CV_CM_EXPANDED);
+	caps->lab = !!(arg.caps.capability_mask & CV_CM_LAB);
+
+	return r;
+}
diff --git a/arch/powerpc/perf/hv-common.h b/arch/powerpc/perf/hv-common.h
new file mode 100644
index 000..7e615bd
--- /dev/null
+++ b/arch/powerpc/perf/hv-common.h
@@ -0,0 +1,17 @@
+#ifndef LINUX_POWERPC_PERF_HV_COMMON_H_
+#define LINUX_POWERPC_PERF_HV_COMMON_H_
+
+#include <linux/types.h>
+
+struct hv_perf_caps {
+	u16 version;
+	u16 collect_privileged:1,
+	    ga:1,
+	    expanded:1,
+	    lab:1,
+	    unused:12;
+};
+
+unsigned long hv_perf_caps_get(struct hv_perf_caps *caps);
+
+#endif
[PATCH v3 08/11] powerpc/perf: add support for the hv gpci (get performance counter info) interface
This provides a basic link between perf and hv_gpci. Notably, it does not yet support transactions and does not list any events (they can still be manually composed).

Example usage via perf tool:

	perf stat -e 'hv_gpci/counter_info_version=3,offset=0,length=8,secondary_index=0,starting_index=0x,request=0x10/' -r 0 -C 0 -x ' ' sleep 0.1

Signed-off-by: Cody P Schafer <c...@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-gpci.c | 277
 1 file changed, 277 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-gpci.c

diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
new file mode 100644
index 000..2f64732
--- /dev/null
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -0,0 +1,277 @@
+/*
+ * Hypervisor supplied "gpci" ("get performance counter info") performance
+ * counter support
+ *
+ * Author: Cody P Schafer <c...@linux.vnet.ibm.com>
+ * Copyright 2014 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#define pr_fmt(fmt) "hv-gpci: " fmt
+
+#include <linux/init.h>
+#include <linux/perf_event.h>
+#include <asm/firmware.h>
+#include <asm/hvcall.h>
+#include <asm/io.h>
+
+#include "hv-gpci.h"
+#include "hv-common.h"
+
+PMU_FORMAT_RANGE(request, config, 0, 31); /* u32 */
+PMU_FORMAT_RANGE(starting_index, config, 32, 63); /* u32 */
+PMU_FORMAT_RANGE(secondary_index, config1, 0, 15); /* u16 */
+PMU_FORMAT_RANGE(counter_info_version, config1, 16, 23); /* u8 */
+PMU_FORMAT_RANGE(length, config1, 24, 31); /* u8, bytes of data (1-8) */
+PMU_FORMAT_RANGE(offset, config1, 32, 63); /* u32, byte offset */
+
+static struct attribute *format_attrs[] = {
+	&format_attr_request.attr,
+	&format_attr_starting_index.attr,
+	&format_attr_secondary_index.attr,
+	&format_attr_counter_info_version.attr,
+
+	&format_attr_offset.attr,
+	&format_attr_length.attr,
+	NULL,
+};
+
+static struct attribute_group format_group = {
+	.name = "format",
+	.attrs = format_attrs,
+};
+
+#define HV_CAPS_ATTR(_name, _format)				\
+static ssize_t _name##_show(struct device *dev,			\
+			    struct device_attribute *attr,	\
+			    char *page)				\
+{								\
+	struct hv_perf_caps caps;				\
+	unsigned long hret = hv_perf_caps_get(&caps);		\
+	if (hret)						\
+		return -EIO;					\
+								\
+	return sprintf(page, _format, caps._name);		\
+}								\
+static struct device_attribute hv_caps_attr_##_name = __ATTR_RO(_name)
+
+static ssize_t kernel_version_show(struct device *dev,
+				   struct device_attribute *attr,
+				   char *page)
+{
+	return sprintf(page, "0x%x\n", COUNTER_INFO_VERSION_CURRENT);
+}
+
+DEVICE_ATTR_RO(kernel_version);
+HV_CAPS_ATTR(version, "0x%x\n");
+HV_CAPS_ATTR(ga, "%d\n");
+HV_CAPS_ATTR(expanded, "%d\n");
+HV_CAPS_ATTR(lab, "%d\n");
+HV_CAPS_ATTR(collect_privileged, "%d\n");
+
+static struct attribute *interface_attrs[] = {
+	&dev_attr_kernel_version.attr,
+	&hv_caps_attr_version.attr,
+	&hv_caps_attr_ga.attr,
+	&hv_caps_attr_expanded.attr,
+	&hv_caps_attr_lab.attr,
+	&hv_caps_attr_collect_privileged.attr,
+	NULL,
+};
+
+static struct attribute_group interface_group = {
+	.name = "interface",
+	.attrs = interface_attrs,
+};
+
+static const struct attribute_group *attr_groups[] = {
+	&format_group,
+	&interface_group,
+	NULL,
+};
+
+#define GPCI_MAX_DATA_BYTES \
+	(1024 - sizeof(struct hv_get_perf_counter_info_params))
+
+static unsigned long single_gpci_request(u32 req, u32 starting_index,
+		u16 secondary_index, u8 version_in, u32 offset, u8 length,
+		u64 *value)
+{
+	unsigned long ret;
+	size_t i;
+	u64 count;
+
+	struct {
+		struct hv_get_perf_counter_info_params params;
+		uint8_t bytes[GPCI_MAX_DATA_BYTES];
+	} __packed __aligned(sizeof(uint64_t)) arg = {
+		.params = {
+			.counter_request = cpu_to_be32(req),
+			.starting_index = cpu_to_be32(starting_index),
+			.secondary_index = cpu_to_be16(secondary_index),
+			.counter_info_version_in = version_in,
+		}
+	};
+
+	ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
+			virt_to_phys(&arg
[PATCH v3 09/11] powerpc/perf: add support for the hv 24x7 interface
This provides a basic interface between hv_24x7 and perf. Similar to the one provided for gpci, it lacks transaction support and does not list any events. Example usage via perf tool: perf stat -e 'hv_24x7/domain=2,offset=8,starting_index=0,lpar=0x/' -r 0 -C 0 -x ' ' sleep 0.1 Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com --- arch/powerpc/perf/hv-24x7.c | 492 1 file changed, 492 insertions(+) create mode 100644 arch/powerpc/perf/hv-24x7.c diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c new file mode 100644 index 000..c1847a3 --- /dev/null +++ b/arch/powerpc/perf/hv-24x7.c @@ -0,0 +1,492 @@ +/* + * Hypervisor supplied 24x7 performance counter support + * + * Author: Cody P Schafer c...@linux.vnet.ibm.com + * Copyright 2014 IBM Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#define pr_fmt(fmt) hv-24x7: fmt + +#include linux/perf_event.h +#include linux/module.h +#include linux/slab.h +#include asm/firmware.h +#include asm/hvcall.h +#include asm/io.h + +#include hv-24x7.h +#include hv-24x7-catalog.h +#include hv-common.h + +/* + * TODO: Merging events: + * - Think of the hcall as an interface to a 4d array of counters: + * - x = domains + * - y = indexes in the domain (core, chip, vcpu, node, etc) + * - z = offset into the counter space + * - w = lpars (guest vms, logical partitions) + * - A single request is: x,y,y_last,z,z_last,w,w_last + * - this means we can retrieve a rectangle of counters in y,z for a single x. 
+ * + * - Things to consider (ignoring w): + * - input cost_per_request = 16 + * - output cost_per_result(ys,zs) = 8 + 8 * ys + ys * zs + * - limited number of requests per hcall (must fit into 4K bytes) + * - 4k = 16 [buffer header] - 16 [request size] * request_count + * - 255 requests per hcall + * - sometimes it will be more efficient to read extra data and discard + */ + +PMU_FORMAT_RANGE(domain, config, 0, 3); /* u3 0-6, one of HV_24X7_PERF_DOMAIN */ +PMU_FORMAT_RANGE(starting_index, config, 16, 31); /* u16 */ +PMU_FORMAT_RANGE(offset, config, 32, 63); /* u32, see data_offset */ +PMU_FORMAT_RANGE(lpar, config1, 0, 15); /* u16 */ + +PMU_FORMAT_RANGE_RESERVED(reserved1, config, 4, 15); +PMU_FORMAT_RANGE_RESERVED(reserved2, config1, 16, 63); +PMU_FORMAT_RANGE_RESERVED(reserved3, config2, 0, 63); + +static struct attribute *format_attrs[] = { + format_attr_domain.attr, + format_attr_offset.attr, + format_attr_starting_index.attr, + format_attr_lpar.attr, + NULL, +}; + +static struct attribute_group format_group = { + .name = format, + .attrs = format_attrs, +}; + +/* + * read_offset_data - copy data from one buffer to another while treating the + *source buffer as a small view on the total avaliable + *source data. + * + * @dest: buffer to copy into + * @dest_len: length of @dest in bytes + * @requested_offset: the offset within the source data we want. Must be 0 + * @src: buffer to copy data from + * @src_len: length of @src in bytes + * @source_offset: the offset in the sorce data that (src,src_len) refers to. + * Must be 0 + * + * returns the number of bytes copied. + * + * The following ascii art shows the various buffer possitioning we need to + * handle, assigns some arbitrary varibles to points on the buffer, and then + * shows how we fiddle with those values to get things we care about (copy + * start in src and copy len) + * + * s = @src buffer + * d = @dest buffer + * '.' areas in d are written to. 
+ *
+ * u
+ * x wv z
+ * d |.|
+ * s |--|
+ *
+ * u
+ * x w z v
+ * d |--|
+ * s |--|
+ *
+ * x wu,z,v
+ * d ||
+ * s |--|
+ *
+ * x,wu,v,z
+ * d |..|
+ * s |--|
+ *
+ * xu
+ * wvz
+ * d ||
+ * s |--|
+ *
+ * x z w v
+ * d|--|
+ * s |--|
+ *
+ * x = source_offset
+ * w = requested_offset
+ * z = source_offset + src_len
+ * v = requested_offset + dest_len
+ *
+ * w_offset_in_s = w - x = requested_offset - source_offset
+ * z_offset_in_s = z - x = src_len
+ * v_offset_in_s = v - x = request_offset + dest_len - src_len
+ */
+static ssize_t read_offset_data(void *dest, size_t dest_len,
+				loff_t requested_offset, void *src,
+				size_t src_len, loff_t source_offset)
+{
+	size_t w_offset_in_s = requested_offset
[PATCH v3 10/11] powerpc/perf: add kconfig option for hypervisor provided counters
Signed-off-by: Cody P Schafer <c...@linux.vnet.ibm.com>
---
 arch/powerpc/perf/Makefile             |  2 ++
 arch/powerpc/platforms/pseries/Kconfig | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index 60d71ee..f9c083a 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -11,5 +11,7 @@ obj32-$(CONFIG_PPC_PERF_CTRS) += mpc7450-pmu.o
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
 obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o
 
+obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o
+
 obj-$(CONFIG_PPC64)	+= $(obj64-y)
 obj-$(CONFIG_PPC32)	+= $(obj32-y)
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 80b1d57..2cb8b77 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -111,6 +111,18 @@ config CMM
 	  will be reused for other LPARs. The interface allows firmware to
 	  balance memory across many LPARs.
 
+config HV_PERF_CTRS
+	bool "Hypervisor supplied PMU events (24x7 & GPCI)"
+	default y
+	depends on PERF_EVENTS && PPC_PSERIES
+	help
+	  Enable access to hypervisor supplied counters in perf. Currently,
+	  this enables code that uses the hcall GetPerfCounterInfo and 24x7
+	  interfaces to retrieve counters. GPCI exists on Power 6 and later
+	  systems. 24x7 is available on Power 8 systems.
+
+	  If unsure, select Y.
+
 config DTL
 	bool "Dispatch Trace Log"
 	depends on PPC_SPLPAR && DEBUG_FS
-- 
1.9.0
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3 11/11] powerpc/perf/hv_{gpci, 24x7}: add documentation of device attributes
gpci and 24x7 expose some device specific attributes. Add some
documentation for them.

Signed-off-by: Cody P Schafer <c...@linux.vnet.ibm.com>
---
 .../testing/sysfs-bus-event_source-devices-hv_24x7 | 23
 .../testing/sysfs-bus-event_source-devices-hv_gpci | 43 ++
 2 files changed, 66 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
 create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
new file mode 100644
index 000..e78ee79
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
@@ -0,0 +1,23 @@
+What:		/sys/bus/event_source/devices/hv_24x7/interface/catalog
+Date:		February 2014
+Contact:	Cody P Schafer <c...@linux.vnet.ibm.com>
+Description:
+		Provides access to the binary 24x7 catalog provided by the
+		hypervisor on POWER7 and 8 systems. This catalog lists events
+		available from the powerpc hv_24x7 pmu. Its format is
+		documented here:
+		https://raw.githubusercontent.com/jmesmon/catalog-24x7/master/hv-24x7-catalog.h
+
+What:		/sys/bus/event_source/devices/hv_24x7/interface/catalog_length
+Date:		February 2014
+Contact:	Cody P Schafer <c...@linux.vnet.ibm.com>
+Description:
+		A number equal to the length in bytes of the catalog. This is
+		also extractable from the provided binary catalog sysfs entry.
+
+What:		/sys/bus/event_source/devices/hv_24x7/interface/catalog_version
+Date:		February 2014
+Contact:	Cody P Schafer <c...@linux.vnet.ibm.com>
+Description:
+		Exposes the version field of the 24x7 catalog. This is also
+		extractable from the provided binary catalog sysfs entry.
diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
new file mode 100644
index 000..3fa58c2
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
@@ -0,0 +1,43 @@
+What:		/sys/bus/event_source/devices/hv_gpci/interface/collect_privileged
+Date:		February 2014
+Contact:	Cody P Schafer <c...@linux.vnet.ibm.com>
+Description:
+		'0' if the hypervisor is configured to forbid access to event
+		counters being accumulated by other guests and to physical
+		domain event counters.
+		'1' if that access is allowed.
+
+What:		/sys/bus/event_source/devices/hv_gpci/interface/ga
+Date:		February 2014
+Contact:	Cody P Schafer <c...@linux.vnet.ibm.com>
+Description:
+		0 or 1. Indicates whether we have access to "GA" events (listed
+		in arch/powerpc/perf/hv-gpci.h).
+
+What:		/sys/bus/event_source/devices/hv_gpci/interface/expanded
+Date:		February 2014
+Contact:	Cody P Schafer <c...@linux.vnet.ibm.com>
+Description:
+		0 or 1. Indicates whether we have access to "EXPANDED" events (listed
+		in arch/powerpc/perf/hv-gpci.h).
+
+What:		/sys/bus/event_source/devices/hv_gpci/interface/lab
+Date:		February 2014
+Contact:	Cody P Schafer <c...@linux.vnet.ibm.com>
+Description:
+		0 or 1. Indicates whether we have access to "LAB" events (listed
+		in arch/powerpc/perf/hv-gpci.h).
+
+What:		/sys/bus/event_source/devices/hv_gpci/interface/version
+Date:		February 2014
+Contact:	Cody P Schafer <c...@linux.vnet.ibm.com>
+Description:
+		A number indicating the version of the gpci interface that the
+		hypervisor reports supporting.
+
+What:		/sys/bus/event_source/devices/hv_gpci/interface/kernel_version
+Date:		February 2014
+Contact:	Cody P Schafer <c...@linux.vnet.ibm.com>
+Description:
+		A number indicating the latest version of the gpci interface
+		that the kernel is aware of.
-- 
1.9.0
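As a rough illustration of consuming the numeric attributes documented above (for example hv_24x7/interface/catalog_length or hv_gpci/interface/ga) from userspace, here is a minimal sketch that reads one value. It assumes each attribute holds a single integer followed by a newline; read_sysfs_u64() is a hypothetical helper name, and accepting a 0x prefix is a defensive assumption rather than something the ABI text promises.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read a single unsigned integer from a sysfs-style text attribute.
 * Base 0 lets strtoull() accept decimal as well as 0x-prefixed hex.
 * Returns 0 on success, -1 on open, read, or parse failure.
 */
static int read_sysfs_u64(const char *path, unsigned long long *out)
{
	char buf[64];
	char *end;
	unsigned long long val;
	FILE *f;

	assert(path != NULL && out != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	errno = 0;
	val = strtoull(buf, &end, 0);
	/* Reject empty input, overflow, and trailing junk. */
	if (errno != 0 || end == buf || (*end != '\n' && *end != '\0'))
		return -1;

	*out = val;
	return 0;
}
```

A caller would pass a path such as /sys/bus/event_source/devices/hv_gpci/interface/ga and treat a nonzero value as "GA events available".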
Re: [PATCH v2 01/11] perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus
On 02/24/2014 07:33 PM, Michael Ellerman wrote:
On Fri, 2014-14-02 at 22:02:05 UTC, Cody P Schafer wrote:
Add PMU_RANGE_ATTR() and PMU_RANGE_RESV() (for reserved areas) which
generate functions to extract the relevant bits from
event->attr.config{,1,2} for use by sw-like pmus where the
'config{,1,2}' values don't map directly to hardware registers.

Signed-off-by: Cody P Schafer <c...@linux.vnet.ibm.com>
---
 include/linux/perf_event.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e56b07f..2702e91 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -871,4 +871,21 @@ _name##_show(struct device *dev,			\
 									\
 static struct device_attribute format_attr_##_name = __ATTR_RO(_name)
 
+#define PMU_RANGE_ATTR(name, attr_var, bit_start, bit_end)		\
+PMU_FORMAT_ATTR(name, #attr_var ":" #bit_start "-" #bit_end);		\
+PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)
+
+#define PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)		\
+static u64 event_get_##name##_max(void)					\
+{									\
+	int bits = (bit_end) - (bit_start) + 1;				\
+	return ((0x1ULL << (bits - 1ULL)) - 1ULL) |			\
+		(0xFULL << (bits - 4ULL));				\
+}									\
+static u64 event_get_##name(struct perf_event *event)			\
+{									\
+	return (event->attr.attr_var >> (bit_start)) &			\
+		event_get_##name##_max();				\
+}

I still don't like the names.

EVENT_GETTER_AND_FORMAT()

EVENT_RANGE()

I'd prefer to describe the intended usage rather than what is generated,
both in case we change some of the specifics later, and to provide
additional information to the developers beyond what a simple code
reading gives.

EVENT_RESERVED()

Sure. The PMU_* naming was just based on the PMU_FORMAT_ATTR() naming, so
I kept it for continuity with the existing API. Maybe
EVENT_RANGE_RESERVED() would be more appropriate?

?

It's not clear to me the max routine is useful in general.
Can't we just do:

+#define EVENT_RESERVED(name, attr_var, bit_start, bit_end)		\
+static u64 event_get_##name(struct perf_event *event)			\
+{									\
+	return (event->attr.attr_var >> (bit_start)) &			\
+		((0x1ULL << ((bit_end) - (bit_start) + 1)) - 1ULL);	\
+}

I use event_get_*_max() for some checking of parameters in event_init().
Having it lets me avoid specifying the maximum explicitly (0x7 = 0-19, for
example). Specifying it explicitly would mean we'd have the bit width of
the field in question encoded in two places instead of one, and I'd prefer
to avoid unneeded duplication.
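The two mask formulas in this exchange differ mainly at the edge: the simpler (0x1ULL << width) - 1ULL form shifts by 64 for a full-width field, which is undefined in C, while the split form in PMU_RANGE_RESV() stays in range for a 64-bit field. Below is a minimal userspace re-creation of both, written as plain functions rather than the patch's macros; range_max_split(), range_max_simple() and range_get() are hypothetical names. One consequence worth noting is that the split form needs a width of at least 4, since (bits - 4) would otherwise go negative.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mask used by event_get_*_max() in PMU_RANGE_RESV(): the low bits-1
 * ones OR'd with four ones ending at bit bits-1. This covers a 64-bit
 * field without ever shifting by 64, but requires bits >= 4.
 */
static uint64_t range_max_split(unsigned int bits)
{
	assert(bits >= 4 && bits <= 64);
	return ((UINT64_C(1) << (bits - 1)) - 1) |
	       (UINT64_C(0xF) << (bits - 4));
}

/* Simpler form from the review; only defined for bits < 64. */
static uint64_t range_max_simple(unsigned int bits)
{
	assert(bits >= 1 && bits < 64);
	return (UINT64_C(1) << bits) - 1;
}

/* Extract bits [bit_start, bit_end] (inclusive) from a config word. */
static uint64_t range_get(uint64_t config, unsigned int bit_start,
			  unsigned int bit_end)
{
	assert(bit_start <= bit_end && bit_end <= 63);
	return (config >> bit_start) &
	       range_max_split(bit_end - bit_start + 1);
}
```

For widths between 4 and 63 the two masks are identical; only the split version also handles a full 64-bit range such as config2:0-63.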
Re: [PATCH v2 05/11] powerpc: add hv_gpci interface header
On 02/24/2014 07:33 PM, Michael Ellerman wrote:
On Fri, 2014-14-02 at 22:02:09 UTC, Cody P Schafer wrote:
H_GetPerformanceCounterInfo (referred to as hv_gpci or just gpci from here
on) is an interface to retrieve specific performance counters and other
data from the hypervisor. All outputs have a fixed format (and are
represented as structs in this patch).

I still see unused stuff in here, can you strip it back to just what we
need? Same goes for the next patch.

Sure, I can remove the unused structures and enum entries (hadn't realized
you wanted that in the last review).
Re: [PATCH v2 09/11] powerpc/perf: add support for the hv 24x7 interface
On 02/24/2014 07:33 PM, Michael Ellerman wrote:
On Fri, 2014-14-02 at 22:02:13 UTC, Cody P Schafer wrote:
This provides a basic interface between hv_24x7 and perf. Similar to the
one provided for gpci, it lacks transaction support and does not list any
events.

Signed-off-by: Cody P Schafer <c...@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-24x7.c | 491
 1 file changed, 491 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-24x7.c

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
new file mode 100644
index 000..13de140
--- /dev/null
+++ b/arch/powerpc/perf/hv-24x7.c
...
+
+/*
+ * read_offset_data - copy data from one buffer to another while treating the
+ *                    source buffer as a small view on the total available
+ *                    source data.
+ *
+ * @dest: buffer to copy into
+ * @dest_len: length of @dest in bytes
+ * @requested_offset: the offset within the source data we want. Must be >= 0
+ * @src: buffer to copy data from
+ * @src_len: length of @src in bytes
+ * @source_offset: the offset in the source data that (src,src_len) refers to.
+ *                 Must be >= 0
+ *
+ * returns the number of bytes copied.
+ *
+ * '.' areas in d are written to.
+ *
+ * u
+ * x wv z
+ * d |.|
+ * s |--|
+ *
+ * u
+ * x w z v
+ * d |--|
+ * s |--|
+ *
+ * x wu,z,v
+ * d ||
+ * s |--|
+ *
+ * x,wu,v,z
+ * d |--|
+ * s |--|
+ *
+ * xu
+ * wvz
+ * d ||
+ * s |--|
+ *
+ * x z w v
+ * d|--|
+ * s |--|
+ *
+ * x = source_offset
+ * w = requested_offset
+ * z = source_offset + src_len
+ * v = requested_offset + dest_len
+ *
+ * w_offset_in_s = w - x = requested_offset - source_offset
+ * z_offset_in_s = z - x = src_len
+ * v_offset_in_s = v - x = request_offset + dest_len - src_len
+ * u_offset_in_s = min(z_offset_in_s, v_offset_in_s)
+ *
+ * copy_len = u_offset_in_s - w_offset_in_s = min(z_offset_in_s, v_offset_in_s)
+ *						- w_offset_in_s

Comments are great, especially for complicated code like this. But at a
glance I don't actually understand what this comment is trying to tell me.
The function was composed via some number line logic. The comment tries to
explain what that logic is. The ascii art is various overlapping buffers
that we're copying between (the '+'s from the patch are messing with the
indenting of some of the labels). The only major omission I'm seeing is
that I failed to note that d=dest and s=src (though this could be inferred
from the comment about '.' indicating a write).

Is there anything specific that doesn't make sense in the comment? (It may
not be a comment that really can be read at a glance.)

+ */
+static ssize_t read_offset_data(void *dest, size_t dest_len,
+				loff_t requested_offset, void *src,
+				size_t src_len, loff_t source_offset)
+{
+	size_t w_offset_in_s = requested_offset - source_offset;
+	size_t z_offset_in_s = src_len;
+	size_t v_offset_in_s = requested_offset + dest_len - src_len;
+	size_t u_offset_in_s = min(z_offset_in_s, v_offset_in_s);
+	size_t copy_len = u_offset_in_s - w_offset_in_s;
+
+	if (requested_offset < 0 || source_offset < 0)
+		return -EINVAL;
+
+	if (z_offset_in_s <= w_offset_in_s)
+		return 0;
+
+	memcpy(dest, src + w_offset_in_s, copy_len);
+	return copy_len;
+}
+
+static unsigned long h_get_24x7_catalog_page(char page[static 4096],
+					     u32 version, u32 index)
+{
+	WARN_ON(!IS_ALIGNED((unsigned long)page, 4096));
+	return plpar_hcall_norets(H_GET_24X7_CATALOG_PAGE,
+			virt_to_phys(page),
+			version,
+			index);
+}
+
+static ssize_t catalog_read(struct file *filp, struct kobject *kobj,
+			    struct bin_attribute *bin_attr, char *buf,
+			    loff_t offset, size_t count)
+{
+	unsigned long hret;
+	ssize_t ret = 0;
+	size_t catalog_len = 0, catalog_page_len = 0, page_count = 0;
+	loff_t page_offset = 0;
+	uint32_t catalog_version_num = 0;
+	void *page = kmalloc(4096, GFP_USER);
+	struct hv_24x7_catalog_page_0 *page_0 = page;
+	if (!page)
+		return -ENOMEM;
+
+
+	hret = h_get_24x7_catalog_page(page, 0, 0);
+	if (hret) {
+		ret = -EIO;
+		goto e_free;
+	}
+
+	catalog_version_num = be32_to_cpu(page_0->version);
+	catalog_page_len
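Taking the definitions at the end of that comment literally (x = source_offset, v = requested_offset + dest_len), v - x works out to requested_offset + dest_len - source_offset. Here is a minimal userspace sketch of the same overlap copy built directly from those definitions. It uses signed offsets so the w < x case does not depend on unsigned wraparound; read_window() is a hypothetical name, and returning -1 stands in for the kernel's -EINVAL.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy the part of the requested range [w, v) that falls inside the
 * source window [x, z), where x = source_offset,
 * z = source_offset + src_len, w = requested_offset and
 * v = requested_offset + dest_len. All local offsets are relative to x.
 * Returns bytes copied, 0 if w lies outside the window, -1 on bad input.
 */
static long long read_window(void *dest, size_t dest_len,
			     long long requested_offset, const void *src,
			     size_t src_len, long long source_offset)
{
	long long w, z, v, u;

	if (requested_offset < 0 || source_offset < 0)
		return -1;

	w = requested_offset - source_offset;                       /* w - x */
	z = (long long)src_len;                                     /* z - x */
	v = requested_offset + (long long)dest_len - source_offset; /* v - x */
	u = z < v ? z : v;                                   /* min(z, v) - x */

	/* Requested start lies before this window or at/after its end. */
	if (w < 0 || w >= z)
		return 0;

	memcpy(dest, (const char *)src + w, (size_t)(u - w));
	return u - w;
}
```

The patch as posted computes v_offset_in_s with src_len where this sketch uses source_offset; the two coincide only when source_offset equals src_len, so the comment's definitions are the safer reference if the function is reworked.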
= be32_to_cpu(page_0->length
Re: [PATCH v2 04/11] powerpc: add hvcalls for 24x7 and gpci (get performance counter info)
On 02/24/2014 07:33 PM, Michael Ellerman wrote:
On Fri, 2014-14-02 at 22:02:08 UTC, Cody P Schafer wrote:
Signed-off-by: Cody P Schafer <c...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/hvcall.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index d8b600b..652f7e4 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -274,6 +274,11 @@
 /* Platform specific hcalls, used by KVM */
 #define H_RTAS 0xf000
 
+/* Platform specific hcalls, provided by PHYP */
+#define H_GET_24X7_CATALOG_PAGE 0xF078
+#define H_GET_24X7_DATA 0xF07C
+#define H_GET_PERF_COUNTER_INFO 0xF080

Some tabs some spaces, use tabs.

Ack.
Re: [PATCH v2 07/11] powerpc: add a shared interface to get gpci version and capabilities
On 02/24/2014 07:33 PM, Michael Ellerman wrote:
[PATCH v2 07/11] powerpc: add a shared interface to get gpci version and capabilities

All the patches that touch perf should be "powerpc/perf: foo"

Ok.

On Fri, 2014-14-02 at 22:02:11 UTC, Cody P Schafer wrote:
...

I realise this is a fairly small patch but a changelog is still nice. You
could for example mention that we don't currently use .ga, .expanded or
.lab but we're adding the logic anyway because ...

Well, we do use them to expose some more information to the user (via
sysfs attributes). Always nice to know what capabilities are enabled. But
sure, I can explain why each bit in that structure is a good idea.

Signed-off-by: Cody P Schafer <c...@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-common.c | 39 +++
 arch/powerpc/perf/hv-common.h | 17 +
 2 files changed, 56 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-common.c
 create mode 100644 arch/powerpc/perf/hv-common.h

diff --git a/arch/powerpc/perf/hv-common.c b/arch/powerpc/perf/hv-common.c
new file mode 100644
index 000..47e02b3
--- /dev/null
+++ b/arch/powerpc/perf/hv-common.c
@@ -0,0 +1,39 @@
+#include <asm/io.h>
+#include <asm/hvcall.h>
+
+#include "hv-gpci.h"
+#include "hv-common.h"
+
+unsigned long hv_perf_caps_get(struct hv_perf_caps *caps)
+{
+	unsigned long r;
+	struct p {
+		struct hv_get_perf_counter_info_params params;
+		struct cv_system_performance_capabilities caps;
+	} __packed __aligned(sizeof(uint64_t));
+
+	struct p arg = {
+		.params = {
+			.counter_request = cpu_to_be32(
+					CIR_SYSTEM_PERFORMANCE_CAPABILITIES),
+			.starting_index = cpu_to_be32(-1),
+			.counter_info_version_in = 0,
+		}
+	};
+
+	r = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
+			       virt_to_phys(&arg), sizeof(arg));
+
+	if (r)
+		return r;
+
+	pr_devel("capability_mask: 0x%x\n", arg.caps.capability_mask);
+
+	caps->version = arg.params.counter_info_version_out;
+	caps->collect_privileged = !!arg.caps.perf_collect_privileged;
+	caps->ga = !!(arg.caps.capability_mask & CV_CM_GA);
+	caps->expanded =
			!!(arg.caps.capability_mask & CV_CM_EXPANDED);
+	caps->lab = !!(arg.caps.capability_mask & CV_CM_LAB);
+
+	return r;
+}
diff --git a/arch/powerpc/perf/hv-common.h b/arch/powerpc/perf/hv-common.h
new file mode 100644
index 000..7e615bd
--- /dev/null
+++ b/arch/powerpc/perf/hv-common.h
@@ -0,0 +1,17 @@
+#ifndef LINUX_POWERPC_PERF_HV_COMMON_H_
+#define LINUX_POWERPC_PERF_HV_COMMON_H_
+
+#include <linux/types.h>
+
+struct hv_perf_caps {
+	u16	version;
+	u16	collect_privileged:1,
+		ga:1,
+		expanded:1,
+		lab:1,
+		unused:12;
+};
+
+unsigned long hv_perf_caps_get(struct hv_perf_caps *caps);
+
+#endif
-- 
1.8.5.4
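The `!!` in the decode step of hv_perf_caps_get() matters because the destination fields are one bit wide: storing a masked value such as 0x80 straight into an unsigned 1-bit field keeps only its low bit, which is 0. Below is a minimal userspace sketch of that decode step. The EX_CM_* bit positions and the ex_caps/decode_caps() names are hypothetical stand-ins; the real CV_CM_* constants are defined in hv-gpci.h and are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions; the real CV_CM_* values live in hv-gpci.h. */
enum {
	EX_CM_GA       = 1u << 7,
	EX_CM_EXPANDED = 1u << 6,
	EX_CM_LAB      = 1u << 5,
};

/* 1-bit flags, mirroring the layout of struct hv_perf_caps. */
struct ex_caps {
	unsigned int ga:1, expanded:1, lab:1;
};

/*
 * Collapse each masked value to 0 or 1 before it lands in a 1-bit
 * field. Without '!!', (mask & EX_CM_GA) == 0x80 would be truncated
 * to its lowest bit, i.e. 0.
 */
static struct ex_caps decode_caps(uint8_t mask)
{
	struct ex_caps c;

	c.ga       = !!(mask & EX_CM_GA);
	c.expanded = !!(mask & EX_CM_EXPANDED);
	c.lab      = !!(mask & EX_CM_LAB);
	return c;
}
```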
Re: [PATCH v2 08/11] powerpc/perf: add support for the hv gpci (get performance counter info) interface
On 02/24/2014 07:33 PM, Michael Ellerman wrote:
On Fri, 2014-14-02 at 22:02:12 UTC, Cody P Schafer wrote:
This provides a basic link between perf and hv_gpci. Notably, it does not
yet support transactions and does not list any events (they can still be
manually composed).

Can you explain how the HV_CAPS stuff ends up looking? I'm not against
adding it, but I'd like to understand how we expect it to be used a bit
better.

It's just a quick mechanism for me to expose some relevant information to
userspace via sysfs using the hv_perf_caps_get() function's returned data.
Documentation for this sysfs interface (and the rest) is in a later patch.
I don't expect any more uses to show up unless the firmware decides to add
another capability bit (in which case I'll want to expose it as well).

diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
new file mode 100644
index 000..1f5d96d
--- /dev/null
+++ b/arch/powerpc/perf/hv-gpci.c
+
+static struct pmu h_gpci_pmu = {
+	.task_ctx_nr = perf_invalid_context,
+
+	.name = "hv_gpci",
+	.attr_groups = attr_groups,
+	.event_init  = h_gpci_event_init,
+	.add         = h_gpci_event_add,
+	.del         = h_gpci_event_del,

= h_gpci_event_stop,

+	.start       = h_gpci_event_start,
+	.stop        = h_gpci_event_stop,
+	.read        = h_gpci_event_read,

= h_gpci_event_update

+	.event_idx   = perf_swevent_event_idx,
+};

Whoops, thought I had fixed those 2 already.
Re: [PATCH v2 10/11] powerpc/perf: add kconfig option for hypervisor provided counters
On 02/24/2014 07:33 PM, Michael Ellerman wrote:
On Fri, 2014-14-02 at 22:02:14 UTC, Cody P Schafer wrote:
Signed-off-by: Cody P Schafer <c...@linux.vnet.ibm.com>
---
 arch/powerpc/perf/Makefile             | 2 ++
 arch/powerpc/platforms/Kconfig.cputype | 6 ++
 2 files changed, 8 insertions(+)

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index 60d71ee..f9c083a 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -11,5 +11,7 @@ obj32-$(CONFIG_PPC_PERF_CTRS) += mpc7450-pmu.o
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
 obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o
 
+obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o
+
 obj-$(CONFIG_PPC64)	+= $(obj64-y)
 obj-$(CONFIG_PPC32)	+= $(obj32-y)
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 434fda3..dcc67cd 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -364,6 +364,12 @@ config PPC_PERF_CTRS
 	help
 	  This enables the powerpc-specific perf_event back-end.
 
+config HV_PERF_CTRS
+	def_bool y

This was bool, why did you change it?

No, it wasn't. v1 also had def_bool. https://lkml.org/lkml/2014/1/16/518
Maybe you're confusing v2.1 and v2 of this patch?

+	depends on PERF_EVENTS && PPC_HAVE_PMU_SUPPORT

Should be: depends on PERF_EVENTS && PPC_PSERIES

+	help
+	  Enable access to perf counters provided by the hypervisor
+

Yep, the v2.1 patch (which I bungled and labeled as 9/11) already changes
both of these. It'll end up rolled into v3.
Re: [PATCH v2 02/11] perf core: export swevent hrtimer helpers
On 02/25/2014 02:20 AM, Peter Zijlstra wrote:
On Tue, Feb 25, 2014 at 02:33:26PM +1100, Michael Ellerman wrote:
On Fri, 2014-14-02 at 22:02:06 UTC, Cody P Schafer wrote:
Export the swevent hrtimer helpers currently only used in events/core.c
to allow the addition of architecture specific sw-like pmus.

Peter, Ingo, can we get your ACK on this please?

How are they used? I saw some usage in patch 9 or so; but it's not
explained anywhere. All patches have non-existent Changelogs and the few
comments that are there are pretty hardware specific.

So please do tell; what do you need this for?

From this patch's change log:

  Export the swevent hrtimer helpers currently only used in events/core.c
  to allow the addition of architecture specific sw-like pmus.

The key part here is "architecture specific sw-like pmus", where the
announcement explains why these pmus are sw-like:

  The counters supplied by these interfaces are continually counting and
  never need to be (and cannot be) disabled or enabled. They additionally
  do not generate any interrupts. This makes them in some regards similar
  to software counters, and as a result their implementation shares some
  common code (which an initial patch exposes) with the sw counters.

Essentially, these pmus just provide access to a big array of counters
which don't generate interrupts, and are all 64bit (and assumed to never
overflow). Rather than duplicate the code that we already have for
managing timing when reading from counters that don't have interrupts (the
functions that are exposed by this patch), I've reused it.
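For counters that free-run and never interrupt, the per-event work reduces to snapshot-and-delta bookkeeping; in perf's software events, the hrtimer is what provides periodic sampling for counters that cannot raise an overflow interrupt. Below is a minimal userspace sketch of the bookkeeping only, assuming the caller supplies the raw 64-bit counter value. struct freerun_event and the freerun_* functions are hypothetical and are not the hv_gpci or perf core code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Per-event state for a counter that is always running, never
 * interrupts, and is assumed never to overflow 64 bits.
 */
struct freerun_event {
	uint64_t prev;  /* raw value at the previous read */
	uint64_t count; /* total accumulated since the event was started */
};

/* On add/start: snapshot the current raw value; nothing to enable. */
static void freerun_start(struct freerun_event *e, uint64_t raw_now)
{
	e->prev = raw_now;
}

/*
 * On read (from a periodic timer or an explicit read): add the delta
 * since the last snapshot. Unsigned subtraction keeps the delta correct
 * even across a 64-bit wrap, should one ever happen.
 */
static void freerun_update(struct freerun_event *e, uint64_t raw_now)
{
	e->count += raw_now - e->prev;
	e->prev = raw_now;
}
```

Stopping such an event needs no hardware action either; a final freerun_update() captures the last delta.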
Re: [PATCH v2 01/11] perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus
On 02/25/2014 12:33 PM, Cody P Schafer wrote:
On 02/24/2014 07:33 PM, Michael Ellerman wrote:
On Fri, 2014-14-02 at 22:02:05 UTC, Cody P Schafer wrote:
Add PMU_RANGE_ATTR() and PMU_RANGE_RESV() (for reserved areas) which
generate functions to extract the relevant bits from
event->attr.config{,1,2} for use by sw-like pmus where the
'config{,1,2}' values don't map directly to hardware registers.

Signed-off-by: Cody P Schafer <c...@linux.vnet.ibm.com>
---
 include/linux/perf_event.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e56b07f..2702e91 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -871,4 +871,21 @@ _name##_show(struct device *dev,			\
 									\
 static struct device_attribute format_attr_##_name = __ATTR_RO(_name)
 
+#define PMU_RANGE_ATTR(name, attr_var, bit_start, bit_end)		\
+PMU_FORMAT_ATTR(name, #attr_var ":" #bit_start "-" #bit_end);		\
+PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)
+
+#define PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)		\
+static u64 event_get_##name##_max(void)					\
+{									\
+	int bits = (bit_end) - (bit_start) + 1;				\
+	return ((0x1ULL << (bits - 1ULL)) - 1ULL) |			\
+		(0xFULL << (bits - 4ULL));				\
+}									\
+static u64 event_get_##name(struct perf_event *event)			\
+{									\
+	return (event->attr.attr_var >> (bit_start)) &			\
+		event_get_##name##_max();				\
+}

I still don't like the names.

EVENT_GETTER_AND_FORMAT()

EVENT_RANGE()

I'd prefer to describe the intended usage rather than what is generated,
both in case we change some of the specifics later, and to provide
additional information to the developers beyond what a simple code
reading gives.

EVENT_RESERVED()

Sure. The PMU_* naming was just based on the PMU_FORMAT_ATTR() naming, so
I kept it for continuity with the existing API. Maybe
EVENT_RANGE_RESERVED() would be more appropriate?

Thinking about this a bit more, EVENT_RANGE() and EVENT_RANGE_RESERVED()
aren't quite ideal either.
The EVENT prefix collides with the files we put in the events/ dir,
whereas these macros generate files for the format/ dir.

Maybe:

FORMAT_RANGE() and FORMAT_RANGE_RESERVED()

or

PMU_FORMAT_RANGE(), PMU_FORMAT_RANGE_RESERVED()
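Whatever the macros end up being called, what they emit under format/ is a string such as config:16-31 (built by the stringified #attr_var ":" #bit_start "-" #bit_end), which userspace reads to learn where a named field lives. Below is a minimal sketch of parsing that form, assuming only the single-range syntax these macros generate. struct fmt_range and parse_fmt_range() are hypothetical names, not perf tool internals, and single-bit or comma-separated format strings are deliberately out of scope.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parsed form of a format/ entry such as "config1:0-15". */
struct fmt_range {
	char attr[16];    /* "config", "config1" or "config2" */
	unsigned int lo;  /* first bit, inclusive */
	unsigned int hi;  /* last bit, inclusive */
};

/*
 * Parse "<attr>:<lo>-<hi>", optionally followed by a newline.
 * Returns 0 on success, -1 otherwise.
 */
static int parse_fmt_range(const char *s, struct fmt_range *out)
{
	int n = 0;

	if (sscanf(s, "%15[a-z0-9]:%u-%u%n",
		   out->attr, &out->lo, &out->hi, &n) != 3)
		return -1;
	/* Reject trailing junk after the range. */
	if (s[n] != '\0' && s[n] != '\n')
		return -1;
	if (out->lo > out->hi || out->hi > 63)
		return -1;
	if (strcmp(out->attr, "config") != 0 &&
	    strcmp(out->attr, "config1") != 0 &&
	    strcmp(out->attr, "config2") != 0)
		return -1;
	return 0;
}
```

Parsing "config:16-31" this way yields attr "config", lo 16 and hi 31, which is exactly the starting_index field in the hv_24x7 layout.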
Re: [PATCH] powerpc: warn users of smt-snooze-delay that the API isn't there anymore
On 02/24/2014 08:53 PM, Madhavan Srinivasan wrote:
On Saturday 22 February 2014 05:44 AM, Cody P Schafer wrote:
/sys/devices/system/cpu/cpu*/smt-snooze-delay was converted into a NOP in
commit 3fa8cad82b94d0bed002571bd246f2299ffc876b, and now does nothing. Add
a pr_warn() to convince any users that they should stop using it.

The commit message from the removing commit notes that this functionality
should move into the cpuidle driver, essentially by

Would prefer to clean up the code since the functionality is moved, instead
of adding to it.

We'd still want users of the interface to use an attribute wired up under
the cpuidle/ dir, so a warning (to update their software) is still needed.
As deepthi has noted, cpuidle right now doesn't support changing this on a
per-cpu basis, so a cleanup isn't a simple matter.
[PATCH] powerpc: warn users of smt-snooze-delay that the API isn't there anymore
/sys/devices/system/cpu/cpu*/smt-snooze-delay was converted into a NOP in
commit 3fa8cad82b94d0bed002571bd246f2299ffc876b, and now does nothing. Add
a pr_warn() to convince any users that they should stop using it.

The commit message from the removing commit notes that this functionality
should move into the cpuidle driver, essentially by adjusting
target_residency to the specified value. At the moment, target_residency
is not exposed by cpuidle's sysfs, so there isn't a drop-in replacement
for this.

Signed-off-by: Cody P Schafer <c...@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/sysfs.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 97e1dc9..84097b4 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -50,6 +50,9 @@ static ssize_t store_smt_snooze_delay(struct device *dev,
 	if (ret != 1)
 		return -EINVAL;
 
+	pr_warn_ratelimited("%s (%d): /sys/devices/system/cpu/cpu%d/smt-snooze-delay is deprecated and is a NOP\n",
+			current->comm, task_pid_nr(current), cpu->dev.id);
+
 	per_cpu(smt_snooze_delay, cpu->dev.id) = snooze;
 	return count;
 }
@@ -60,6 +63,9 @@ static ssize_t show_smt_snooze_delay(struct device *dev,
 {
 	struct cpu *cpu = container_of(dev, struct cpu, dev);
 
+	pr_warn_ratelimited("%s (%d): /sys/devices/system/cpu/cpu%d/smt-snooze-delay is deprecated and is a NOP\n",
+			current->comm, task_pid_nr(current), cpu->dev.id);
+
 	return sprintf(buf, "%ld\n", per_cpu(smt_snooze_delay, cpu->dev.id));
 }
 
-- 
1.9.0
[PATCH v2.1 9/11] powerpc/perf: add kconfig option for hypervisor provided counters
Signed-off-by: Cody P Schafer <c...@linux.vnet.ibm.com>
---
 arch/powerpc/perf/Makefile             |  2 ++
 arch/powerpc/platforms/pseries/Kconfig | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index 60d71ee..f9c083a 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -11,5 +11,7 @@ obj32-$(CONFIG_PPC_PERF_CTRS) += mpc7450-pmu.o
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
 obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o
 
+obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o
+
 obj-$(CONFIG_PPC64)	+= $(obj64-y)
 obj-$(CONFIG_PPC32)	+= $(obj32-y)
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 80b1d57..2cb8b77 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -111,6 +111,18 @@ config CMM
 	  will be reused for other LPARs. The interface allows firmware to
 	  balance memory across many LPARs.
 
+config HV_PERF_CTRS
+	bool "Hypervisor supplied PMU events (24x7 & GPCI)"
+	default y
+	depends on PERF_EVENTS && PPC_PSERIES
+	help
+	  Enable access to hypervisor supplied counters in perf. Currently,
+	  this enables code that uses the hcall GetPerfCounterInfo and 24x7
+	  interfaces to retrieve counters. GPCI exists on Power 6 and later
+	  systems. 24x7 is available on Power 8 systems.
+
+	  If unsure, select Y.
+
 config DTL
 	bool "Dispatch Trace Log"
 	depends on PPC_SPLPAR && DEBUG_FS
-- 
1.9.0
Re: [PATCH v2.1 9/11] powerpc/perf: add kconfig option for hypervisor provided counters
Whoops, should be [PATCH v2.1 10/11]
Re: [PATCH v2 10/11] powerpc/perf: add kconfig option for hypervisor provided counters
On 02/16/2014 11:11 PM, Michael Ellerman wrote:
On Fri, 2014-02-14 at 16:25 -0800, Cody P Schafer wrote:
On Fri, Feb 14, 2014 at 04:32:13PM -0600, Scott Wood wrote:
On Fri, 2014-02-14 at 14:02 -0800, Cody P Schafer wrote:
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 434fda3..dcc67cd 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -364,6 +364,12 @@ config PPC_PERF_CTRS
 	help
 	  This enables the powerpc-specific perf_event back-end.
 
+config HV_PERF_CTRS
+	def_bool y
+	depends on PERF_EVENTS && PPC_HAVE_PMU_SUPPORT
+	help
+	  Enable access to perf counters provided by the hypervisor

Please don't add default-y stuff that is platform-specific, and definitely
point out that platform dependency in the config description -- I have to
look elsewhere in the patchset to determine that this is for Power
Hypervisor.

PPC_HAVE_PMU_SUPPORT is enabled by all 6xx builds, even for hardware like
e300 that doesn't have PMU at all (it has the FSL embedded perfmon
instead), much less this hv interface.

And yes, PPC_PERF_CTRS has the same problem and should be fixed. :-)

Yep, I just based this one on what PPC_PERF_CTRS was doing. How about the
following:

+config HV_PERF_CTRS
+	bool "Perf Hypervisor supplied counters"

"Support for Hypervisor supplied PMU events (24x7 & GPCI)" ?

Sounds good to me.

+	default y
+	depends on PERF_EVENTS && PPC_HAVE_PMU_SUPPORT && PPC_PSERIES

I think you just want:

	depends on PERF_EVENTS && PPC_PSERIES

Because you're adding two completely new PMUs, they're not a struct
power_pmu backend for the existing powerpc PMU implementation.

Ack. I'll fix this up in v3.
[PATCH v2 00/11] powerpc: Add support for Power Hypervisor supplied performance counters
These patches add basic pmus for 2 powerpc hypervisor interfaces to obtain
performance counters: gpci ("get performance counter info") and 24x7.

The counters supplied by these interfaces are continually counting and
never need to be (and cannot be) disabled or enabled. They additionally do
not generate any interrupts. This makes them in some regards similar to
software counters, and as a result their implementation shares some common
code (which an initial patch exposes) with the sw counters.

There is ongoing work to support transactions for each of these pmus.

These 2 PMUs end up providing access to some cpu, core, and chip level
counters not exposed via other interfaces, and additionally allow
monitoring the performance of other lpars (guests) on the same host
system. Because they provide access to core and chip level counters, this
pair of PMUs could be thought of as powerpc's counterpart to x86's uncore
events.

As an example, processor_bus_utilization_abc and
processor_bus_utilization_wxyz (in hv_gpci.h) allow retrieval of total
cycles and idle cycles for various inter-chip buses.

GPCI is an interface that already exists on some power6 and power7
machines (depending on the fw version), but is rather inflexible and code
intensive to add additional counters to. The 24x7 interfaces currently are
designed to co-exist with the gpci interface while replacing most of
gpci's functionality on newer systems.

Right now, the 24x7 code I've submitted uses the gpci calls to check if it
has permission to access certain classes of counters.
Example perf usage:

  perf stat -e 'hv_gpci/counter_info_version=3,offset=0,length=8,secondary_index=0,starting_index=0x,request=0x10/' -r 0 -C 0 -x ' ' sleep 0.1
  perf stat -e 'hv_24x7/domain=2,offset=8,starting_index=0,lpar=0x/' -r 0 -C 0 -x ' ' sleep 0.1

--
Changes since v1:
 - add a few attributes to hv_gpci and hv_24x7 that expose some info about the interfaces
 - fix bin_attr creation in sysfs groups, so the attributes show up in the right place
 - move hv_gpci.h and hv_24x7.h interface headers into arch/powerpc/perf
 - fix bit ordering in hv_gpci.h
 - split out hv_perf_caps_get() and use it to probe for the interface before registering
 - ensure proper alignment of hypervisor args
 - add a few missing counter requests to hv_gpci.h
 - s/CIR_xxx/CIR_XXX/ in hv_gpci.h
 - s/modules_init/device_initcall/
 - don't set event->cpu, use the user-provided one
 - remove the union of gpci events, just give the user 1024 bytes to play with
 - clarify some comments (the list of fw versions is now labeled)
 - provide an event_24x7_request() that wraps single_24x7_request()
 - probably some other small fixes I'm forgetting.
Cody P Schafer (11):
  perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus
  perf core: export swevent hrtimer helpers
  sysfs: create bin_attributes under the requested group
  powerpc: add hvcalls for 24x7 and gpci (get performance counter info)
  powerpc: add hv_gpci interface header
  powerpc: add 24x7 interface header
  powerpc: add a shared interface to get gpci version and capabilities
  powerpc/perf: add support for the hv gpci (get performance counter info) interface
  powerpc/perf: add support for the hv 24x7 interface
  powerpc/perf: add kconfig option for hypervisor provided counters
  powerpc/perf/hv_{gpci,24x7}: add documentation of device attributes

 .../testing/sysfs-bus-event_source-devices-hv_24x7 |  22 +
 .../testing/sysfs-bus-event_source-devices-hv_gpci |  43 ++
 arch/powerpc/include/asm/hvcall.h                  |   5 +
 arch/powerpc/perf/Makefile                         |   2 +
 arch/powerpc/perf/hv-24x7.c                        | 491 +++
 arch/powerpc/perf/hv-24x7.h                        | 239 ++
 arch/powerpc/perf/hv-common.c                      |  39 ++
 arch/powerpc/perf/hv-common.h                      |  17 +
 arch/powerpc/perf/hv-gpci.c                        | 290
 arch/powerpc/perf/hv-gpci.h                        | 521 +
 arch/powerpc/platforms/Kconfig.cputype             |   6 +
 fs/sysfs/group.c                                   |   7 +-
 include/linux/perf_event.h                         |  22 +-
 kernel/events/core.c                               |   8 +-
 14 files changed, 1705 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
 create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
 create mode 100644 arch/powerpc/perf/hv-24x7.c
 create mode 100644 arch/powerpc/perf/hv-24x7.h
 create mode 100644 arch/powerpc/perf/hv-common.c
 create mode 100644 arch/powerpc/perf/hv-common.h
 create mode 100644 arch/powerpc/perf/hv-gpci.c
 create mode 100644 arch/powerpc/perf/hv-gpci.h

--
1.8.5.4
[PATCH v2 02/11] perf core: export swevent hrtimer helpers
Export the swevent hrtimer helpers currently only used in events/core.c to allow the addition of architecture-specific sw-like PMUs.

Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com
---
 include/linux/perf_event.h | 5 ++++-
 kernel/events/core.c       | 8 ++++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2702e91..24378a9 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -559,7 +559,10 @@ extern void perf_pmu_migrate_context(struct pmu *pmu,
 				int src_cpu, int dst_cpu);
 extern u64 perf_event_read_value(struct perf_event *event,
 				 u64 *enabled, u64 *running);
 
-
+extern void perf_swevent_init_hrtimer(struct perf_event *event);
+extern void perf_swevent_start_hrtimer(struct perf_event *event);
+extern void perf_swevent_cancel_hrtimer(struct perf_event *event);
+extern int perf_swevent_event_idx(struct perf_event *event);
 struct perf_sample_data {
 	u64				type;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 56003c6..feb0347 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5816,7 +5816,7 @@ static int perf_swevent_init(struct perf_event *event)
 	return 0;
 }
 
-static int perf_swevent_event_idx(struct perf_event *event)
+int perf_swevent_event_idx(struct perf_event *event)
 {
 	return 0;
 }
@@ -6045,7 +6045,7 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
 	return ret;
 }
 
-static void perf_swevent_start_hrtimer(struct perf_event *event)
+void perf_swevent_start_hrtimer(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	s64 period;
@@ -6067,7 +6067,7 @@ static void perf_swevent_start_hrtimer(struct perf_event *event)
 				HRTIMER_MODE_REL_PINNED, 0);
 }
 
-static void perf_swevent_cancel_hrtimer(struct perf_event *event)
+void perf_swevent_cancel_hrtimer(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
@@ -6079,7 +6079,7 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
 	}
 }
 
-static void perf_swevent_init_hrtimer(struct perf_event *event)
+void perf_swevent_init_hrtimer(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
--
1.8.5.4
[PATCH v2 01/11] perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus
Add PMU_RANGE_ATTR() and PMU_RANGE_RESV() (for reserved areas), which generate functions to extract the relevant bits from event->attr.config{,1,2} for use by sw-like pmus where the 'config{,1,2}' values don't map directly to hardware registers.

Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com
---
 include/linux/perf_event.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e56b07f..2702e91 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -871,4 +871,21 @@ _name##_show(struct device *dev,				\
 									\
 static struct device_attribute format_attr_##_name = __ATTR_RO(_name)
 
+#define PMU_RANGE_ATTR(name, attr_var, bit_start, bit_end)		\
+PMU_FORMAT_ATTR(name, #attr_var ":" #bit_start "-" #bit_end);		\
+PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)
+
+#define PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)		\
+static u64 event_get_##name##_max(void)				\
+{									\
+	int bits = (bit_end) - (bit_start) + 1;				\
+	return ((0x1ULL << (bits - 1ULL)) - 1ULL) |			\
+		(0xFULL << (bits - 4ULL));				\
+}									\
+static u64 event_get_##name(struct perf_event *event)			\
+{									\
+	return (event->attr.attr_var >> (bit_start)) &			\
+		event_get_##name##_max();				\
+}
+
 #endif /* _LINUX_PERF_EVENT_H */
--
1.8.5.4
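The `_max` helper builds an all-ones mask of `bits` bits without ever shifting by 64: the `0xF << (bits - 4)` term supplies the top four bits, so a 64-bit field still works. The user-space restatement below (function names are hypothetical) makes the implied precondition explicit: it needs `bits >= 4`, which every field declared in the hv-gpci.c and hv-24x7.c excerpts of this series satisfies (the narrowest is the 4-bit 24x7 `domain`).

```c
#include <stdint.h>

/*
 * Same expression as event_get_##name##_max() in PMU_RANGE_RESV().
 * Valid for 4 <= bits <= 64; for bits < 4 the shift count (bits - 4)
 * wraps around, which is undefined behaviour.
 */
static uint64_t range_max(unsigned int bits)
{
	return ((1ULL << (bits - 1)) - 1) | (0xFULL << (bits - 4));
}

/* Same extraction as event_get_##name(): bits [bit_start, bit_end] of config. */
static uint64_t range_get(uint64_t config, unsigned int bit_start,
			  unsigned int bit_end)
{
	return (config >> bit_start) & range_max(bit_end - bit_start + 1);
}
```

For example, `range_get(cfg, 32, 63)` returns the upper 32 bits, which is how a field declared as `config:32-63` is decoded.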
[PATCH v2 03/11] sysfs: create bin_attributes under the requested group
bin_attributes created/updated in create_files() (such as those listed via (struct device).attribute_groups) were not placed under the specified group, and instead appeared in the base kobj directory. Fix this by making bin_attributes use creation code similar to that used for normal attributes.

A quick grep shows that no one is using bin_attrs in a named attribute group yet, so we can do this without breaking anything in userspace.

Note that I do not add is_visible() support to bin_attributes, though that could be done as well.

Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com
---
 fs/sysfs/group.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
index 6b57938..aa04068 100644
--- a/fs/sysfs/group.c
+++ b/fs/sysfs/group.c
@@ -70,8 +70,11 @@ static int create_files(struct kernfs_node *parent, struct kobject *kobj,
 	if (grp->bin_attrs) {
 		for (bin_attr = grp->bin_attrs; *bin_attr; bin_attr++) {
 			if (update)
-				sysfs_remove_bin_file(kobj, *bin_attr);
-			error = sysfs_create_bin_file(kobj, *bin_attr);
+				kernfs_remove_by_name(parent,
+						(*bin_attr)->attr.name);
+			error = sysfs_add_file_mode_ns(parent,
+					&(*bin_attr)->attr, true,
+					(*bin_attr)->attr.mode, NULL);
 			if (error)
 				break;
 		}
--
1.8.5.4
[PATCH v2 04/11] powerpc: add hvcalls for 24x7 and gpci (get performance counter info)
Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/hvcall.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index d8b600b..652f7e4 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -274,6 +274,11 @@
 /* Platform specific hcalls, used by KVM */
 #define H_RTAS			0xf000
 
+/* Platform specific hcalls, provided by PHYP */
+#define H_GET_24X7_CATALOG_PAGE	0xF078
+#define H_GET_24X7_DATA		0xF07C
+#define H_GET_PERF_COUNTER_INFO	0xF080
+
 #ifndef __ASSEMBLY__
 
 /**
--
1.8.5.4
[PATCH v2 05/11] powerpc: add hv_gpci interface header
H_GetPerformanceCounterInfo (referred to as hv_gpci or just gpci from here on) is an interface to retrieve specific performance counters and other data from the hypervisor. All outputs have a fixed format (and are represented as structs in this patch).

Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com
---
 arch/powerpc/perf/hv-gpci.h | 521
 1 file changed, 521 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-gpci.h

diff --git a/arch/powerpc/perf/hv-gpci.h b/arch/powerpc/perf/hv-gpci.h
new file mode 100644
index 000..d602809
--- /dev/null
+++ b/arch/powerpc/perf/hv-gpci.h
@@ -0,0 +1,521 @@
+#ifndef LINUX_POWERPC_PERF_HV_GPCI_H_
+#define LINUX_POWERPC_PERF_HV_GPCI_H_
+
+#include <linux/types.h>
+
+/* From the document H_GetPerformanceCounterInfo Interface v1.07 */
+
+/* H_GET_PERF_COUNTER_INFO argument */
+struct hv_get_perf_counter_info_params {
+	__be32 counter_request; /* I */
+	__be32 starting_index;  /* IO */
+	__be16 secondary_index; /* IO */
+	__be16 returned_values; /* O */
+	__be32 detail_rc; /* O, only needed when called via *_norets() */
+
+	/*
+	 * O, size each of counter_value element in bytes, only set for version
+	 * >= 0x3
+	 */
+	__be16 cv_element_size;
+
+	/* I, 0 (zero) for versions < 0x3 */
+	__u8 counter_info_version_in;
+
+	/* O, 0 (zero) if version < 0x3. Must be set to 0 when making hcall */
+	__u8 counter_info_version_out;
+	__u8 reserved[0xC];
+	__u8 counter_value[];
+} __packed;
+
+/*
+ * counter info version = fw version/reference (spec version)
+ *
+ * 8 = power8 (1.07)
+ * [7 is skipped by spec 1.07]
+ * 6 = TLBIE (1.07)
+ * 5 = v7r7m0.phyp (1.05)
+ * [4 skipped]
+ * 3 = v7r6m0.phyp (?)
+ * [1,2 skipped]
+ * 0 = v7r{2,3,4}m0.phyp (?)
+ */
+#define COUNTER_INFO_VERSION_CURRENT 0x8
+
+/*
+ * These determine the counter_value[] layout and the meaning of starting_index
+ * and secondary_index.
+ *
+ * Unless otherwise noted, @secondary_index is unused and ignored.
+ */
+enum counter_info_requests {
+
+	/* GENERAL */
+
+	/* @starting_index: starting physical processor index or -1 for
+	 *                  current physical processor. Data is only collected
+	 *                  for the processors' primary thread.
+	 */
+	CIR_DISPATCH_TIMEBASE_BY_PROCESSOR = 0x10,
+
+	/* @starting_index: starting partition id or -1 for the current logical
+	 *                  partition (virtual machine).
+	 */
+	CIR_ENTITLED_CAPPED_UNCAPPED_DONATED_IDLE_TIMEBASE_BY_PARTITION = 0x20,
+
+	/* @starting_index: starting partition id or -1 for the current logical
+	 *                  partition (virtual machine).
+	 */
+	CIR_RUN_INSTRUCTIONS_RUN_CYCLES_BY_PARTITION = 0X30,
+
+	/* @starting_index: must be -1 (to refer to the current partition)
+	 */
+	CIR_SYSTEM_PERFORMANCE_CAPABILITIES = 0X40,
+
+
+	/* Data from this should only be considered valid if
+	 * counter_info_version >= 0x3
+	 * @starting_index: starting hardware chip id or -1 for the current hw
+	 *                  chip id
+	 */
+	CIR_PROCESSOR_BUS_UTILIZATION_ABC_LINKS = 0X50,
+
+	/* Data from this should only be considered valid if
+	 * counter_info_version >= 0x3
+	 * @starting_index: starting hardware chip id or -1 for the current hw
+	 *                  chip id
+	 */
+	CIR_PROCESSOR_BUS_UTILIZATION_WXYZ_LINKS = 0X60,
+
+	/*
+	 * EXPANDED - the following are only available if the CV_CM_EXPANDED
+	 * bit is set from system_performance_capabilities. Enforcement is left
+	 * to the hypervisor.
+	 */
+
+	/* Available if counter_info_version >= 0x3
+	 * @starting_index: starting hardware chip id or -1 for the current hw
+	 *                  chip id
+	 */
+	CIR_PROCESSOR_BUS_UTILIZATION_GX_LINKS = 0X70,
+
+	/* Available if counter_info_version >= 0x3
+	 * @starting_index: starting hardware chip id or -1 for the current hw
+	 *                  chip id
+	 */
+	CIR_PROCESSOR_BUS_UTILIZATION_MC_LINKS = 0X80,
+
+	/* Available if counter_info_version >= 0x3
+	 * @starting_index: starting physical processor or -1 for the current
+	 *                  physical processor
+	 */
+	CIR_PROCESSOR_CONFIG = 0X90,
+
+	/* Available if counter_info_version >= 0x3
+	 * @starting_index: starting physical processor or -1 for the current
+	 *                  physical processor
+	 */
+	CIR_CURRENT_PROCESSOR_FREQUENCY = 0X91,
+
+	/* Available if counter_info_version >= 0x3 and <= 0x7
+	 * @starting_index: starting physical processor or -1 for the current
+	 *                  physical processor
+	 */
+	CIR_PROCESSOR_CORE_UTILIZATION = 0X94,
+
+	/* Available
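Since every multi-byte field of `struct hv_get_perf_counter_info_params` is big-endian and the struct is `__packed`, its fixed header can be checked outside the kernel. The sketch below is a hypothetical user-space mirror (`struct gpci_params`, `put_be32()`, `put_be16()` and `gpci_params_init()` are made-up names; byte arrays stand in for `__be32`/`__be16` so the encoding does not depend on host byte order). It confirms the header is 0x20 bytes and fills only the input fields, leaving `counter_info_version_out` zero as the comment requires.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of struct hv_get_perf_counter_info_params. */
struct gpci_params {
	uint8_t counter_request[4];	/* I  */
	uint8_t starting_index[4];	/* IO */
	uint8_t secondary_index[2];	/* IO */
	uint8_t returned_values[2];	/* O  */
	uint8_t detail_rc[4];		/* O  */
	uint8_t cv_element_size[2];	/* O, only set for version >= 0x3 */
	uint8_t counter_info_version_in;
	uint8_t counter_info_version_out;	/* must be 0 when making the hcall */
	uint8_t reserved[0xC];
	/* counter_value[] follows in the real structure */
};

_Static_assert(sizeof(struct gpci_params) == 0x20, "fixed header is 32 bytes");

static void put_be32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v >> 24);
	out[1] = (uint8_t)(v >> 16);
	out[2] = (uint8_t)(v >> 8);
	out[3] = (uint8_t)v;
}

static void put_be16(uint8_t out[2], uint16_t v)
{
	out[0] = (uint8_t)(v >> 8);
	out[1] = (uint8_t)v;
}

/* Fill only the input fields; all outputs (and version_out) stay zero. */
static void gpci_params_init(struct gpci_params *p, uint32_t request,
			     uint32_t starting_index, uint16_t secondary_index,
			     uint8_t version_in)
{
	memset(p, 0, sizeof(*p));
	put_be32(p->counter_request, request);
	put_be32(p->starting_index, starting_index);
	put_be16(p->secondary_index, secondary_index);
	p->counter_info_version_in = version_in;
}
```

Passing `(uint32_t)-1` as `starting_index` produces the bytes `ff ff ff ff`, the "current processor/partition" value used throughout the request descriptions above.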
[PATCH v2 06/11] powerpc: add 24x7 interface header
24x7 (also called hv_24x7 or H_24X7) is an interface to obtain performance counters from the hypervisor. These counters do not have a fixed format/position and are instead documented in a 24x7 Catalog, which is provided by the hypervisor (that interface is also documented in this header).

This method of obtaining performance counters from the hypervisor is intended to partially replace the gpci interface.

Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com
---
 arch/powerpc/perf/hv-24x7.h | 239
 1 file changed, 239 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-24x7.h

diff --git a/arch/powerpc/perf/hv-24x7.h b/arch/powerpc/perf/hv-24x7.h
new file mode 100644
index 000..bf079da
--- /dev/null
+++ b/arch/powerpc/perf/hv-24x7.h
@@ -0,0 +1,239 @@
+#ifndef LINUX_POWERPC_PERF_HV_24X7_H_
+#define LINUX_POWERPC_PERF_HV_24X7_H_
+
+#include <linux/types.h>
+
+struct hv_24x7_request {
+	/* PHYSICAL domains require enabling via phyp/hmc. */
+#define HV_24X7_PERF_DOMAIN_PHYSICAL_CHIP 0x01
+#define HV_24X7_PERF_DOMAIN_PHYSICAL_CORE 0x02
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_CORE   0x03
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_CHIP   0x04
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_NODE   0x05
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_REMOTE_NODE 0x06
+	__u8 performance_domain;
+	__u8 reserved[0x1];
+
+	/* bytes to read starting at @data_offset. must be a multiple of 8 */
+	__be16 data_size;
+
+	/*
+	 * byte offset within the perf domain to read from. must be 8 byte
+	 * aligned
+	 */
+	__be32 data_offset;
+
+	/*
+	 * only valid for VIRTUAL_PROCESSOR domains, ignored for others.
+	 * -1 means current partition only
+	 * Enabling via phyp/hmc required for values other than -1. 0 forbidden
+	 * unless requestor is 0.
+	 */
+	__be16 starting_lpar_ix;
+
+	/*
+	 * Ignored when @starting_lpar_ix == -1
+	 * Ignored when @performance_domain is not VIRTUAL_PROCESSOR_*
+	 * -1 means infinite or all
+	 */
+	__be16 max_num_lpars;
+
+	/* chip, core, or virtual processor based on @performance_domain */
+	__be16 starting_ix;
+	__be16 max_ix;
+} __packed;
+
+struct hv_24x7_request_buffer {
+	/* 0 - ? */
+	/* 1 - ? */
+#define HV_24X7_IF_VERSION_CURRENT 0x01
+	__u8 interface_version;
+	__u8 num_requests;
+	__u8 reserved[0xE];
+	struct hv_24x7_request requests[];
+} __packed;
+
+struct hv_24x7_result_element {
+	__be16 lpar_ix;
+
+	/*
+	 * represents the core, chip, or virtual processor based on the
+	 * request's @performance_domain
+	 */
+	__be16 domain_ix;
+
+	/* -1 if @performance_domain does not refer to a virtual processor */
+	__be32 lpar_cfg_instance_id;
+
+	/* size = @result_element_data_size of containing result. */
+	__u8 element_data[];
+} __packed;
+
+struct hv_24x7_result {
+	__u8 result_ix;
+
+	/*
+	 * 0 = not all result elements fit into the buffer, additional requests
+	 *     required
+	 * 1 = all result elements were returned
+	 */
+	__u8 results_complete;
+	__be16 num_elements_returned;
+
+	/* This is a copy of @data_size from the corresponding hv_24x7_request */
+	__be16 result_element_data_size;
+	__u8 reserved[0x2];
+
+	/* WARNING: only valid for first result element due to variable sizes
+	 *          of result elements */
+	/* struct hv_24x7_result_element[@num_elements_returned] */
+	struct hv_24x7_result_element elements[];
+} __packed;
+
+struct hv_24x7_data_result_buffer {
+	/* See versioning for request buffer */
+	__u8 interface_version;
+
+	__u8 num_results;
+	__u8 reserved[0x1];
+	__u8 failing_request_ix;
+	__be32 detailed_rc;
+	__be64 cec_cfg_instance_id;
+	__be64 catalog_version_num;
+	__u8 reserved2[0x8];
+	/* WARNING: only valid for the first result due to variable sizes of
+	 *          results */
+	struct hv_24x7_result results[]; /* [@num_results] */
+} __packed;
+
+/* From document 24x7 Event and Group Catalog Formats Proposal v0.14 */
+struct hv_24x7_catalog_page_0 {
+#define HV_24X7_CATALOG_MAGIC 0x32347837 /* 24x7 in ASCII */
+	__be32 magic;
+	__be32 length; /* In 4096 byte pages */
+	__u8 reserved1[4];
+	__be32 version;
+	__u8 build_time_stamp[16]; /* MMDDHHMMSS\0\0 */
+	__u8 reserved2[32];
+	__be16 schema_data_offs; /* in 4096 byte pages */
+	__be16 schema_data_len;  /* in 4096 byte pages */
+	__be16 schema_entry_count;
+	__u8 reserved3[2];
+	__be16 group_data_offs; /* in 4096 byte pages */
+	__be16 group_data_len;  /* in 4096 byte pages */
+	__be16 group_entry_count;
+	__u8 reserved4[2];
+	__be16
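Both `struct hv_24x7_request` and the fixed part of `struct hv_24x7_request_buffer` are 16 bytes. That is where the "255 requests per hcall" figure in the hv-24x7.c TODO comes from: (4096 - 16) / 16 = 255, which is also the largest value the one-byte `num_requests` field can hold. The hypothetical user-space mirror below (struct and function names are made up, big-endian fields shown as byte arrays) checks the sizes and computes the per-buffer limit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct hv_24x7_request. */
struct h24x7_request {
	uint8_t performance_domain;
	uint8_t reserved;
	uint8_t data_size[2];
	uint8_t data_offset[4];
	uint8_t starting_lpar_ix[2];
	uint8_t max_num_lpars[2];
	uint8_t starting_ix[2];
	uint8_t max_ix[2];
};

/* Fixed header of struct hv_24x7_request_buffer, before requests[]. */
struct h24x7_request_buffer_hdr {
	uint8_t interface_version;
	uint8_t num_requests;
	uint8_t reserved[0xE];
};

/* Requests that fit after the header in buf_len bytes, capped by the u8 count. */
static size_t h24x7_max_requests(size_t buf_len)
{
	size_t hdr = sizeof(struct h24x7_request_buffer_hdr);
	size_t n;

	if (buf_len < hdr)
		return 0;
	n = (buf_len - hdr) / sizeof(struct h24x7_request);
	return n > UINT8_MAX ? UINT8_MAX : n;
}
```

Doubling the buffer would not help: the one-byte count caps a single hcall at 255 requests regardless of size.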
[PATCH v2 07/11] powerpc: add a shared interface to get gpci version and capabilities
Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com
---
 arch/powerpc/perf/hv-common.c | 39 +++
 arch/powerpc/perf/hv-common.h | 17 +
 2 files changed, 56 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-common.c
 create mode 100644 arch/powerpc/perf/hv-common.h

diff --git a/arch/powerpc/perf/hv-common.c b/arch/powerpc/perf/hv-common.c
new file mode 100644
index 000..47e02b3
--- /dev/null
+++ b/arch/powerpc/perf/hv-common.c
@@ -0,0 +1,39 @@
+#include <asm/io.h>
+#include <asm/hvcall.h>
+
+#include "hv-gpci.h"
+#include "hv-common.h"
+
+unsigned long hv_perf_caps_get(struct hv_perf_caps *caps)
+{
+	unsigned long r;
+	struct p {
+		struct hv_get_perf_counter_info_params params;
+		struct cv_system_performance_capabilities caps;
+	} __packed __aligned(sizeof(uint64_t));
+
+	struct p arg = {
+		.params = {
+			.counter_request = cpu_to_be32(
+					CIR_SYSTEM_PERFORMANCE_CAPABILITIES),
+			.starting_index = cpu_to_be32(-1),
+			.counter_info_version_in = 0,
+		}
+	};
+
+	r = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
+			       virt_to_phys(&arg), sizeof(arg));
+
+	if (r)
+		return r;
+
+	pr_devel("capability_mask: 0x%x\n", arg.caps.capability_mask);
+
+	caps->version = arg.params.counter_info_version_out;
+	caps->collect_privileged = !!arg.caps.perf_collect_privileged;
+	caps->ga = !!(arg.caps.capability_mask & CV_CM_GA);
+	caps->expanded = !!(arg.caps.capability_mask & CV_CM_EXPANDED);
+	caps->lab = !!(arg.caps.capability_mask & CV_CM_LAB);
+
+	return r;
+}
diff --git a/arch/powerpc/perf/hv-common.h b/arch/powerpc/perf/hv-common.h
new file mode 100644
index 000..7e615bd
--- /dev/null
+++ b/arch/powerpc/perf/hv-common.h
@@ -0,0 +1,17 @@
+#ifndef LINUX_POWERPC_PERF_HV_COMMON_H_
+#define LINUX_POWERPC_PERF_HV_COMMON_H_
+
+#include <linux/types.h>
+
+struct hv_perf_caps {
+	u16 version;
+	u16 collect_privileged:1,
+	    ga:1,
+	    expanded:1,
+	    lab:1,
+	    unused:12;
+};
+
+unsigned long hv_perf_caps_get(struct hv_perf_caps *caps);
+
+#endif
--
1.8.5.4
[PATCH v2 08/11] powerpc/perf: add support for the hv gpci (get performance counter info) interface
This provides a basic link between perf and hv_gpci. Notably, it does not yet support transactions and does not list any events (they can still be manually composed).

Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com
---
 arch/powerpc/perf/hv-gpci.c | 290
 1 file changed, 290 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-gpci.c

diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
new file mode 100644
index 000..1f5d96d
--- /dev/null
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -0,0 +1,290 @@
+/*
+ * Hypervisor supplied gpci (get performance counter info) performance
+ * counter support
+ *
+ * Author: Cody P Schafer c...@linux.vnet.ibm.com
+ * Copyright 2014 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#define pr_fmt(fmt) "hv-gpci: " fmt
+
+#include <linux/init.h>
+#include <linux/perf_event.h>
+#include <asm/firmware.h>
+#include <asm/hvcall.h>
+#include <asm/io.h>
+
+#include "hv-gpci.h"
+#include "hv-common.h"
+
+PMU_RANGE_ATTR(request, config, 0, 31); /* u32 */
+PMU_RANGE_ATTR(starting_index, config, 32, 63); /* u32 */
+PMU_RANGE_ATTR(secondary_index, config1, 0, 15); /* u16 */
+PMU_RANGE_ATTR(counter_info_version, config1, 16, 23); /* u8 */
+PMU_RANGE_ATTR(length, config1, 24, 31); /* u8, bytes of data (1-8) */
+PMU_RANGE_ATTR(offset, config1, 32, 63); /* u32, byte offset */
+
+static struct attribute *format_attrs[] = {
+	&format_attr_request.attr,
+	&format_attr_starting_index.attr,
+	&format_attr_secondary_index.attr,
+	&format_attr_counter_info_version.attr,
+
+	&format_attr_offset.attr,
+	&format_attr_length.attr,
+	NULL,
+};
+
+static struct attribute_group format_group = {
+	.name = "format",
+	.attrs = format_attrs,
+};
+
+#define HV_CAPS_ATTR(_name, _format)				\
+static ssize_t _name##_show(struct device *dev,			\
+			    struct device_attribute *attr,	\
+			    char *page)				\
+{								\
+	struct hv_perf_caps caps;				\
+	unsigned long hret = hv_perf_caps_get(&caps);		\
+	if (hret)						\
+		return -EIO;					\
+								\
+	return sprintf(page, _format, caps._name);		\
+}								\
+static struct device_attribute hv_caps_attr_##_name = __ATTR_RO(_name)
+
+static ssize_t kernel_version_show(struct device *dev,
+				   struct device_attribute *attr,
+				   char *page)
+{
+	return sprintf(page, "0x%x\n", COUNTER_INFO_VERSION_CURRENT);
+}
+
+DEVICE_ATTR_RO(kernel_version);
+HV_CAPS_ATTR(version, "0x%x\n");
+HV_CAPS_ATTR(ga, "%d\n");
+HV_CAPS_ATTR(expanded, "%d\n");
+HV_CAPS_ATTR(lab, "%d\n");
+HV_CAPS_ATTR(collect_privileged, "%d\n");
+
+static struct attribute *interface_attrs[] = {
+	&dev_attr_kernel_version.attr,
+	&hv_caps_attr_version.attr,
+	&hv_caps_attr_ga.attr,
+	&hv_caps_attr_expanded.attr,
+	&hv_caps_attr_lab.attr,
+	&hv_caps_attr_collect_privileged.attr,
+	NULL,
+};
+
+static struct attribute_group interface_group = {
+	.name = "interface",
+	.attrs = interface_attrs,
+};
+
+static const struct attribute_group *attr_groups[] = {
+	&format_group,
+	&interface_group,
+	NULL,
+};
+
+#define GPCI_MAX_DATA_BYTES \
+	(1024 - sizeof(struct hv_get_perf_counter_info_params))
+
+static unsigned long single_gpci_request(u32 req, u32 starting_index,
+		u16 secondary_index, u8 version_in, u32 offset, u8 length,
+		u64 *value)
+{
+	unsigned long ret;
+	size_t i;
+	u64 count;
+
+	struct {
+		struct hv_get_perf_counter_info_params params;
+		uint8_t bytes[GPCI_MAX_DATA_BYTES];
+	} __packed __aligned(sizeof(uint64_t)) arg = {
+		.params = {
+			.counter_request = cpu_to_be32(req),
+			.starting_index = cpu_to_be32(starting_index),
+			.secondary_index = cpu_to_be16(secondary_index),
+			.counter_info_version_in = version_in,
+		}
+	};
+
+	ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
+			virt_to_phys(&arg), sizeof(arg));
+	if (ret) {
+		pr_devel("hcall failed: 0x%lx\n", ret);
+		return ret;
+	}
+
+	/*
+	 * we verify offset and length are within the zeroed buffer
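The `PMU_RANGE_ATTR()` lines at the top of hv-gpci.c fix where each format field lives in `perf_event_attr.config` and `config1`. As a hypothetical illustration (`struct gpci_event_config` and `gpci_pack()` are made-up names, not part of the patch), this is how the fields from the cover letter's example event, `counter_info_version=3,offset=0,length=8,...,request=0x10`, would be packed into those two words.

```c
#include <stdint.h>

/*
 * Hypothetical packed form of an hv_gpci event, per the format attributes:
 *   request              config:0-31
 *   starting_index       config:32-63
 *   secondary_index      config1:0-15
 *   counter_info_version config1:16-23
 *   length               config1:24-31
 *   offset               config1:32-63
 */
struct gpci_event_config {
	uint64_t config;
	uint64_t config1;
};

static struct gpci_event_config gpci_pack(uint32_t request,
					  uint32_t starting_index,
					  uint16_t secondary_index,
					  uint8_t counter_info_version,
					  uint8_t length, uint32_t offset)
{
	struct gpci_event_config c;

	c.config = (uint64_t)request | ((uint64_t)starting_index << 32);
	c.config1 = (uint64_t)secondary_index |
		    ((uint64_t)counter_info_version << 16) |
		    ((uint64_t)length << 24) |
		    ((uint64_t)offset << 32);
	return c;
}
```

Decoding in the kernel is the reverse: each generated `event_get_<field>()` shifts right by the field's start bit and masks with its `_max` value.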
[PATCH v2 09/11] powerpc/perf: add support for the hv 24x7 interface
This provides a basic interface between hv_24x7 and perf. Like the one provided for gpci, it lacks transaction support and does not list any events.

Signed-off-by: Cody P Schafer c...@linux.vnet.ibm.com
---
 arch/powerpc/perf/hv-24x7.c | 491
 1 file changed, 491 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-24x7.c

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
new file mode 100644
index 000..13de140
--- /dev/null
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -0,0 +1,491 @@
+/*
+ * Hypervisor supplied 24x7 performance counter support
+ *
+ * Author: Cody P Schafer c...@linux.vnet.ibm.com
+ * Copyright 2014 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "hv-24x7: " fmt
+
+#include <linux/perf_event.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <asm/firmware.h>
+#include <asm/hvcall.h>
+#include <asm/io.h>
+
+#include "hv-24x7.h"
+#include "hv-common.h"
+
+/*
+ * TODO: Merging events:
+ * - Think of the hcall as an interface to a 4d array of counters:
+ *   - x = domains
+ *   - y = indexes in the domain (core, chip, vcpu, node, etc)
+ *   - z = offset into the counter space
+ *   - w = lpars (guest vms, logical partitions)
+ * - A single request is: x,y,y_last,z,z_last,w,w_last
+ *   - this means we can retrieve a rectangle of counters in y,z for a single x.
+ *
+ * - Things to consider (ignoring w):
+ *   - input  cost_per_request = 16
+ *   - output cost_per_result(ys,zs)  = 8 + 8 * ys + ys * zs
+ *   - limited number of requests per hcall (must fit into 4K bytes)
+ *     - 4k = 16 [buffer header] + 16 [request size] * request_count
+ *     - 255 requests per hcall
+ *   - sometimes it will be more efficient to read extra data and discard
+ */
+
+PMU_RANGE_ATTR(domain, config, 0, 3); /* u3 0-6, one of HV_24X7_PERF_DOMAIN */
+PMU_RANGE_ATTR(starting_index, config, 16, 31); /* u16 */
+PMU_RANGE_ATTR(offset, config, 32, 63); /* u32, see data_offset */
+PMU_RANGE_ATTR(lpar, config1, 0, 15); /* u16 */
+
+PMU_RANGE_RESV(reserved1, config,   4, 15);
+PMU_RANGE_RESV(reserved2, config1, 16, 63);
+PMU_RANGE_RESV(reserved3, config2,  0, 63);
+
+static struct attribute *format_attrs[] = {
+	&format_attr_domain.attr,
+	&format_attr_offset.attr,
+	&format_attr_starting_index.attr,
+	&format_attr_lpar.attr,
+	NULL,
+};
+
+static struct attribute_group format_group = {
+	.name = "format",
+	.attrs = format_attrs,
+};
+
+/*
+ * read_offset_data - copy data from one buffer to another while treating the
+ *                    source buffer as a small view on the total available
+ *                    source data.
+ *
+ * @dest: buffer to copy into
+ * @dest_len: length of @dest in bytes
+ * @requested_offset: the offset within the source data we want. Must be >= 0
+ * @src: buffer to copy data from
+ * @src_len: length of @src in bytes
+ * @source_offset: the offset in the source data that (src,src_len) refers to.
+ *                 Must be >= 0
+ *
+ * returns the number of bytes copied.
+ *
+ * '.' areas in d are written to.
+ *
+ * [six ASCII diagrams showing how the destination range d can overlap the
+ *  source window s, marked with the points x, w, z, v and u defined below;
+ *  their layout was lost in extraction]
+ *
+ * x = source_offset
+ * w = requested_offset
+ * z = source_offset + src_len
+ * v = requested_offset + dest_len
+ *
+ * w_offset_in_s = w - x = requested_offset - source_offset
+ * z_offset_in_s = z - x = src_len
+ * v_offset_in_s = v - x = requested_offset + dest_len - source_offset
+ * u_offset_in_s = min(z_offset_in_s, v_offset_in_s)
+ *
+ * copy_len = u_offset_in_s - w_offset_in_s = min(z_offset_in_s, v_offset_in_s)
+ *						- w_offset_in_s
+ */
+static ssize_t read_offset_data(void *dest, size_t dest_len,
+				loff_t requested_offset, void *src,
+				size_t src_len, loff_t source_offset)
+{
+	size_t w_offset_in_s = requested_offset - source_offset;
+	size_t z_offset_in_s = src_len;
+	size_t v_offset_in_s = requested_offset + dest_len - source_offset;
+	size_t u_offset_in_s = min(z_offset_in_s, v_offset_in_s);
+	size_t copy_len = u_offset_in_s - w_offset_in_s;
+
+	if (requested_offset < 0 || source_offset < 0)
+		return
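From the definitions x = source_offset and v = requested_offset + dest_len, the end of the request relative to the window is v - x = requested_offset + dest_len - source_offset. An expression that subtracts `src_len` instead would, for a 3-byte read at offset 102 from an 8-byte window starting at 100, compute copy_len = 6 and overrun `dest`. The sketch below is a hypothetical user-space version of the same windowed copy (`read_window()` is a made-up name). It also assumes the caller never asks for data before the window starts, which holds when `source_offset` is the aligned start of the page containing `requested_offset`, and rejects that case instead of letting `size_t` wrap.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/*
 * src holds bytes [source_offset, source_offset + src_len) of a larger
 * data set. Copy the part of [requested_offset, requested_offset + dest_len)
 * that it covers. Returns bytes copied, 0 if the window ends before the
 * request starts, or -1 for negative offsets or a request that starts
 * before the window.
 */
static ssize_t read_window(void *dest, size_t dest_len, off_t requested_offset,
			   const void *src, size_t src_len, off_t source_offset)
{
	size_t w, z, v, u;

	if (requested_offset < 0 || source_offset < 0 ||
	    requested_offset < source_offset)
		return -1;

	w = (size_t)(requested_offset - source_offset);	/* request start */
	z = src_len;						/* window end */
	v = w + dest_len;					/* request end */
	u = z < v ? z : v;					/* copy end */

	if (z <= w)
		return 0;

	memcpy(dest, (const char *)src + w, u - w);
	return (ssize_t)(u - w);
}
```

A caller reading a large object (such as the 24x7 catalog) page by page can loop, advancing `requested_offset` by the returned count until it reaches zero.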