[PATCH 2/2] powerpc/perf/hv-24x7: Display change in counter values
From a1aa992fb25fb8e98a5c5724376ae8cc91463de3 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu
Date: Mon, 25 Jan 2016 23:05:36 -0500
Subject: [PATCH 2/2] powerpc/perf/hv-24x7: Display change in counter values

For 24x7 counters, perf displays the raw value of the 24x7 counter, which
is a monotonically increasing value.

    perf stat -C 0 -e \
        'hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/' \
        sleep 1

    Performance counter stats for 'CPU(s) 0':

      9,105,403,170 hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/

        0.000425751 seconds time elapsed

In the typical usage of 'perf stat' this counter value is not as useful as
the _change_ in the counter value over the duration of the application.

Have h_24x7_event_init() set the event's prev_count to the raw value of
the 24x7 counter at the time of initialization. When the application
terminates, h_24x7_event_read() will compute the change in value and
report it to the perf tool. Similarly, for the transaction interface,
clear the event count to 0 at the beginning of the transaction.

    perf stat -C 0 -e \
        'hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/' \
        sleep 1

    Performance counter stats for 'CPU(s) 0':

            245,758 hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/

        1.006366383 seconds time elapsed

Signed-off-by: Sukadev Bhattiprolu
---
 arch/powerpc/perf/hv-24x7.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index b7a9a03..77b958f 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1222,11 +1222,12 @@ static int h_24x7_event_init(struct perf_event *event)
 		return -EACCES;
 	}
 
-	/* see if the event complains */
+	/* Get the initial value of the counter for this event */
 	if (single_24x7_request(event, &ct)) {
 		pr_devel("test hcall failed\n");
 		return -EIO;
 	}
+	(void)local64_xchg(&event->hw.prev_count, ct);
 
 	return 0;
 }
@@ -1289,6 +1290,16 @@ static void h_24x7_event_read(struct perf_event *event)
 			h24x7hw = &get_cpu_var(hv_24x7_hw);
 			h24x7hw->events[i] = event;
 			put_cpu_var(h24x7hw);
+			/*
+			 * Clear the event count so we can compute the _change_
+			 * in the 24x7 raw counter value at the end of the txn.
+			 *
+			 * Note that we could alternatively read the 24x7 value
+			 * now and save its value in event->hw.prev_count. But
+			 * that would require issuing a hcall, which would then
+			 * defeat the purpose of using the txn interface.
+			 */
+			local64_set(&event->count, 0);
 		}
 
 		put_cpu_var(hv_24x7_reqb);
-- 
2.5.3
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
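
The idea of snapshotting the raw value at init time and reporting only the
change on read can be sketched in plain userspace C. This is a minimal
illustration, not the kernel code: struct sketch_event and both helpers
are hypothetical names, and plain uint64_t fields stand in for the
local64_t members the patch updates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-event state the patch touches. */
struct sketch_event {
	uint64_t prev_count;	/* raw counter value at the last snapshot */
	uint64_t count;		/* accumulated change reported to the tool */
};

/* At init time, remember the raw value so later reads report a delta. */
static void sketch_event_init(struct sketch_event *ev, uint64_t raw_now)
{
	ev->prev_count = raw_now;
	ev->count = 0;
}

/* On read, add the change since the previous snapshot and advance it. */
static uint64_t sketch_event_read(struct sketch_event *ev, uint64_t raw_now)
{
	assert(raw_now >= ev->prev_count);	/* the raw counter only grows */
	ev->count += raw_now - ev->prev_count;
	ev->prev_count = raw_now;
	return ev->count;
}
```

With this shape, a second read at the same raw value adds nothing, so the
reported count stays at the change accumulated so far.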
[PATCH 1/2] powerpc/perf/hv-24x7: Fix usage with chip events
From 9b5848ce1834a4d82fc251022035d36d9e26b500 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu
Date: Sat, 23 Jan 2016 03:58:12 -0500
Subject: [PATCH 1/2] powerpc/perf/hv-24x7: Fix usage with chip events.

24x7 counters can belong to different domains (core, chip, virtual CPU
etc). For events in the 'chip' domain, the sysfs entry currently looks
like:

    $ cd /sys/bus/event_source/devices/hv_24x7/events
    $ cat PM_XLINK_CYCLES__PHYS_CHIP
    domain=0x1,offset=0x230,core=?,lpar=0x0

where the required parameter, 'core=?' is specified with perf as:

    perf stat -C 0 -e hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,core=1/ \
        /bin/true

This is inconsistent in that 'core' is a required parameter for a chip
event. Instead, have the sysfs entry display 'chip=?' for chip events:

    $ cd /sys/bus/event_source/devices/hv_24x7/events
    $ cat PM_XLINK_CYCLES__PHYS_CHIP
    domain=0x1,offset=0x230,chip=?,lpar=0x0

We also need to add a 'chip' entry in the sysfs format directory:

    $ ls /sys/bus/event_source/devices/hv_24x7/format
    chip  core  domain  lpar  offset  vcpu
    (new)

so the perf tool can automatically check usage and format the chip
parameter correctly:

    $ perf stat -C 0 -v -e hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP/ \
        /bin/true
    Required parameter 'chip' not specified
    invalid or unsupported event: 'hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP/'

    $ perf stat -C 0 -v -e hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,chip=1/ \
        /bin/true
    hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,chip=1/: 0 6628908 6628908

    Performance counter stats for 'CPU(s) 0':

        0 hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,chip=1/

        0.006606970 seconds time elapsed

Signed-off-by: Sukadev Bhattiprolu
---
 arch/powerpc/perf/hv-24x7.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 9f9dfda..b7a9a03 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -101,6 +101,7 @@ static bool catalog_entry_domain_is_valid(unsigned domain)
 EVENT_DEFINE_RANGE_FORMAT(domain, config, 0, 3);
 /* u16 */
 EVENT_DEFINE_RANGE_FORMAT(core, config, 16, 31);
+EVENT_DEFINE_RANGE_FORMAT(chip, config, 16, 31);
 EVENT_DEFINE_RANGE_FORMAT(vcpu, config, 16, 31);
 /* u32, see "data_offset" */
 EVENT_DEFINE_RANGE_FORMAT(offset, config, 32, 63);
@@ -115,6 +116,7 @@ static struct attribute *format_attrs[] = {
 	&format_attr_domain.attr,
 	&format_attr_offset.attr,
 	&format_attr_core.attr,
+	&format_attr_chip.attr,
 	&format_attr_vcpu.attr,
 	&format_attr_lpar.attr,
 	NULL,
@@ -289,10 +291,16 @@ static char *event_fmt(struct hv_24x7_event_data *event, unsigned domain)
 	const char *sindex;
 	const char *lpar;
 
-	if (is_physical_domain(domain)) {
+	switch (domain) {
+	case HV_PERF_DOMAIN_PHYS_CHIP:
+		lpar = "0x0";
+		sindex = "chip";
+		break;
+	case HV_PERF_DOMAIN_PHYS_CORE:
 		lpar = "0x0";
 		sindex = "core";
-	} else {
+		break;
+	default:
 		lpar = "?";
 		sindex = "vcpu";
 	}
@@ -1089,10 +1097,16 @@ static int add_event_to_24x7_request(struct perf_event *event,
 		return -EINVAL;
 	}
 
-	if (is_physical_domain(event_get_domain(event)))
+	switch (event_get_domain(event)) {
+	case HV_PERF_DOMAIN_PHYS_CHIP:
+		idx = event_get_chip(event);
+		break;
+	case HV_PERF_DOMAIN_PHYS_CORE:
 		idx = event_get_core(event);
-	else
+		break;
+	default:
 		idx = event_get_vcpu(event);
+	}
 
 	i = request_buffer->num_requests++;
 	req = &request_buffer->requests[i];
-- 
2.5.3
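
Since 'core', 'chip' and 'vcpu' all occupy bits 16-31 of config, the
domain field decides how those bits are interpreted. The sketch below
shows one way to picture that in userspace C. Everything prefixed
sketch_ is hypothetical. The chip domain value 0x1 is taken from the
sysfs output quoted in the patch; the value 2 for the core domain and the
assumption that bit 0 is the least significant bit are mine, not
confirmed by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits [start, end], inclusive, with bit 0 as the least
 * significant bit (an assumption about the numbering). */
static uint64_t sketch_get_bits(uint64_t config, unsigned start, unsigned end)
{
	unsigned width;
	uint64_t mask;

	assert(start <= end && end < 64);
	width = end - start + 1;
	mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
	return (config >> start) & mask;
}

/* Illustrative domain numbers: 1 matches "domain=0x1" in the sysfs
 * output above, 2 is assumed. */
enum sketch_domain {
	SKETCH_PHYS_CHIP = 1,
	SKETCH_PHYS_CORE = 2,
};

/* Name of the index parameter that bits 16-31 carry for this config. */
static const char *sketch_index_name(uint64_t config)
{
	switch (sketch_get_bits(config, 0, 3)) {
	case SKETCH_PHYS_CHIP:
		return "chip";
	case SKETCH_PHYS_CORE:
		return "core";
	default:
		return "vcpu";
	}
}
```

The same 16-bit index is read in every case; only the name shown to the
user changes with the domain.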
[PATCH 15/16] perf kvm/powerpc: Add support for HCALL reasons
From: Hemant Kumar

Powerpc provides hcall events that also provide insights into guest
behaviour. Enhance perf kvm stat to record and analyze hcall events.

 - To trace hcall events :
	perf kvm stat record

 - To show the results :
	perf kvm stat report --event=hcall

The result shows the number of hypervisor calls from the guest grouped by
their respective reasons displayed with the frequency.

This patch makes use of two additional tracepoints
"kvm_hv:kvm_hcall_enter" and "kvm_hv:kvm_hcall_exit". To map the hcall
codes to their respective names, it needs a mapping. Such mapping is
added in this patch in book3s_hcalls.h.

A sample output :

 # pgrep qemu
 19378
 60515

2 VMs running.

 # perf kvm stat record -a
 ^C[ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 4.153 MB perf.data.guest (39624 samples) ]

 # perf kvm stat report -p 60515 --event=hcall

 Analyze events for all VMs, all VCPUs:

     HCALL-EVENT      Samples  Samples%   Time%  MinTime  MaxTime  AvgTime

     H_IPI                822    66.08%  88.10%   0.63us  11.38us   2.05us (+- 1.42%)
     H_SEND_CRQ           144    11.58%   3.77%   0.41us   0.88us   0.50us (+- 1.47%)
     H_VIO_SIGNAL         118     9.49%   2.86%   0.37us   0.83us   0.47us (+- 1.43%)
     H_PUT_TERM_CHAR       76     6.11%   2.07%   0.37us   0.90us   0.52us (+- 2.43%)
     H_GET_TERM_CHAR       74     5.95%   2.23%   0.37us   1.70us   0.58us (+- 4.77%)
     H_RTAS                 6     0.48%   0.85%   1.10us   9.25us   2.70us (+-48.57%)
     H_PERFMON              4     0.32%   0.12%   0.41us   0.96us   0.59us (+-20.92%)

 Total Samples:1244, Total events handled time:1916.69us.

Signed-off-by: Hemant Kumar
Cc: Alexander Yarygin
Cc: David Ahern
Cc: Michael Ellerman
Cc: Naveen N. Rao
Cc: Paul Mackerras
Cc: Scott Wood
Cc: Srikar Dronamraju
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1453962787-15376-4-git-send-email-hem...@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/arch/powerpc/util/book3s_hcalls.h | 123 +++
 tools/perf/arch/powerpc/util/kvm-stat.c      |  65 +-
 2 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/arch/powerpc/util/book3s_hcalls.h

diff --git a/tools/perf/arch/powerpc/util/book3s_hcalls.h b/tools/perf/arch/powerpc/util/book3s_hcalls.h
new file mode 100644
index 000000000000..0dd6b7f2d44f
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/book3s_hcalls.h
@@ -0,0 +1,123 @@
+#ifndef ARCH_PERF_BOOK3S_HV_HCALLS_H
+#define ARCH_PERF_BOOK3S_HV_HCALLS_H
+
+/*
+ * PowerPC HCALL codes : hcall code to name mapping
+ */
+#define kvm_trace_symbol_hcall \
+	{0x4, "H_REMOVE"}, \
+	{0x8, "H_ENTER"}, \
+	{0xc, "H_READ"}, \
+	{0x10, "H_CLEAR_MOD"}, \
+	{0x14, "H_CLEAR_REF"}, \
+	{0x18, "H_PROTECT"}, \
+	{0x1c, "H_GET_TCE"}, \
+	{0x20, "H_PUT_TCE"}, \
+	{0x24, "H_SET_SPRG0"}, \
+	{0x28, "H_SET_DABR"}, \
+	{0x2c, "H_PAGE_INIT"}, \
+	{0x30, "H_SET_ASR"}, \
+	{0x34, "H_ASR_ON"}, \
+	{0x38, "H_ASR_OFF"}, \
+	{0x3c, "H_LOGICAL_CI_LOAD"}, \
+	{0x40, "H_LOGICAL_CI_STORE"}, \
+	{0x44, "H_LOGICAL_CACHE_LOAD"}, \
+	{0x48, "H_LOGICAL_CACHE_STORE"}, \
+	{0x4c, "H_LOGICAL_ICBI"}, \
+	{0x50, "H_LOGICAL_DCBF"}, \
+	{0x54, "H_GET_TERM_CHAR"}, \
+	{0x58, "H_PUT_TERM_CHAR"}, \
+	{0x5c, "H_REAL_TO_LOGICAL"}, \
+	{0x60, "H_HYPERVISOR_DATA"}, \
+	{0x64, "H_EOI"}, \
+	{0x68, "H_CPPR"}, \
+	{0x6c, "H_IPI"}, \
+	{0x70, "H_IPOLL"}, \
+	{0x74, "H_XIRR"}, \
+	{0x78, "H_MIGRATE_DMA"}, \
+	{0x7c, "H_PERFMON"}, \
+	{0xdc, "H_REGISTER_VPA"}, \
+	{0xe0, "H_CEDE"}, \
+	{0xe4, "H_CONFER"}, \
+	{0xe8, "H_PROD"}, \
+	{0xec, "H_GET_PPP"}, \
+	{0xf0, "H_SET_PPP"
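
To illustrate how a code-to-name table of this shape can be consumed,
here is a small userspace sketch. It is not the code from this patch:
struct sketch_hcall_name, the table name and the lookup helper are
hypothetical, and only a handful of the entries listed above are copied
in.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pair type matching the {code, "name"} initializers. */
struct sketch_hcall_name {
	uint64_t code;
	const char *name;
};

/* A few entries copied from the kvm_trace_symbol_hcall list above. */
static const struct sketch_hcall_name sketch_hcalls[] = {
	{0x4, "H_REMOVE"},
	{0x8, "H_ENTER"},
	{0x54, "H_GET_TERM_CHAR"},
	{0x58, "H_PUT_TERM_CHAR"},
	{0x6c, "H_IPI"},
	{0x7c, "H_PERFMON"},
	{0xe0, "H_CEDE"},
};

/* Linear scan; returns a fallback string for codes not in the table. */
static const char *sketch_hcall_to_name(uint64_t code)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_hcalls) / sizeof(sketch_hcalls[0]); i++)
		if (sketch_hcalls[i].code == code)
			return sketch_hcalls[i].name;
	return "UNKNOWN";
}
```

A linear scan is enough for a table of this size; the fallback string
keeps a report readable when a code has no entry.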
[PATCH 12/16] perf kvm/{x86, s390}: Remove dependency on uapi/kvm_perf.h
From: Hemant Kumar

It's better to remove the dependency on uapi/kvm_perf.h to allow dynamic
discovery of kvm events (if it's needed). To do this, some extern
variables have been introduced with which we can keep the generic
functions generic.

Signed-off-by: Hemant Kumar
Acked-by: Alexander Yarygin
Acked-by: David Ahern
Cc: Michael Ellerman
Cc: Naveen N. Rao
Cc: Paul Mackerras
Cc: Scott Wood
Cc: Srikar Dronamraju
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1453962787-15376-1-git-send-email-hem...@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/arch/s390/util/kvm-stat.c |  8 +++-
 tools/perf/arch/x86/util/kvm-stat.c  | 14 +++---
 tools/perf/builtin-kvm.c             | 20 ++--
 tools/perf/util/kvm-stat.h           |  5 +
 4 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/tools/perf/arch/s390/util/kvm-stat.c b/tools/perf/arch/s390/util/kvm-stat.c
index a5dbc07ec9dc..b85a94b19c25 100644
--- a/tools/perf/arch/s390/util/kvm-stat.c
+++ b/tools/perf/arch/s390/util/kvm-stat.c
@@ -10,7 +10,7 @@
  */
 
 #include "../../util/kvm-stat.h"
-#include
+#include
 
 define_exit_reasons_table(sie_exit_reasons, sie_intercept_code);
 define_exit_reasons_table(sie_icpt_insn_codes, icpt_insn_codes);
@@ -18,6 +18,12 @@ define_exit_reasons_table(sie_sigp_order_codes, sigp_order_codes);
 define_exit_reasons_table(sie_diagnose_codes, diagnose_codes);
 define_exit_reasons_table(sie_icpt_prog_codes, icpt_prog_codes);
 
+const char *vcpu_id_str = "id";
+const int decode_str_len = 40;
+const char *kvm_exit_reason = "icptcode";
+const char *kvm_entry_trace = "kvm:kvm_s390_sie_enter";
+const char *kvm_exit_trace = "kvm:kvm_s390_sie_exit";
+
 static void event_icpt_insn_get_key(struct perf_evsel *evsel,
 				    struct perf_sample *sample,
 				    struct event_key *key)
diff --git a/tools/perf/arch/x86/util/kvm-stat.c b/tools/perf/arch/x86/util/kvm-stat.c
index 14e4e668fad7..babefda4c862 100644
--- a/tools/perf/arch/x86/util/kvm-stat.c
+++ b/tools/perf/arch/x86/util/kvm-stat.c
@@ -1,5 +1,7 @@
 #include "../../util/kvm-stat.h"
-#include
+#include
+#include
+#include
 
 define_exit_reasons_table(vmx_exit_reasons, VMX_EXIT_REASONS);
 define_exit_reasons_table(svm_exit_reasons, SVM_EXIT_REASONS);
@@ -11,6 +13,12 @@ static struct kvm_events_ops exit_events = {
 	.name = "VM-EXIT"
 };
 
+const char *vcpu_id_str = "vcpu_id";
+const int decode_str_len = 20;
+const char *kvm_exit_reason = "exit_reason";
+const char *kvm_entry_trace = "kvm:kvm_entry";
+const char *kvm_exit_trace = "kvm:kvm_exit";
+
 /*
  * For the mmio events, we treat:
  * the time of MMIO write: kvm_mmio(KVM_TRACE_MMIO_WRITE...) -> kvm_entry
@@ -65,7 +73,7 @@ static void mmio_event_decode_key(struct perf_kvm_stat *kvm __maybe_unused,
 				  struct event_key *key,
 				  char *decode)
 {
-	scnprintf(decode, DECODE_STR_LEN, "%#lx:%s",
+	scnprintf(decode, decode_str_len, "%#lx:%s",
 		  (unsigned long)key->key,
 		  key->info == KVM_TRACE_MMIO_WRITE ? "W" : "R");
 }
@@ -109,7 +117,7 @@ static void ioport_event_decode_key(struct perf_kvm_stat *kvm __maybe_unused,
 				    struct event_key *key,
 				    char *decode)
 {
-	scnprintf(decode, DECODE_STR_LEN, "%#llx:%s",
+	scnprintf(decode, decode_str_len, "%#llx:%s",
 		  (unsigned long long)key->key,
 		  key->info ? "POUT" : "PIN");
 }
diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 4418d9214872..ab5645cf39d2 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -30,7 +30,6 @@
 #include
 
 #ifdef HAVE_KVM_STAT_SUPPORT
-#include
 #include "util/kvm-stat.h"
 
 void exit_event_get_key(struct perf_evsel *evsel,
@@ -38,12 +37,12 @@ void exit_event_get_key(struct perf_evsel *evsel,
 			struct event_key *key)
 {
 	key->info = 0;
-	key->key = perf_evsel__intval(evsel, sample, KVM_EXIT_REASON);
+	key->key = perf_evsel__intval(evsel, sample, kvm_exit_reason);
 }
 
 bool kvm_exit_event(struct perf_evsel *evsel)
 {
-	return !strcmp(evsel->name, KVM_EXIT_TRACE);
+	return !strcmp(evsel->name, kvm_exit_trace);
 }
 
 bool exit_event_begin(struct perf_evsel *evsel,
@@ -59,7 +58,7 @@ bool exit_event_begin(struct perf_evsel *evsel,
 
 bool kvm_entry_event(struct perf_evsel *evsel)
 {
-	return !strcmp(evsel->name, KVM_ENTRY_TRACE);
+	return !strcmp(evsel->name, kvm_entry_trace);
 }
 
 bool exit_event_end(struct perf_evsel *evsel,
@@ -91,7 +90,7 @@ void exit_event_decode_key(struct perf_kvm_stat *kvm,
 	const char *exit_reason = get_exit_reason(kvm, key->exit_reasons,
 						  key->key);
 
-	scnprintf(decode, DECODE_STR_LEN, "%s", exit_reason);
+
[PATCH 13/16] perf kvm/{x86,s390}: Remove const from kvm_events_tp
See http://www.infradead.org/rpr.html

From: Hemant Kumar

This patch removes the "const" qualifier from kvm_events_tp declaration
to account for the fact that some architectures may need to update this
variable dynamically. For instance, powerpc will need to update this
variable dynamically depending on the machine type.

Signed-off-by: Hemant Kumar
Acked-by: David Ahern
Cc: Alexander Yarygin
Cc: Michael Ellerman
Cc: Naveen N. Rao
Cc: Paul Mackerras
Cc: Scott Wood
Cc: Srikar Dronamraju
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1453962787-15376-2-git-send-email-hem...@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/arch/s390/util/kvm-stat.c | 2 +-
 tools/perf/arch/x86/util/kvm-stat.c  | 2 +-
 tools/perf/util/kvm-stat.h           | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/arch/s390/util/kvm-stat.c b/tools/perf/arch/s390/util/kvm-stat.c
index b85a94b19c25..ed57df2e6d68 100644
--- a/tools/perf/arch/s390/util/kvm-stat.c
+++ b/tools/perf/arch/s390/util/kvm-stat.c
@@ -79,7 +79,7 @@ static struct kvm_events_ops exit_events = {
 	.name = "VM-EXIT"
 };
 
-const char * const kvm_events_tp[] = {
+const char *kvm_events_tp[] = {
 	"kvm:kvm_s390_sie_enter",
 	"kvm:kvm_s390_sie_exit",
 	"kvm:kvm_s390_intercept_instruction",
diff --git a/tools/perf/arch/x86/util/kvm-stat.c b/tools/perf/arch/x86/util/kvm-stat.c
index babefda4c862..b63d4be655a2 100644
--- a/tools/perf/arch/x86/util/kvm-stat.c
+++ b/tools/perf/arch/x86/util/kvm-stat.c
@@ -129,7 +129,7 @@ static struct kvm_events_ops ioport_events = {
 	.name = "IO Port Access"
 };
 
-const char * const kvm_events_tp[] = {
+const char *kvm_events_tp[] = {
 	"kvm:kvm_entry",
 	"kvm:kvm_exit",
 	"kvm:kvm_mmio",
diff --git a/tools/perf/util/kvm-stat.h b/tools/perf/util/kvm-stat.h
index dd55548ef66a..c965dc844df3 100644
--- a/tools/perf/util/kvm-stat.h
+++ b/tools/perf/util/kvm-stat.h
@@ -133,7 +133,7 @@ bool kvm_entry_event(struct perf_evsel *evsel);
  */
 int cpu_isa_init(struct perf_kvm_stat *kvm, const char *cpuid);
 
-extern const char * const kvm_events_tp[];
+extern const char *kvm_events_tp[];
 extern struct kvm_reg_events_ops kvm_reg_events_ops[];
 extern const char * const kvm_skip_events[];
 extern const char *vcpu_id_str;
-- 
2.5.0
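
The effect of dropping the second "const" can be shown with a small
standalone sketch. The array and helper below are hypothetical and only
mirror the shape of the new declaration: the strings remain read-only,
but each array slot may be pointed at a different name at run time.

```c
#include <stddef.h>

/* Same shape as the new declaration: 'const char *' elements, so the
 * characters are read-only but the slots themselves are writable. */
static const char *sketch_events_tp[] = {
	"kvm:kvm_entry",
	"kvm:kvm_exit",
	NULL,
};

/* Hypothetical helper: install tracepoint names chosen at run time.
 * With 'const char * const sketch_events_tp[]' these assignments
 * would be rejected by the compiler. */
static void sketch_set_tracepoints(const char *entry_tp, const char *exit_tp)
{
	sketch_events_tp[0] = entry_tp;
	sketch_events_tp[1] = exit_tp;
}
```

An architecture that only learns its tracepoint names after probing the
machine could call such a helper before the generic code walks the
array.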
[PATCH 14/16] perf kvm/powerpc: Port perf kvm stat to powerpc
From: Hemant Kumar

perf kvm can be used to analyze guest exit reasons. This support already
exists in x86. Hence, porting it to powerpc.

 - To trace KVM events :
	perf kvm stat record

   If many guests are running, we can track for a specific guest by using
   --pid as in :
	perf kvm stat record --pid

 - To see the results :
	perf kvm stat report

The result shows the number of exits (from the guest context to
host/hypervisor context) grouped by their respective exit reasons with
their frequency.

Since different powerpc machines have different KVM tracepoints, this
patch discovers the available tracepoints dynamically and accordingly
looks for them. If any single tracepoint is not present, this support
won't be enabled for reporting. To record, this will fail if any of the
events we are looking to record isn't available. Right now, it's only
supported on PowerPC Book3S_HV architectures.

To analyze the different exits, group them and present them (in a
slightly descriptive way) to the user, we need a mapping between the
"exit code" (dumped in the kvm_guest_exit tracepoint data) and its
related Interrupt vector description (exit reason). This patch adds this
mapping in book3s_hv_exits.h.

It records on two available KVM tracepoints for book3s_hv:
"kvm_hv:kvm_guest_exit" and "kvm_hv:kvm_guest_enter".

Here is a sample o/p:

 # pgrep qemu
 19378
 60515

2 Guests are running on the host.

 # perf kvm stat record -a
 ^C[ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 4.153 MB perf.data.guest (39624 samples) ]

 # perf kvm stat report -p 60515

 Analyze events for pid(s) 60515, all VCPUs:

     VM-EXIT         Samples  Samples%   Time%  MinTime      MaxTime   Avg time

     SYSCALL            9141    63.67%   7.49%   1.26us    5782.39us     9.87us (+- 6.46%)
     H_DATA_STORAGE     4114    28.66%   5.07%   1.72us    4597.68us    14.84us (+-20.06%)
     HV_DECREMENTER      418     2.91%   4.26%   0.70us   30002.22us   122.58us (+-70.29%)
     EXTERNAL            392     2.73%   0.06%   0.64us     104.10us     1.94us (+-18.83%)
     RETURN_TO_HOST      287     2.00%  83.11%   1.53us  124240.15us  3486.52us (+-16.81%)
     H_INST_STORAGE        5     0.03%   0.00%   1.88us       3.73us     2.39us (+-14.20%)

 Total Samples:14357, Total events handled time:1203918.42us.

Signed-off-by: Hemant Kumar
Cc: Alexander Yarygin
Cc: David Ahern
Cc: Michael Ellerman
Cc: Naveen N. Rao
Cc: Paul Mackerras
Cc: Scott Wood
Cc: Srikar Dronamraju
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1453962787-15376-3-git-send-email-hem...@linux.vnet.ibm.com
Signed-off-by: Srikar Dronamraju
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/arch/powerpc/Makefile               |   2 +
 tools/perf/arch/powerpc/util/Build             |   1 +
 tools/perf/arch/powerpc/util/book3s_hv_exits.h |  33 
 tools/perf/arch/powerpc/util/kvm-stat.c        | 107 +
 tools/perf/builtin-kvm.c                       |  18 +
 tools/perf/util/kvm-stat.h                     |   1 +
 6 files changed, 162 insertions(+)
 create mode 100644 tools/perf/arch/powerpc/util/book3s_hv_exits.h
 create mode 100644 tools/perf/arch/powerpc/util/kvm-stat.c

diff --git a/tools/perf/arch/powerpc/Makefile b/tools/perf/arch/powerpc/Makefile
index 7fbca175099e..9f9cea3478fd 100644
--- a/tools/perf/arch/powerpc/Makefile
+++ b/tools/perf/arch/powerpc/Makefile
@@ -1,3 +1,5 @@
 ifndef NO_DWARF
 PERF_HAVE_DWARF_REGS := 1
 endif
+
+HAVE_KVM_STAT_SUPPORT := 1
diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build
index 7b8b0d1a1b62..c8fe2074d217 100644
--- a/tools/perf/arch/powerpc/util/Build
+++ b/tools/perf/arch/powerpc/util/Build
@@ -1,5 +1,6 @@
 libperf-y += header.o
 libperf-y += sym-handling.o
+libperf-y += kvm-stat.o
 
 libperf-$(CONFIG_DWARF) += dwarf-regs.o
 libperf-$(CONFIG_DWARF) += skip-callchain-idx.o
diff --git a/tools/perf/arch/powerpc/util/book3s_hv_exits.h b/tools/perf/arch/powerpc/util/book3s_hv_exits.h
new file mode 100644
index 000000000000..e68ba2da8970
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/book3s_hv_exits.h
@@ -0,0 +1,33 @@
+#ifndef ARCH_PERF_BOOK3S_HV_EXITS_H
+#define ARCH_PERF_BOOK3S_HV_EXITS_H
+
+/*
+ * PowerPC Interrupt vectors : exit code to name mapping
+ */
+
+#define kvm_trace_symbol_exit \
+	{0x0, "RETURN_TO_HOST"}, \
+	{0x100, "SYSTEM_RESET"}, \
+	{0x200, "MACHINE_CHECK"}, \
+	{0x300, "DATA_STORAGE"}, \
+	{0x380, "DATA_SEGMENT"}, \
+	{0x400, "INST_STORAGE"}, \
+	{0x480, "INST_SEGMENT"}, \
+	{0x500, "EXTERNAL"}, \
+	{0x501, "EXTERNAL_LEVEL"}, \
+	{0x502, "EXTERNAL_HV"}, \
+	{0x600, "ALIGNMENT"}, \
+	{0x700, "PROGRAM"}, \
+	{0x800, "FP_UNAVAIL"}, \
+	{0x900, "DECREMENTER"}, \
+	{0x980, "HV_DECREMENTER"}, \
+	{0xc00, "SYSCALL"}, \
+	{0xd00, "TRACE"}, \
+	{0xe00, "H_DATA_STORAGE"}, \
+	{0xe20, "H_INST_STORAGE"}, \
+	{0xe40, "H_EMUL_ASSIST"}, \
+	{0xf00, "PE
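
The grouping behind the report (exits bucketed by exit code, with a
per-reason share of the samples) can be pictured with the following
userspace sketch. It is an illustration under assumptions, not how perf
implements it: the sketch_ types, the fixed-size table and the helper
names are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_REASONS 16

/* Per-exit-reason totals. */
struct sketch_stat {
	uint64_t code;		/* exit code, e.g. 0xc00 for SYSCALL */
	uint64_t samples;	/* number of exits seen with this code */
	double total_us;	/* time spent handling them, in microseconds */
};

struct sketch_report {
	struct sketch_stat stats[SKETCH_MAX_REASONS];
	size_t nr;		/* number of distinct codes seen so far */
	uint64_t total_samples;
};

/* Account one exit; returns 0 on success, -1 if the table is full. */
static int sketch_account(struct sketch_report *r, uint64_t code, double us)
{
	size_t i;

	for (i = 0; i < r->nr; i++)
		if (r->stats[i].code == code)
			break;
	if (i == r->nr) {
		if (r->nr == SKETCH_MAX_REASONS)
			return -1;
		r->stats[i].code = code;
		r->stats[i].samples = 0;
		r->stats[i].total_us = 0.0;
		r->nr++;
	}
	r->stats[i].samples++;
	r->stats[i].total_us += us;
	r->total_samples++;
	return 0;
}

/* Share of all samples that belong to entry i, in percent. */
static double sketch_sample_pct(const struct sketch_report *r, size_t i)
{
	if (!r->total_samples)
		return 0.0;
	return 100.0 * (double)r->stats[i].samples / (double)r->total_samples;
}
```

The "Samples%" column in the sample output is this kind of ratio, for
example 9141 of 14357 samples for SYSCALL.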
[GIT PULL 00/16] perf/core improvements and fixes
Hi Ingo,

	This is on top of the previously submitted perf-core-for-mingo tag,
please consider applying,

- Arnaldo

The following changes since commit 5ac76283b32b116c58e362e99542182ddcfc8262:

  perf cpumap: Auto initialize cpu__max_{node,cpu} (2016-01-26 16:08:36 -0300)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-2

for you to fetch changes up to 814568db641f6587c1e98a3a85f214cb6a30fe10:

  perf build: Align the names of the build tests: (2016-01-29 17:51:04 -0300)

New features:

- Port 'perf kvm stat' to PowerPC (Hemant Kumar)

Infrastructure:

- Use the 'feature-dump' target to do the feature checks just once and then
  add code to reuse that in the tests/make makefile, speeding up the
  'make -C tools/perf build-test' target (Wang Nan)

- Reduce the number of tests the 'build-test' target does to those that
  don't pollute the source tree (Arnaldo Carvalho de Melo)

- Improve the output of the build tests a bit by aligning the name of the
  tests, more can be done to filter out uninteresting info in the output
  (Arnaldo Carvalho de Melo)

- Add perf_evlist pointer to *info_priv_size(), more prep work for
  supporting the coresight architecture (Mathieu Poirier)

- Improve the 'perf test bp_signal' test (Wang Nan)

- Check environment before starting the BPF 'perf test', so that we can
  just 'Skip' older kernels instead of 'FAIL'ing them (Wang Nan)

- Fix cpumode of synthesized buildid event (Wang Nan)

Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo (2):
      perf tools: Speed up build-tests by reducing the number of builds tested
      perf build: Align the names of the build tests:

Hemant Kumar (4):
      perf kvm/{x86,s390}: Remove dependency on uapi/kvm_perf.h
      perf kvm/{x86,s390}: Remove const from kvm_events_tp
      perf kvm/powerpc: Port perf kvm stat to powerpc
      perf kvm/powerpc: Add support for HCALL reasons

Jiri Olsa (1):
      perf build: Fix feature-dump checks, we need to test all features

Mathieu Poirier (1):
      perf auxtrace: Add perf_evlist pointer to *info_priv_size()

Wang Nan (8):
      tools build: Check basic headers for test-compile feature checker
      perf build: Remove all condition feature check {C,LD}FLAGS
      perf build: Use feature dump file for build-test
      perf buildid: Fix cpumode of buildid event
      perf test: Check environment before start real BPF test
      perf test: Improve bp_signal
      perf tools: Move timestamp creation to util
      perf record: Use OPT_BOOLEAN_SET for buildid cache related options

 tools/build/Makefile.feature                   |   8 ++
 tools/build/feature/test-compile.c             |   2 +
 tools/perf/Makefile                            |  11 +-
 tools/perf/arch/powerpc/Makefile               |   2 +
 tools/perf/arch/powerpc/util/Build             |   1 +
 tools/perf/arch/powerpc/util/book3s_hcalls.h   | 123 ++
 tools/perf/arch/powerpc/util/book3s_hv_exits.h |  33 +
 tools/perf/arch/powerpc/util/kvm-stat.c        | 170 +
 tools/perf/arch/s390/util/kvm-stat.c           |  10 +-
 tools/perf/arch/x86/util/intel-bts.c           |   4 +-
 tools/perf/arch/x86/util/intel-pt.c            |   4 +-
 tools/perf/arch/x86/util/kvm-stat.c            |  16 ++-
 tools/perf/builtin-buildid-cache.c             |  14 +-
 tools/perf/builtin-kvm.c                       |  38 --
 tools/perf/builtin-record.c                    |  12 +-
 tools/perf/config/Makefile                     | 101 +++
 tools/perf/tests/bp_signal.c                   | 140
 tools/perf/tests/bpf.c                         |  37 ++
 tools/perf/tests/make                          |  39 +-
 tools/perf/util/auxtrace.c                     |   7 +-
 tools/perf/util/auxtrace.h                     |   6 +-
 tools/perf/util/build-id.c                     |   6 +-
 tools/perf/util/kvm-stat.h                     |   8 +-
 tools/perf/util/util.c                         |  17 +++
 tools/perf/util/util.h                         |   1 +
 25 files changed, 688 insertions(+), 122 deletions(-)
 create mode 100644 tools/perf/arch/powerpc/util/book3s_hcalls.h
 create mode 100644 tools/perf/arch/powerpc/util/book3s_hv_exits.h
 create mode 100644 tools/perf/arch/powerpc/util/kvm-stat.c
Re: [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported
----- Original Message -----
> On 2016/1/29 6:46, Alex Williamson wrote:
> > On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> >> MSI-X tables are not allowed to be mmapped in vfio-pci
> >> driver in case that user get to touch this directly.
> >> This will cause some performance issues when when PCI
> >> adapters have critical registers in the same page as
> >> the MSI-X table.
> >>
> >> However, some kind of PCI host bridge such as IODA bridge
> >> on Power support filtering of MSIs, which can ensure that a
> >> given pci device can only shoot the MSIs assigned for it.
> >> So we think it's safe to expose the MSI-X table to userspace
> >> if filtering of MSIs is supported because the exposed MSI-X
> >> table can't be used to do harm to other memory space.
> >>
> >> To support this case, this patch adds a pci_host_bridge
> >> attribute to indicate if this PCI host bridge supports
> >> filtering of MSIs.
> >>
> >> Signed-off-by: Yongji Xie
> >> ---
> >>   drivers/pci/host-bridge.c |    6 ++++++
> >>   include/linux/pci.h       |    3 +++
> >>   2 files changed, 9 insertions(+)
> >>
> >> diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
> >> index 5f4a2e0..c029267 100644
> >> --- a/drivers/pci/host-bridge.c
> >> +++ b/drivers/pci/host-bridge.c
> >> @@ -96,3 +96,9 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct
> >> resource *res,
> >>   	res->end = region->end + offset;
> >>   }
> >>   EXPORT_SYMBOL(pcibios_bus_to_resource);
> >> +
> >> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev)
> >> +{
> >> +	return pci_find_host_bridge(pdev->bus)->msi_filtered;
> >> +}
> >> +EXPORT_SYMBOL_GPL(pci_host_bridge_msi_filtered_enabled);
> >> diff --git a/include/linux/pci.h b/include/linux/pci.h
> >> index b640d65..b952b78 100644
> >> --- a/include/linux/pci.h
> >> +++ b/include/linux/pci.h
> >> @@ -412,6 +412,7 @@ struct pci_host_bridge {
> >>   	void (*release_fn)(struct pci_host_bridge *);
> >>   	void *release_data;
> >>   	unsigned int ignore_reset_delay:1;	/* for entire hierarchy */
> >> +	unsigned int msi_filtered:1;	/* support filtering of MSIs */
> >>   	/* Resource alignment requirements */
> >>   	resource_size_t (*align_resource)(struct pci_dev *dev,
> >>   			const struct resource *res,
> >> @@ -430,6 +431,8 @@ void pci_set_host_bridge_release(struct
> >> pci_host_bridge *bridge,
> >>
> >>   int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge);
> >>
> >> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev);
> >> +
> >>   /*
> >>    * The first PCI_BRIDGE_RESOURCE_NUM PCI bus resources (those that
> >>    correspond
> >>    * to P2P or CardBus bridge windows) go in a table.  Additional ones
> >>    (for
> > Don't we already have a flag for this in the IOMMU space?
> >
> > enum iommu_cap {
> >         IOMMU_CAP_CACHE_COHERENCY,      /* IOMMU can enforce cache
> >         coherent DMA
> >                                            transactions */
> > --->    IOMMU_CAP_INTR_REMAP,           /* IOMMU supports interrupt
> > isolation */
> >         IOMMU_CAP_NOEXEC,               /* IOMMU_NOEXEC flag */
> > };
> >
> 
> I saw this flag had been enabled in x86 and ARM arch.
> 
> I'm not sure whether we can mmap MSI-X table in those archs. I just
> verify it on PPC64 arch.

Unfortunately that's not a very good excuse for creating an alternate
implementation. When x86 implements interrupt remapping, we get fine
grained isolation of MSI vectors and we've always taken this flag to
mean that the system is isolated from devices that may perform DoS
attacks with MSI writes. I'm not entirely sure whether ARM really
provides that degree of isolation, but they would be incorrect in
exposing the capability if they do not. Thanks,

Alex
Re: [RFC PATCH v3 1/5] PCI: Add support for enforcing all MMIO BARs to be page aligned
On Fri, 2016-01-29 at 18:37 +0800, Yongji Xie wrote:
> On 2016/1/29 6:46, Alex Williamson wrote:
> > On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> > > When vfio passthrough a PCI device of which MMIO BARs
> > > are smaller than PAGE_SIZE, guest will not handle the
> > > mmio accesses to the BARs which leads to mmio emulations
> > > in host.
> > >
> > > This is because vfio will not allow to passthrough one
> > > BAR's mmio page which may be shared with other BARs.
> > >
> > > To solve this performance issue, this patch adds a kernel
> > > parameter "pci=resource_page_aligned=on" to enforce
> > > the alignment of all MMIO BARs to be at least PAGE_SIZE,
> > > so that one BAR's mmio page would not be shared with other
> > > BARs. We can also disable it through kernel parameter
> > > "pci=resource_page_aligned=off".
> > >
> > > For the default value of the parameter, we think it should be
> > > arch-independent, so we add a macro
> > > HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED to change it. And we
> > > define this macro to enable this parameter by default on PPC64
> > > platform which can easily hit this performance issue because
> > > its PAGE_SIZE is 64KB.
> > >
> > > Note that the kernel parameter won't works if kernel doesn't do
> > > resources reallocation.
> > And where do you account for this so that we know whether it's really in
> > effect?
>
> We can check the flag PCI_PROBE_ONLY to know whether kernel do
> resources reallocation. Then we know if the kernel parameter is really
> in effect.
>
> enum {
>      /* Force re-assigning all resources (ignore firmware
>       * setup completely)
>       */
>      PCI_REASSIGN_ALL_RSRC    = 0x0001,
>
>      /* Re-assign all bus numbers */
>      PCI_REASSIGN_ALL_BUS    = 0x0002,
>
>      /* Do not try to assign, just use existing setup */
> --->    PCI_PROBE_ONLY        = 0x0004,
>
> And I will add this to commit log.

We need more than a commit log entry for this. What's the purpose of the
pci_resources_share_page() function if we don't know if this is in
effect?

> > > Signed-off-by: Yongji Xie
> > > ---
> > >   Documentation/kernel-parameters.txt |5 +
> > >   arch/powerpc/include/asm/pci.h  |   11 +++
> > >   drivers/pci/pci.c   |   35
> > > +++
> > >   drivers/pci/pci.h   |8 +++-
> > >   include/linux/pci.h |4
> > >   5 files changed, 62 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/Documentation/kernel-parameters.txt
> > > b/Documentation/kernel-parameters.txt
> > > index 742f69d..3f2a7c9 100644
> > > --- a/Documentation/kernel-parameters.txt
> > > +++ b/Documentation/kernel-parameters.txt
> > > @@ -2857,6 +2857,11 @@ bytes respectively. Such letter suffixes can also
> > > be entirely omitted.
> > >   PAGE_SIZE is used as alignment.
> > >   PCI-PCI bridge can be specified, if
> > > resource
> > >   windows need to be expanded.
> > > + resource_page_aligned= Enable/disable enforcing the alignment
> > > + of all PCI devices' memory resources to be
> > > + at least PAGE_SIZE if resources reallocation
> > > + is done by kernel.
> > > + Format: { "on" | "off" }
> > >   ecrc= Enable/disable PCIe ECRC (transaction
> > > layer
> > >   end-to-end CRC checking).
> > >   bios: Use BIOS/firmware settings. This
> > > is the
> > > diff --git a/arch/powerpc/include/asm/pci.h
> > > b/arch/powerpc/include/asm/pci.h
> > > index 3453bd8..2d2b3ef 100644
> > > --- a/arch/powerpc/include/asm/pci.h
> > > +++ b/arch/powerpc/include/asm/pci.h
> > > @@ -136,6 +136,17 @@ extern pgprot_t pci_phys_mem_access_prot(struct
> > > file *file,
> > >   unsigned long pfn,
> > >   unsigned long size,
> > >   pgprot_t prot);
> > > +#ifdef CONFIG_PPC64
> > > +
> > > +/* For PPC64, We enforce all PCI MMIO BARs to be page aligned
> > > + * by default. This would be helpful to improve performance
> > > + * when we passthrough a PCI device of which BARs are smaller
> > > + * than PAGE_SIZE(64KB). And we can use kernel parameter
> > > + * "pci=resource_page_aligned=off" to disable it.
> > > + */
> > > +#define HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED 1
> > > +
> > > +#endif
> > >
> > >  #define HAVE_ARCH_PCI_RESOURCE_TO_USER
> > >  extern void pci_resource_to_user(const struct pci_dev *dev, int bar,
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 314db8c..7b21238 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -99,6 +99,9 @@ u8 pci_cache_line_size;
> > >   */
> >
[PATCH] powerpc/book3s_32: Fix build error with checkpoint restart
In file included from mm/vmscan.c:54:0:
include/linux/swapops.h: In function ‘pte_to_swp_entry’:
include/linux/swapops.h:69:2: error: implicit declaration of function ‘pte_swp_soft_dirty’ [-Werror=implicit-function-declaration]
  if (pte_swp_soft_dirty(pte))
  ^
include/linux/swapops.h:70:3: error: implicit declaration of function ‘pte_swp_clear_soft_dirty’ [-Werror=implicit-function-declaration]
   pte = pte_swp_clear_soft_dirty(pte);

We support soft dirty tracking only with book3s 64 for now. So change
the Kconfig dependency accordingly. Also CHECKPOINT_RESTORE feature is
not really dependent on SOFT_DIRTY. We track the dependency between
MEM_SOFT_DIRTY and ARCH_SOFT_DIRTY through headers.

Signed-off-by: Aneesh Kumar K.V
---
 arch/powerpc/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 09b94174d372..599329332613 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -558,7 +558,7 @@ choice
 
 config PPC_4K_PAGES
 	bool "4k page size"
-	select HAVE_ARCH_SOFT_DIRTY if CHECKPOINT_RESTORE && PPC_BOOK3S
+	select HAVE_ARCH_SOFT_DIRTY if PPC_BOOK3S_64
 
 config PPC_16K_PAGES
 	bool "16k page size"
@@ -567,7 +567,7 @@ config PPC_16K_PAGES
 config PPC_64K_PAGES
 	bool "64k page size"
 	depends on !PPC_FSL_BOOK3E && (44x || PPC_STD_MMU_64 || PPC_BOOK3E_64)
-	select HAVE_ARCH_SOFT_DIRTY if CHECKPOINT_RESTORE && PPC_BOOK3S
+	select HAVE_ARCH_SOFT_DIRTY if PPC_BOOK3S_64
 
 config PPC_256K_PAGES
 	bool "256k page size"
-- 
2.5.0

Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On Thu, 28 Jan 2016, Leonid Yegoshin wrote:

> In http://patchwork.linux-mips.org/patch/10505/ the very last mesg exchange
> is:
[...]
> ... and that stops forever...

Thanks for the reminder -- last June was very hectic, I travelled a lot
and I lost the discussion from my radar. Apologies for that.

I replied in that thread now with my results. I hope this helps.

  Maciej

RE: [PATCH V4 1/2] powerpc/fsl: Add PCI node in device tree of bsc9132qds
> -----Original Message-----
> From: Scott Wood [mailto:o...@buserror.net]
> Sent: January 27, 2016 22:24
> To: Zhiqiang Hou ; Zhiqiang Hou
> ; linuxppc-dev@lists.ozlabs.org;
> ga...@kernel.crashing.org; b...@kernel.crashing.org; pau...@samba.org;
> m...@ellerman.id.au; devicet...@vger.kernel.org; robh...@kernel.org;
> pawel.m...@arm.com; mark.rutl...@arm.com; ijc+devicet...@hellion.org.uk;
> Harninder Rai ; r...@kernel.org
> Cc: Lian M.H. ; Hu Vincent
> ; Hou Zhiqiang
> Subject: Re: [PATCH V4 1/2] powerpc/fsl: Add PCI node in device tree of
> bsc9132qds
>
> On Wed, 2016-01-27 at 06:47 +, Zhiqiang Hou wrote:
> > Hi Herring and Kumar and Ian,
> >
> > Can you help to apply this patch?
> >
> > Thanks,
> > Zhiqiang
>
> Can you check whether a patch has already been applied before pinging people
> about it?
>

Sorry, I only checked the state of this patchset on the web.

Thanks,
Zhiqiang

> >
> >
> > > -----Original Message-----
> > > From: Zhiqiang Hou [mailto:zhiqiang@nxp.com]
> > > Sent: December 22, 2015 17:28
> > > To: Zhiqiang Hou ;
> > > linuxppc-dev@lists.ozlabs.org; Scott Wood ;
> > > ga...@kernel.crashing.org; b...@kernel.crashing.org;
> > > pau...@samba.org; m...@ellerman.id.au; devicet...@vger.kernel.org;
> > > robh...@kernel.org; pawel.m...@arm.com; mark.rutl...@arm.com;
> > > ijc+devicet...@hellion.org.uk; Harninder Rai
> > >
> > > Cc: Lian M.H. ; Hu Vincent
> > > ; Hou Zhiqiang
> > > Subject: RE: [PATCH V4 1/2] powerpc/fsl: Add PCI node in device tree
> > > of bsc9132qds
> > >
> > > Hi Rob,
> > >
> > > Could you please take this patch into account?
> > > > > > Thanks, > > > Zhiqiang > > > > > > > -Original Message- > > > > From: Zhiqiang Hou [mailto:zhiqiang@freescale.com] > > > > Sent: 2015年11月5日 11:16 > > > > To: linuxppc-dev@lists.ozlabs.org; Scott Wood; > > > > ga...@kernel.crashing.org; b...@kernel.crashing.org; > > > > pau...@samba.org; m...@ellerman.id.au; devicet...@vger.kernel.org; > > > > robh...@kernel.org; pawel.m...@arm.com; mark.rutl...@arm.com; > > > > ijc+devicet...@hellion.org.uk; Harninder Rai > > > > Cc: Minghuan Lian; Mingkai Hu; Zhiqiang Hou > > > > Subject: [PATCH V4 1/2] powerpc/fsl: Add PCI node in device tree > > > > of bsc9132qds > > > > > > > > From: Harninder Rai > > > > > > > > Signed-off-by: Harninder Rai > > > > Signed-off-by: Minghuan Lian > > > > Signed-off-by: Hou Zhiqiang > > > > --- > > > > V4: V3: > > > > - Remove gerrit stuff. > > > > V2: > > > > - Remove property clock-frequency. > > > > > > > > arch/powerpc/boot/dts/bsc9132qds.dts | 15 ++ > > > > arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi | 28 > > > > +++ > > > > +++ arch/powerpc/boot/dts/fsl/bsc9132si-pr > > > > +++ e.dt > > > > +++ si > > > > > 1 + > > > > 3 files changed, 44 insertions(+) > > > > > > > > diff --git a/arch/powerpc/boot/dts/bsc9132qds.dts > > > > b/arch/powerpc/boot/dts/bsc9132qds.dts > > > > index 6cab106..940d719 100644 > > > > --- a/arch/powerpc/boot/dts/bsc9132qds.dts > > > > +++ b/arch/powerpc/boot/dts/bsc9132qds.dts > > > > @@ -29,6 +29,21 @@ > > > > soc: soc@ff70 { > > > > ranges = <0x0 0x0 0xff70 0x10>; }; > > > > + > > > > +pci0: pcie@ff70a000 { > > > > +reg = <0 0xff70a000 0 0x1000>; > > > > +ranges = <0x200 0x0 0x9000 0 0x9000 0x0 > > > > 0x2000 > > > > + 0x100 0x0 0x 0 0xc001 0x0 0x1>; > > > > +pcie@0 { > > > > +ranges = <0x200 0x0 0x9000 > > > > + 0x200 0x0 0x9000 > > > > + 0x0 0x2000 > > > > + > > > > + 0x100 0x0 0x0 > > > > + 0x100 0x0 0x0 > > > > + 0x0 0x10>; > > > > +}; > > > > +}; > > > > }; > > > > > > > > /include/ "bsc9132qds.dtsi" > > > > diff --git 
a/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi > > > > b/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi > > > > index c723071..b5f0715 100644 > > > > --- a/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi > > > > +++ b/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi > > > > @@ -40,6 +40,34 @@ > > > > interrupts = <16 2 0 0 20 2 0 0>; }; > > > > > > > > +/* controller at 0xa000 */ > > > > +&pci0 { > > > > +compatible = "fsl,bsc9132-pcie", "fsl,qoriq-pcie-v2.2"; > > > > +device_type = "pci"; #size-cells = <2>; #address-cells = <3>; > > > > +bus-range = <0 255>; interrupts = <16 2 0 0>; > > > > + > > > > +pcie@0 { > > > > +reg = <0 0 0 0 0>; > > > > +#interrupt-cells = <1>; > > > > +#size-cells = <2>; > > > > +#address-cells = <3>; > > > > +device_type = "pci"; > > > > +interrupts = <16 2 0 0>; > > > > +interrupt-map-mask = <0xf800 0 0 7>; > > > > + > > > > +interrupt-map = < > > > > +/* IDSEL 0x0 */ > > > > + 0x0 0x0 0x1 &mpic 0x0 0x2 0x0 0x0 > > > > + 0x0 0x0 0x2 &mpic 0x1 0x2 0x0 0x0 > > > > + 0x0 0x0 0x3 &mpic 0x2 0x2 0x0 0x0 > > > > + 0x0 0x0 0x4 &mpic 0x3 0x2 0x0 0x0 > > > > +>; > > > > +}; > > > > +}; > > > > + > > > > &soc { > > > > #address-cells = <1>; > > > > #size-cells
Re: [RFC PATCH v3 5/5] vfio-pci: Allow to mmap MSI-X table if host bridge supports filtering of MSIs
On 2016/1/29 6:46, Alex Williamson wrote:
> On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> > Current vfio-pci implementation disallows to mmap MSI-X
> > table in case that user get to touch this directly.
> >
> > But we should allow to mmap these MSI-X tables if the PCI
> > host bridge supports filtering of MSIs.
> >
> > Signed-off-by: Yongji Xie
> > ---
> >  drivers/vfio/pci/vfio_pci.c |    6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index 11fd0f0..4d68f6a 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -555,7 +555,8 @@ static long vfio_pci_ioctl(void *device_data,
> >  			    IORESOURCE_MEM &&
> >  			    !pci_resources_share_page(pdev, info.index)) {
> >  				info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
> > -				if (info.index == vdev->msix_bar) {
> > +				if (!pci_host_bridge_msi_filtered_enabled(pdev) &&
> > +				    info.index == vdev->msix_bar) {
> >  					ret = msix_sparse_mmap_cap(vdev, &caps);
> >  					if (ret)
> >  						return ret;
> > @@ -967,7 +968,8 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
> >  	if (phys_len < PAGE_SIZE || req_start + req_len > phys_len)
> >  		return -EINVAL;
> >  
> > -	if (index == vdev->msix_bar) {
> > +	if (!pci_host_bridge_msi_filtered_enabled(pdev) &&
> > +	    index == vdev->msix_bar) {
> >  		/*
> >  		 * Disallow mmaps overlapping the MSI-X table; users don't
> >  		 * get to touch this directly.  We could find somewhere
>
> What about read()/write() access, why would we allow mmap() but not
> those?

Yes, you are right! I missed the MSI-X table check in vfio_pci_bar_rw().
I will fix it in the next version.

Thanks.

Regards,
Yongji Xie

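The review point in this exchange (mmap() and read()/write() should follow the same MSI-X table rule) can be pictured as one predicate that both paths consult. What follows is only a minimal userspace model in C, not the vfio-pci code: struct model_vdev, its bridge_filters_msi field, and msix_table_protected() are hypothetical names introduced for illustration. Only the condition itself is taken from the two hunks quoted in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device state; not the kernel's struct. */
struct model_vdev {
	int  msix_bar;           /* index of the BAR holding the MSI-X table */
	bool bridge_filters_msi; /* models pci_host_bridge_msi_filtered_enabled() */
};

/*
 * Return true when an access to BAR 'index' still has to be kept away
 * from the MSI-X table. The condition mirrors the one added in the patch:
 * the table is only protected when the host bridge does not filter MSIs.
 * In this model the mmap path and the read/write path would both call
 * this helper, so the two cannot drift apart.
 */
static bool msix_table_protected(const struct model_vdev *vdev, int index)
{
	assert(vdev != NULL);
	return !vdev->bridge_filters_msi && index == vdev->msix_bar;
}
```

Keeping the condition in one helper is a design choice of this sketch, not something the thread prescribes; the next version of the patch may well open-code the check in vfio_pci_bar_rw() instead.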
Re: [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported
On 2016/1/29 6:46, Alex Williamson wrote:
> On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> > MSI-X tables are not allowed to be mmapped in vfio-pci
> > driver in case that user get to touch this directly.
> > This will cause some performance issues when PCI
> > adapters have critical registers in the same page as
> > the MSI-X table.
> >
> > However, some kind of PCI host bridge such as IODA bridge
> > on Power support filtering of MSIs, which can ensure that a
> > given pci device can only shoot the MSIs assigned for it.
> > So we think it's safe to expose the MSI-X table to userspace
> > if filtering of MSIs is supported because the exposed MSI-X
> > table can't be used to do harm to other memory space.
> >
> > To support this case, this patch adds a pci_host_bridge
> > attribute to indicate if this PCI host bridge supports
> > filtering of MSIs.
> >
> > Signed-off-by: Yongji Xie
> > ---
> >  drivers/pci/host-bridge.c |    6 ++++++
> >  include/linux/pci.h       |    3 +++
> >  2 files changed, 9 insertions(+)
> >
> > diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
> > index 5f4a2e0..c029267 100644
> > --- a/drivers/pci/host-bridge.c
> > +++ b/drivers/pci/host-bridge.c
> > @@ -96,3 +96,9 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
> >  	res->end = region->end + offset;
> >  }
> >  EXPORT_SYMBOL(pcibios_bus_to_resource);
> > +
> > +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev)
> > +{
> > +	return pci_find_host_bridge(pdev->bus)->msi_filtered;
> > +}
> > +EXPORT_SYMBOL_GPL(pci_host_bridge_msi_filtered_enabled);
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index b640d65..b952b78 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -412,6 +412,7 @@ struct pci_host_bridge {
> >  	void (*release_fn)(struct pci_host_bridge *);
> >  	void *release_data;
> >  	unsigned int ignore_reset_delay:1;	/* for entire hierarchy */
> > +	unsigned int msi_filtered:1;	/* support filtering of MSIs */
> >  	/* Resource alignment requirements */
> >  	resource_size_t (*align_resource)(struct pci_dev *dev,
> >  			const struct resource *res,
> > @@ -430,6 +431,8 @@ void pci_set_host_bridge_release(struct pci_host_bridge *bridge,
> >  
> >  int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge);
> >  
> > +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev);
> > +
> >  /*
> >   * The first PCI_BRIDGE_RESOURCE_NUM PCI bus resources (those that correspond
> >   * to P2P or CardBus bridge windows) go in a table.  Additional ones (for
>
> Don't we already have a flag for this in the IOMMU space?
>
> enum iommu_cap {
> 	IOMMU_CAP_CACHE_COHERENCY,	/* IOMMU can enforce cache coherent DMA
> 					   transactions */
> --->IOMMU_CAP_INTR_REMAP,		/* IOMMU supports interrupt isolation */
> 	IOMMU_CAP_NOEXEC,		/* IOMMU_NOEXEC flag */
> };

I saw that this flag is enabled on the x86 and ARM architectures. I'm not
sure whether we can mmap the MSI-X table on those architectures; I have
only verified it on PPC64.

Regards,
Yongji Xie

Re: [RFC PATCH v3 1/5] PCI: Add support for enforcing all MMIO BARs to be page aligned
On 2016/1/29 6:46, Alex Williamson wrote:
> On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> > When vfio passthrough a PCI device of which MMIO BARs
> > are smaller than PAGE_SIZE, guest will not handle the
> > mmio accesses to the BARs which leads to mmio emulations
> > in host.
> >
> > This is because vfio will not allow to passthrough one
> > BAR's mmio page which may be shared with other BARs.
> >
> > To solve this performance issue, this patch adds a kernel
> > parameter "pci=resource_page_aligned=on" to enforce
> > the alignment of all MMIO BARs to be at least PAGE_SIZE,
> > so that one BAR's mmio page would not be shared with other
> > BARs. We can also disable it through kernel parameter
> > "pci=resource_page_aligned=off".
> >
> > For the default value of the parameter, we think it should be
> > arch-independent, so we add a macro
> > HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED to change it. And we
> > define this macro to enable this parameter by default on PPC64
> > platform which can easily hit this performance issue because
> > its PAGE_SIZE is 64KB.
> >
> > Note that the kernel parameter won't works if kernel doesn't do
> > resources reallocation.
> And where do you account for this so that we know whether it's really in
> effect?

We can check the PCI_PROBE_ONLY flag to know whether the kernel does
resource reallocation. Then we know whether the kernel parameter is
really in effect.

enum {
	/* Force re-assigning all resources (ignore firmware
	 * setup completely)
	 */
	PCI_REASSIGN_ALL_RSRC	= 0x0001,

	/* Re-assign all bus numbers */
	PCI_REASSIGN_ALL_BUS	= 0x0002,

	/* Do not try to assign, just use existing setup */
--->PCI_PROBE_ONLY		= 0x0004,

I will add this to the commit log.

Signed-off-by: Yongji Xie --- Documentation/kernel-parameters.txt |5 + arch/powerpc/include/asm/pci.h | 11 +++ drivers/pci/pci.c | 35 +++ drivers/pci/pci.h |8 +++- include/linux/pci.h |4 5 files changed, 62 insertions(+), 1 deletion(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 742f69d..3f2a7c9 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2857,6 +2857,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted. PAGE_SIZE is used as alignment. PCI-PCI bridge can be specified, if resource windows need to be expanded. + resource_page_aligned= Enable/disable enforcing the alignment + of all PCI devices' memory resources to be + at least PAGE_SIZE if resources reallocation + is done by kernel. + Format: { "on" | "off" } ecrc= Enable/disable PCIe ECRC (transaction layer end-to-end CRC checking). bios: Use BIOS/firmware settings. This is the diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h index 3453bd8..2d2b3ef 100644 --- a/arch/powerpc/include/asm/pci.h +++ b/arch/powerpc/include/asm/pci.h @@ -136,6 +136,17 @@ extern pgprot_tpci_phys_mem_access_prot(struct file *file, unsigned long pfn, unsigned long size, pgprot_t prot); +#ifdef CONFIG_PPC64 + +/* For PPC64, We enforce all PCI MMIO BARs to be page aligned + * by default. This would be helpful to improve performance + * when we passthrough a PCI device of which BARs are smaller + * than PAGE_SIZE(64KB). And we can use kernel parameter + * "pci=resource_page_aligned=off" to disable it. 
+ */ +#define HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED1 + +#endif #define HAVE_ARCH_PCI_RESOURCE_TO_USER extern void pci_resource_to_user(const struct pci_dev *dev, int bar, diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 314db8c..7b21238 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -99,6 +99,9 @@ u8 pci_cache_line_size; */ unsigned int pcibios_max_latency = 255; +bool pci_resources_page_aligned = + IS_ENABLED(HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED); I don't think this is proper use of IS_ENABLED, which seems to be targeted at CONFIG_ type options. You could define this as that in an arch Kconfig. Is it better that we define this as a pci Kconfig and select it in arch Kconfig? + /* If set, the PCIe ARI capability will not be used. */ static bool pcie_ari_disabled; @@ -4746,6 +4749,35 @@ static ssize_t pci_resource_alignment_store(struct bus_type *bus, BUS_ATTR(resource_alignment, 0644, pci_resource_alignment_show, pci_resource_alignment_store); +static void pci_resources_get_page_aligned(
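
The two ideas discussed in this message (the parameter only takes effect when the kernel reallocates resources, and a small BAR is then aligned to at least PAGE_SIZE) can be illustrated with a small userspace model. This is a hedged sketch, not the code from the patch, whose remaining hunks are cut off here: MODEL_PCI_PROBE_ONLY, page_alignment_in_effect() and bar_alignment() are hypothetical names, and the sketch assumes a BAR is otherwise aligned to its own size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit as the PCI_PROBE_ONLY flag quoted in the reply; the name is local. */
#define MODEL_PCI_PROBE_ONLY 0x0004u

/*
 * The parameter is only meaningful when the kernel assigns resources
 * itself, i.e. when the probe-only flag is not set.
 */
static bool page_alignment_in_effect(bool param_on, unsigned int pci_flags)
{
	return param_on && !(pci_flags & MODEL_PCI_PROBE_ONLY);
}

/*
 * Alignment to request for one memory BAR. Assumption of this sketch:
 * without the parameter a BAR is aligned to its own size; with it, the
 * alignment is raised to page_size for BARs smaller than a page, so the
 * BAR cannot share its page with another BAR.
 */
static uint64_t bar_alignment(uint64_t bar_size, uint64_t page_size,
			      bool in_effect)
{
	/* page_size must be a non-zero power of two */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	if (in_effect && bar_size < page_size)
		return page_size;
	return bar_size;
}
```

With a 64KB page size, a 4KB BAR would get a 64KB alignment in this model, while a 1MB BAR keeps its natural alignment; when the probe-only flag is set, the parameter changes nothing.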
[GIT PULL] Please pull powerpc/linux.git powerpc-4.5-2 tag
Hi Linus,

Please pull powerpc fixes for 4.5:

The following changes since commit 9fa686068a32ddf256df03982b3e3967c18654a8:

  Merge tag 'dmaengine-fix-4.5-rc1' of git://git.infradead.org/users/vkoul/slave-dma (2016-01-20 10:15:21 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.5-2

for you to fetch changes up to 2d19fc639516dc7b4184450b315c931d38549e61:

  powerpc/mm: Fixup _HPAGE_CHG_MASK (2016-01-28 23:49:43 +1100)

powerpc fixes for 4.5

 - Wire up copy_file_range() syscall from Chandan Rajendra
 - Simplify module TOC handling from Alan Modra
 - Remove newly added extra definition of pmd_dirty from Stephen Rothwell
 - Allow user space to map rtas_rmo_buf from Vasant Hegde
 - Fix PE location code from Gavin Shan
 - Remove PPMU_HAS_SSLOT flag for Power8 from Madhavan Srinivasan
 - Fixup _HPAGE_CHG_MASK from Aneesh Kumar K.V

Alan Modra (1):
      powerpc: Simplify module TOC handling

Aneesh Kumar K.V (1):
      powerpc/mm: Fixup _HPAGE_CHG_MASK

Chandan Rajendra (1):
      powerpc: Wire up copy_file_range() syscall

Gavin Shan (1):
      powerpc/eeh: Fix PE location code

Madhavan Srinivasan (1):
      powerpc/perf: Remove PPMU_HAS_SSLOT flag for Power8

Stephen Rothwell (1):
      powerpc: Remove newly added extra definition of pmd_dirty

Vasant Hegde (1):
      powerpc/mm: Allow user space to map rtas_rmo_buf

 arch/powerpc/include/asm/book3s/64/hash.h    |  4 +++-
 arch/powerpc/include/asm/book3s/64/pgtable.h |  1 -
 arch/powerpc/include/asm/systbl.h            |  1 +
 arch/powerpc/include/asm/unistd.h            |  2 +-
 arch/powerpc/include/uapi/asm/unistd.h       |  1 +
 arch/powerpc/kernel/eeh_pe.c                 | 33 +---
 arch/powerpc/kernel/misc_64.S                | 28 ---
 arch/powerpc/kernel/module_64.c              | 12 +++---
 arch/powerpc/mm/mem.c                        |  4 ++--
 arch/powerpc/perf/power8-pmu.c               |  2 +-
 scripts/mod/modpost.c                        |  3 ++-
 11 files changed, 35 insertions(+), 56 deletions(-)

Re: [PATCH 1/2] powerpc/mm: Enable HugeTLB page migration
On 01/28/2016 08:14 PM, Aneesh Kumar K.V wrote:
> Anshuman Khandual writes:
>
>> This enables HugeTLB page migration for PPC64_BOOK3S systems which implement
>> HugeTLB page at the PMD level. It enables the kernel configuration option
>> CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION by default which turns on the function
>> hugepage_migration_supported() during migration. After the recent changes
>> to the PTE format, HugeTLB page migration happens successfully.
>>
>> Signed-off-by: Anshuman Khandual
>> ---
>>  arch/powerpc/Kconfig | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index e4824fd..65d52a0 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -82,6 +82,10 @@ config GENERIC_HWEIGHT
>>  config ARCH_HAS_DMA_SET_COHERENT_MASK
>>  	bool
>>  
>> +config ARCH_ENABLE_HUGEPAGE_MIGRATION
>> +	def_bool y
>> +	depends on PPC_BOOK3S_64 && HUGETLB_PAGE && MIGRATION
>> +
>>  config PPC
>>  	bool
>>  	default y
>
>
> Are you sure this is all that is needed ? We will get a FOLL_GET with hugetlb
> migration and our follow_huge_addr will BUG_ON on that. Look at
> e66f17ff71772b209eed39de35aaa99ba819c93d (" mm/hugetlb: take page table
> lock in follow_huge_pmd()").

HugeTLB page migration completed without any error, and the data integrity
check on the migrated pages passed as well. But yes, there might be some
corner cases that trigger the race condition and that we have not hit yet.
I will try to understand the situation there and get back.

>
> Again this doesn't work with 4K page size. So if you are taking this
> route, we will need that restriction here.
>

Agreed, I had already put a comment on the thread pointing out the same.
But yes, the restriction needs to be there in the enabling config option
here as well.

> I would suggest we switch 64K page size hugetlb to generic
> hugetlb and then do hugetlb migration on top of that.

Will explore it and get back.

>
> Till you help me understand why that FOLL_GET issue is not valid for
> powerpc,

Sure, will get back.