[PATCH 2/2] powerpc/perf/hv-24x7: Display change in counter values

2016-01-29 Thread Sukadev Bhattiprolu
From a1aa992fb25fb8e98a5c5724376ae8cc91463de3 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu 
Date: Mon, 25 Jan 2016 23:05:36 -0500
Subject: [PATCH 2/2] powerpc/perf/hv-24x7: Display change in counter values

For 24x7 counters, perf displays the raw value of the 24x7 counter, which
is a monotonically increasing value.

perf stat -C 0 -e \
'hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/' \
sleep 1

 Performance counter stats for 'CPU(s) 0':

 9,105,403,170  hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/

   0.000425751 seconds time elapsed

In the typical usage of 'perf stat' this counter value is not as useful
as the _change_ in the counter value over the duration of the application.

Have h_24x7_event_init() set the event's prev_count to the raw value of
the 24x7 counter at the time of initialization. When the application
terminates, h_24x7_event_read() will compute the change in value and
report to the perf tool. Similarly, for the transaction interface, clear
the event count to 0 at the beginning of the transaction.

perf stat -C 0 -e \
'hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/' \
sleep 1

 Performance counter stats for 'CPU(s) 0':

   245,758  hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/

   1.006366383 seconds time elapsed
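
The snapshot-and-delta scheme described above can be sketched in plain
userspace C. This is only a simplified model, not the driver code: the
struct and the sketch_* names are made up for illustration, and plain
uint64_t fields stand in for the kernel's local64_t.

```c
#include <stdint.h>

/* Simplified stand-in for the perf event state (hypothetical type). */
struct sketch_event {
	uint64_t prev_count;	/* raw counter value at the last snapshot */
	uint64_t count;		/* accumulated change reported to perf */
};

/* At init time: remember the raw counter value, report nothing yet. */
static void sketch_event_init(struct sketch_event *ev, uint64_t raw_now)
{
	ev->prev_count = raw_now;
	ev->count = 0;
}

/* At read time: add only the change since the previous snapshot. */
static void sketch_event_read(struct sketch_event *ev, uint64_t raw_now)
{
	uint64_t prev = ev->prev_count;

	ev->prev_count = raw_now;
	ev->count += raw_now - prev;
}
```

Because the raw counter only increases, the unsigned subtraction gives
the number of counts that elapsed between the two snapshots.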

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/perf/hv-24x7.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index b7a9a03..77b958f 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1222,11 +1222,12 @@ static int h_24x7_event_init(struct perf_event *event)
return -EACCES;
}
 
-   /* see if the event complains */
+   /* Get the initial value of the counter for this event */
if (single_24x7_request(event, &ct)) {
pr_devel("test hcall failed\n");
return -EIO;
}
+   (void)local64_xchg(&event->hw.prev_count, ct);
 
return 0;
 }
@@ -1289,6 +1290,16 @@ static void h_24x7_event_read(struct perf_event *event)
h24x7hw = &get_cpu_var(hv_24x7_hw);
h24x7hw->events[i] = event;
put_cpu_var(h24x7hw);
+   /*
+* Clear the event count so we can compute the _change_
+* in the 24x7 raw counter value at the end of the txn.
+*
+* Note that we could alternatively read the 24x7 value
+* now and save its value in event->hw.prev_count. But
+* that would require issuing a hcall, which would then
+* defeat the purpose of using the txn interface.
+*/
+   local64_set(&event->count, 0);
}
 
put_cpu_var(hv_24x7_reqb);
-- 
2.5.3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/2] powerpc/perf/hv-24x7: Fix usage with chip events

2016-01-29 Thread Sukadev Bhattiprolu
From 9b5848ce1834a4d82fc251022035d36d9e26b500 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu 
Date: Sat, 23 Jan 2016 03:58:12 -0500
Subject: [PATCH 1/2] powerpc/perf/hv-24x7: Fix usage with chip events.

24x7 counters can belong to different domains (core, chip, virtual CPU
etc). For events in the 'chip' domain, sysfs entry currently looks like:

$ cd /sys/bus/event_source/devices/hv_24x7/events
$ cat PM_XLINK_CYCLES__PHYS_CHIP
domain=0x1,offset=0x230,core=?,lpar=0x0

where the required parameter, 'core=?' is specified with perf as:

perf stat -C 0 -e hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,core=1/ \
/bin/true

This is inconsistent in that 'core' is a required parameter for a chip
event.  Instead, have the sysfs entry display 'chip=?' for chip
events:

$ cd /sys/bus/event_source/devices/hv_24x7/events
$ cat PM_XLINK_CYCLES__PHYS_CHIP
domain=0x1,offset=0x230,chip=?,lpar=0x0

We also need to add a 'chip' entry in the sysfs format directory:

$ ls /sys/bus/event_source/devices/hv_24x7/format
chip  core  domain  lpar  offset  vcpu

(new)

so the perf tool can automatically check usage and format the chip
parameter correctly:

$ perf stat -C 0 -v -e hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP/ \
/bin/true
Required parameter 'chip' not specified
invalid or unsupported event: 'hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP/'

$ perf stat -C 0 -v -e hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,chip=1/ \
/bin/true
hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,chip=1/: 0 6628908 6628908

 Performance counter stats for 'CPU(s) 0':

 0  hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,chip=1/

0.006606970 seconds time elapsed
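
As a rough illustration of how the event string maps onto the config
word, the sketch below packs and unpacks the fields using the bit
ranges from the EVENT_DEFINE_RANGE_FORMAT() lines in this patch
(domain in bits 0-3, chip/core/vcpu in bits 16-31, offset in bits
32-63). The sketch_* names are hypothetical, and the value 0x2 for the
core domain is an assumption; only 0x1 for the chip domain is visible
in the sysfs output above.

```c
#include <stdint.h>

/* 0x1 matches "domain=0x1" above; 0x2 is an assumption of this sketch. */
enum sketch_domain {
	SKETCH_DOMAIN_PHYS_CHIP = 0x1,
	SKETCH_DOMAIN_PHYS_CORE = 0x2,
};

/* Extract bits [lo, hi] (inclusive, bit 0 is least significant). */
static uint64_t sketch_get_bits(uint64_t config, unsigned lo, unsigned hi)
{
	unsigned width = hi - lo + 1;
	uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);

	return (config >> lo) & mask;
}

/* Pack domain (bits 0-3), index (bits 16-31) and offset (bits 32-63). */
static uint64_t sketch_make_config(unsigned domain, uint16_t index,
				   uint32_t offset)
{
	return ((uint64_t)offset << 32) | ((uint64_t)index << 16) |
	       (domain & 0xf);
}

/* Name of the index parameter that the sysfs entry asks the user for. */
static const char *sketch_index_name(unsigned domain)
{
	switch (domain) {
	case SKETCH_DOMAIN_PHYS_CHIP:
		return "chip";
	case SKETCH_DOMAIN_PHYS_CORE:
		return "core";
	default:
		return "vcpu";
	}
}
```

Since chip, core and vcpu all occupy bits 16-31, the domain field is
what decides how those bits are interpreted.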

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/perf/hv-24x7.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 9f9dfda..b7a9a03 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -101,6 +101,7 @@ static bool catalog_entry_domain_is_valid(unsigned domain)
 EVENT_DEFINE_RANGE_FORMAT(domain, config, 0, 3);
 /* u16 */
 EVENT_DEFINE_RANGE_FORMAT(core, config, 16, 31);
+EVENT_DEFINE_RANGE_FORMAT(chip, config, 16, 31);
 EVENT_DEFINE_RANGE_FORMAT(vcpu, config, 16, 31);
 /* u32, see "data_offset" */
 EVENT_DEFINE_RANGE_FORMAT(offset, config, 32, 63);
@@ -115,6 +116,7 @@ static struct attribute *format_attrs[] = {
&format_attr_domain.attr,
&format_attr_offset.attr,
&format_attr_core.attr,
+   &format_attr_chip.attr,
&format_attr_vcpu.attr,
&format_attr_lpar.attr,
NULL,
@@ -289,10 +291,16 @@ static char *event_fmt(struct hv_24x7_event_data *event, unsigned domain)
const char *sindex;
const char *lpar;
 
-   if (is_physical_domain(domain)) {
+   switch (domain) {
+   case HV_PERF_DOMAIN_PHYS_CHIP:
+   lpar = "0x0";
+   sindex = "chip";
+   break;
+   case HV_PERF_DOMAIN_PHYS_CORE:
lpar = "0x0";
sindex = "core";
-   } else {
+   break;
+   default:
lpar = "?";
sindex = "vcpu";
}
@@ -1089,10 +1097,16 @@ static int add_event_to_24x7_request(struct perf_event *event,
return -EINVAL;
}
 
-   if (is_physical_domain(event_get_domain(event)))
+   switch (event_get_domain(event)) {
+   case HV_PERF_DOMAIN_PHYS_CHIP:
+   idx = event_get_chip(event);
+   break;
+   case HV_PERF_DOMAIN_PHYS_CORE:
idx = event_get_core(event);
-   else
+   break;
+   default:
idx = event_get_vcpu(event);
+   }
 
i = request_buffer->num_requests++;
req = &request_buffer->requests[i];
-- 
2.5.3


[PATCH 15/16] perf kvm/powerpc: Add support for HCALL reasons

2016-01-29 Thread Arnaldo Carvalho de Melo
From: Hemant Kumar 

Powerpc provides hcall events that also provide insights into guest
behaviour. Enhance perf kvm stat to record and analyze hcall events.

 - To trace hcall events :
  perf kvm stat record

 - To show the results :
  perf kvm stat report --event=hcall

The result shows the number of hypervisor calls from the guest grouped
by their respective reasons displayed with the frequency.

This patch makes use of two additional tracepoints
"kvm_hv:kvm_hcall_enter" and "kvm_hv:kvm_hcall_exit". To map the hcall
codes to their respective names, it needs a mapping. Such mapping is
added in this patch in book3s_hcalls.h.
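
A minimal sketch of such a code-to-name lookup follows. The four table
entries are copied from book3s_hcalls.h in this patch; the struct, the
function name and the "UNKNOWN" fallback are assumptions made for the
example and are not how perf itself resolves the names.

```c
#include <stddef.h>

/* Hypothetical pair type for one hcall code and its name. */
struct sketch_hcall_name {
	unsigned long code;
	const char *name;
};

/* A few entries taken from the mapping added by this patch. */
static const struct sketch_hcall_name sketch_hcalls[] = {
	{0x54, "H_GET_TERM_CHAR"},
	{0x58, "H_PUT_TERM_CHAR"},
	{0x6c, "H_IPI"},
	{0x7c, "H_PERFMON"},
};

/* Linear search; returns "UNKNOWN" for codes that are not in the table. */
static const char *sketch_hcall_to_name(unsigned long code)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_hcalls) / sizeof(sketch_hcalls[0]); i++)
		if (sketch_hcalls[i].code == code)
			return sketch_hcalls[i].name;
	return "UNKNOWN";
}
```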

A sample output :
 # pgrep qemu
19378
60515

2 VMs running.

 # perf kvm stat record -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.153 MB perf.data.guest (39624
samples) ]

 # perf kvm stat report -p 60515 --event=hcall

Analyze events for all VMs, all VCPUs:

HCALL-EVENT Samples Samples% Time% MinTime MaxTime  AvgTime

  H_IPI 822  66.08% 88.10% 0.63us  11.38us 2.05us (+- 1.42%)
 H_SEND_CRQ 144  11.58%  3.77% 0.41us   0.88us 0.50us (+- 1.47%)
   H_VIO_SIGNAL 118   9.49%  2.86% 0.37us   0.83us 0.47us (+- 1.43%)
H_PUT_TERM_CHAR  76   6.11%  2.07% 0.37us   0.90us 0.52us (+- 2.43%)
H_GET_TERM_CHAR  74   5.95%  2.23% 0.37us   1.70us 0.58us (+- 4.77%)
 H_RTAS   6   0.48%  0.85% 1.10us   9.25us 2.70us (+-48.57%)
  H_PERFMON   4   0.32%  0.12% 0.41us   0.96us 0.59us (+-20.92%)

Total Samples:1244, Total events handled time:1916.69us.

Signed-off-by: Hemant Kumar 
Cc: Alexander Yarygin 
Cc: David Ahern 
Cc: Michael Ellerman 
Cc: Naveen N. Rao 
Cc: Paul Mackerras 
Cc: Scott  Wood 
Cc: Srikar Dronamraju 
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1453962787-15376-4-git-send-email-hem...@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/arch/powerpc/util/book3s_hcalls.h | 123 +++
 tools/perf/arch/powerpc/util/kvm-stat.c  |  65 +-
 2 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/arch/powerpc/util/book3s_hcalls.h

diff --git a/tools/perf/arch/powerpc/util/book3s_hcalls.h b/tools/perf/arch/powerpc/util/book3s_hcalls.h
new file mode 100644
index ..0dd6b7f2d44f
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/book3s_hcalls.h
@@ -0,0 +1,123 @@
+#ifndef ARCH_PERF_BOOK3S_HV_HCALLS_H
+#define ARCH_PERF_BOOK3S_HV_HCALLS_H
+
+/*
+ * PowerPC HCALL codes : hcall code to name mapping
+ */
+#define kvm_trace_symbol_hcall \
+   {0x4, "H_REMOVE"},  \
+   {0x8, "H_ENTER"},   \
+   {0xc, "H_READ"},\
+   {0x10, "H_CLEAR_MOD"},  \
+   {0x14, "H_CLEAR_REF"},  \
+   {0x18, "H_PROTECT"},\
+   {0x1c, "H_GET_TCE"},\
+   {0x20, "H_PUT_TCE"},\
+   {0x24, "H_SET_SPRG0"},  \
+   {0x28, "H_SET_DABR"},   \
+   {0x2c, "H_PAGE_INIT"},  \
+   {0x30, "H_SET_ASR"},\
+   {0x34, "H_ASR_ON"}, \
+   {0x38, "H_ASR_OFF"},\
+   {0x3c, "H_LOGICAL_CI_LOAD"},\
+   {0x40, "H_LOGICAL_CI_STORE"},   \
+   {0x44, "H_LOGICAL_CACHE_LOAD"}, \
+   {0x48, "H_LOGICAL_CACHE_STORE"},\
+   {0x4c, "H_LOGICAL_ICBI"},   \
+   {0x50, "H_LOGICAL_DCBF"},   \
+   {0x54, "H_GET_TERM_CHAR"},  \
+   {0x58, "H_PUT_TERM_CHAR"},  \
+   {0x5c, "H_REAL_TO_LOGICAL"},\
+   {0x60, "H_HYPERVISOR_DATA"},\
+   {0x64, "H_EOI"},\
+   {0x68, "H_CPPR"},   \
+   {0x6c, "H_IPI"},\
+   {0x70, "H_IPOLL"},  \
+   {0x74, "H_XIRR"},   \
+   {0x78, "H_MIGRATE_DMA"},\
+   {0x7c, "H_PERFMON"},\
+   {0xdc, "H_REGISTER_VPA"},   \
+   {0xe0, "H_CEDE"},   \
+   {0xe4, "H_CONFER"}, \
+   {0xe8, "H_PROD"},   \
+   {0xec, "H_GET_PPP"},\
+   {0xf0, "H_SET_PPP"

[PATCH 12/16] perf kvm/{x86, s390}: Remove dependency on uapi/kvm_perf.h

2016-01-29 Thread Arnaldo Carvalho de Melo
From: Hemant Kumar 

It's better to remove the dependency on uapi/kvm_perf.h to allow dynamic
discovery of kvm events (if it's needed). To do this, some extern
variables have been introduced with which we can keep the generic
functions generic.
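
The idea can be sketched in a single file as below. The two variable
names and their x86 values are taken from this patch; the sketch_*
helpers are hypothetical stand-ins for the generic code in
builtin-kvm.c, and in the real tree the variables are defined in the
arch file and only declared extern in util/kvm-stat.h.

```c
#include <stdbool.h>
#include <string.h>

/* "Arch" side: values as set for x86 in this patch. */
const char *kvm_entry_trace = "kvm:kvm_entry";
const char *kvm_exit_trace = "kvm:kvm_exit";

/* "Generic" side: compares against the variables, not arch macros. */
static bool sketch_is_exit_event(const char *evsel_name)
{
	return !strcmp(evsel_name, kvm_exit_trace);
}

static bool sketch_is_entry_event(const char *evsel_name)
{
	return !strcmp(evsel_name, kvm_entry_trace);
}
```

Because the generic helpers read the variables at run time, an
architecture can supply different tracepoint names without the generic
code being recompiled against an arch-specific header.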

Signed-off-by: Hemant Kumar 
Acked-by: Alexander Yarygin 
Acked-by: David Ahern 
Cc: Michael Ellerman 
Cc: Naveen N. Rao 
Cc: Paul Mackerras 
Cc: Scott  Wood 
Cc: Srikar Dronamraju 
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1453962787-15376-1-git-send-email-hem...@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/arch/s390/util/kvm-stat.c |  8 +++-
 tools/perf/arch/x86/util/kvm-stat.c  | 14 +++---
 tools/perf/builtin-kvm.c | 20 ++--
 tools/perf/util/kvm-stat.h   |  5 +
 4 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/tools/perf/arch/s390/util/kvm-stat.c b/tools/perf/arch/s390/util/kvm-stat.c
index a5dbc07ec9dc..b85a94b19c25 100644
--- a/tools/perf/arch/s390/util/kvm-stat.c
+++ b/tools/perf/arch/s390/util/kvm-stat.c
@@ -10,7 +10,7 @@
  */
 
 #include "../../util/kvm-stat.h"
-#include 
+#include 
 
 define_exit_reasons_table(sie_exit_reasons, sie_intercept_code);
 define_exit_reasons_table(sie_icpt_insn_codes, icpt_insn_codes);
@@ -18,6 +18,12 @@ define_exit_reasons_table(sie_sigp_order_codes, sigp_order_codes);
 define_exit_reasons_table(sie_diagnose_codes, diagnose_codes);
 define_exit_reasons_table(sie_icpt_prog_codes, icpt_prog_codes);
 
+const char *vcpu_id_str = "id";
+const int decode_str_len = 40;
+const char *kvm_exit_reason = "icptcode";
+const char *kvm_entry_trace = "kvm:kvm_s390_sie_enter";
+const char *kvm_exit_trace = "kvm:kvm_s390_sie_exit";
+
 static void event_icpt_insn_get_key(struct perf_evsel *evsel,
struct perf_sample *sample,
struct event_key *key)
diff --git a/tools/perf/arch/x86/util/kvm-stat.c b/tools/perf/arch/x86/util/kvm-stat.c
index 14e4e668fad7..babefda4c862 100644
--- a/tools/perf/arch/x86/util/kvm-stat.c
+++ b/tools/perf/arch/x86/util/kvm-stat.c
@@ -1,5 +1,7 @@
 #include "../../util/kvm-stat.h"
-#include 
+#include 
+#include 
+#include 
 
 define_exit_reasons_table(vmx_exit_reasons, VMX_EXIT_REASONS);
 define_exit_reasons_table(svm_exit_reasons, SVM_EXIT_REASONS);
@@ -11,6 +13,12 @@ static struct kvm_events_ops exit_events = {
.name = "VM-EXIT"
 };
 
+const char *vcpu_id_str = "vcpu_id";
+const int decode_str_len = 20;
+const char *kvm_exit_reason = "exit_reason";
+const char *kvm_entry_trace = "kvm:kvm_entry";
+const char *kvm_exit_trace = "kvm:kvm_exit";
+
 /*
  * For the mmio events, we treat:
  * the time of MMIO write: kvm_mmio(KVM_TRACE_MMIO_WRITE...) -> kvm_entry
@@ -65,7 +73,7 @@ static void mmio_event_decode_key(struct perf_kvm_stat *kvm __maybe_unused,
  struct event_key *key,
  char *decode)
 {
-   scnprintf(decode, DECODE_STR_LEN, "%#lx:%s",
+   scnprintf(decode, decode_str_len, "%#lx:%s",
  (unsigned long)key->key,
  key->info == KVM_TRACE_MMIO_WRITE ? "W" : "R");
 }
@@ -109,7 +117,7 @@ static void ioport_event_decode_key(struct perf_kvm_stat *kvm __maybe_unused,
struct event_key *key,
char *decode)
 {
-   scnprintf(decode, DECODE_STR_LEN, "%#llx:%s",
+   scnprintf(decode, decode_str_len, "%#llx:%s",
  (unsigned long long)key->key,
  key->info ? "POUT" : "PIN");
 }
diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 4418d9214872..ab5645cf39d2 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -30,7 +30,6 @@
 #include 
 
 #ifdef HAVE_KVM_STAT_SUPPORT
-#include 
 #include "util/kvm-stat.h"
 
 void exit_event_get_key(struct perf_evsel *evsel,
@@ -38,12 +37,12 @@ void exit_event_get_key(struct perf_evsel *evsel,
struct event_key *key)
 {
key->info = 0;
-   key->key = perf_evsel__intval(evsel, sample, KVM_EXIT_REASON);
+   key->key = perf_evsel__intval(evsel, sample, kvm_exit_reason);
 }
 
 bool kvm_exit_event(struct perf_evsel *evsel)
 {
-   return !strcmp(evsel->name, KVM_EXIT_TRACE);
+   return !strcmp(evsel->name, kvm_exit_trace);
 }
 
 bool exit_event_begin(struct perf_evsel *evsel,
@@ -59,7 +58,7 @@ bool exit_event_begin(struct perf_evsel *evsel,
 
 bool kvm_entry_event(struct perf_evsel *evsel)
 {
-   return !strcmp(evsel->name, KVM_ENTRY_TRACE);
+   return !strcmp(evsel->name, kvm_entry_trace);
 }
 
 bool exit_event_end(struct perf_evsel *evsel,
@@ -91,7 +90,7 @@ void exit_event_decode_key(struct perf_kvm_stat *kvm,
const char *exit_reason = get_exit_reason(kvm, key->exit_reasons,
  key->key);
 
-   scnprintf(decode, DECODE_STR_LEN, "%s", exit_reason);
+ 

[PATCH 13/16] perf kvm/{x86,s390}: Remove const from kvm_events_tp

2016-01-29 Thread Arnaldo Carvalho de Melo
See http://www.infradead.org/rpr.html

From: Hemant Kumar 

This patch removes the "const" qualifier from kvm_events_tp declaration
to account for the fact that some architectures may need to update this
variable dynamically. For instance, powerpc will need to update this
variable dynamically depending on the machine type.
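
To illustrate what dropping the second "const" permits, here is a
small sketch. The array and helper names are hypothetical; the
tracepoint strings are the ones mentioned in this series. With
"const char * const" the assignments in the helper would not compile.

```c
#include <stddef.h>

/* Elements are "const char *": the strings are read-only, the slots are not. */
const char *sketch_events_tp[] = {
	"kvm:kvm_entry",
	"kvm:kvm_exit",
	NULL,
};

/* Hypothetical helper: swap in another tracepoint pair at run time. */
static void sketch_set_tracepoints(const char *entry_tp, const char *exit_tp)
{
	sketch_events_tp[0] = entry_tp;
	sketch_events_tp[1] = exit_tp;
}
```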

Signed-off-by: Hemant Kumar 
Acked-by: David Ahern 
Cc: Alexander Yarygin 
Cc: Michael Ellerman 
Cc: Naveen N. Rao 
Cc: Paul Mackerras 
Cc: Scott  Wood 
Cc: Srikar Dronamraju 
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1453962787-15376-2-git-send-email-hem...@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/arch/s390/util/kvm-stat.c | 2 +-
 tools/perf/arch/x86/util/kvm-stat.c  | 2 +-
 tools/perf/util/kvm-stat.h   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/arch/s390/util/kvm-stat.c b/tools/perf/arch/s390/util/kvm-stat.c
index b85a94b19c25..ed57df2e6d68 100644
--- a/tools/perf/arch/s390/util/kvm-stat.c
+++ b/tools/perf/arch/s390/util/kvm-stat.c
@@ -79,7 +79,7 @@ static struct kvm_events_ops exit_events = {
.name = "VM-EXIT"
 };
 
-const char * const kvm_events_tp[] = {
+const char *kvm_events_tp[] = {
"kvm:kvm_s390_sie_enter",
"kvm:kvm_s390_sie_exit",
"kvm:kvm_s390_intercept_instruction",
diff --git a/tools/perf/arch/x86/util/kvm-stat.c b/tools/perf/arch/x86/util/kvm-stat.c
index babefda4c862..b63d4be655a2 100644
--- a/tools/perf/arch/x86/util/kvm-stat.c
+++ b/tools/perf/arch/x86/util/kvm-stat.c
@@ -129,7 +129,7 @@ static struct kvm_events_ops ioport_events = {
.name = "IO Port Access"
 };
 
-const char * const kvm_events_tp[] = {
+const char *kvm_events_tp[] = {
"kvm:kvm_entry",
"kvm:kvm_exit",
"kvm:kvm_mmio",
diff --git a/tools/perf/util/kvm-stat.h b/tools/perf/util/kvm-stat.h
index dd55548ef66a..c965dc844df3 100644
--- a/tools/perf/util/kvm-stat.h
+++ b/tools/perf/util/kvm-stat.h
@@ -133,7 +133,7 @@ bool kvm_entry_event(struct perf_evsel *evsel);
  */
 int cpu_isa_init(struct perf_kvm_stat *kvm, const char *cpuid);
 
-extern const char * const kvm_events_tp[];
+extern const char *kvm_events_tp[];
 extern struct kvm_reg_events_ops kvm_reg_events_ops[];
 extern const char * const kvm_skip_events[];
 extern const char *vcpu_id_str;
-- 
2.5.0


[PATCH 14/16] perf kvm/powerpc: Port perf kvm stat to powerpc

2016-01-29 Thread Arnaldo Carvalho de Melo

From: Hemant Kumar 

perf kvm can be used to analyze guest exit reasons. This support already
exists for x86; port it to powerpc.

 - To trace KVM events :
  perf kvm stat record
  If many guests are running, we can track for a specific guest by using
  --pid as in : perf kvm stat record --pid 

 - To see the results :
  perf kvm stat report

The result shows the number of exits (from the guest context to
host/hypervisor context) grouped by their respective exit reasons with
their frequency.

Since different powerpc machines have different KVM tracepoints, this
patch discovers the available tracepoints dynamically and accordingly
looks for them. If any single tracepoint is not present, this support
won't be enabled for reporting. To record, this will fail if any of the
events we are looking to record isn't available.  Right now, it's only
supported on PowerPC Book3S_HV architectures.
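
The "all or nothing" rule above could look roughly like the following
sketch. Everything here is hypothetical: the callback type, the helper
names and the prefix-based probe are stand-ins for whatever mechanism
the patch actually uses to find out which tracepoints exist.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: reports whether a tracepoint exists. */
typedef bool (*sketch_tp_probe_fn)(const char *name);

/* True only if every name in the NULL-terminated list is present. */
static bool sketch_all_tracepoints_present(const char *const *names,
					   sketch_tp_probe_fn probe)
{
	for (; *names != NULL; names++)
		if (!probe(*names))
			return false;
	return true;
}

/* Toy probe for the example: pretend only "kvm_hv:" tracepoints exist. */
static bool sketch_probe_book3s_hv(const char *name)
{
	return strncmp(name, "kvm_hv:", 7) == 0;
}
```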

To analyze the different exits, group them and present them (in a slightly
descriptive way) to the user, we need a mapping between the "exit code"
(dumped in the kvm_guest_exit tracepoint data) and its related
Interrupt vector description (exit reason). This patch adds this mapping
in book3s_hv_exits.h.

It records on two available KVM tracepoints for book3s_hv:

"kvm_hv:kvm_guest_exit" and "kvm_hv:kvm_guest_enter".

Here is a sample output:
 # pgrep qemu
19378
60515

2 Guests are running on the host.

 # perf kvm stat record -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.153 MB perf.data.guest (39624
samples) ]

 # perf kvm stat report -p 60515

Analyze events for pid(s) 60515, all VCPUs:

 VM-EXIT Samples Samples% Time% MinTimeMaxTime  Avg time

   SYSCALL  9141  63.67%  7.49% 1.26us   5782.39us9.87us (+- 6.46%)
H_DATA_STORAGE  4114  28.66%  5.07% 1.72us   4597.68us   14.84us (+-20.06%)
HV_DECREMENTER   418   2.91%  4.26% 0.70us  30002.22us  122.58us (+-70.29%)
  EXTERNAL   392   2.73%  0.06% 0.64us104.10us1.94us (+-18.83%)
RETURN_TO_HOST   287   2.00% 83.11% 1.53us 124240.15us 3486.52us (+-16.81%)
H_INST_STORAGE 5   0.03%  0.00% 1.88us  3.73us2.39us (+-14.20%)

Total Samples:14357, Total events handled time:1203918.42us.
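
The Samples% column in the table above is consistent with each row's
sample count divided by the total (for example 9141 of 14357 for
SYSCALL). A one-function sketch of that arithmetic, with a made-up
function name and no claim that perf computes it exactly this way:

```c
/* Share of one exit reason among all samples, in percent. */
static double sketch_samples_percent(unsigned long samples,
				     unsigned long total)
{
	if (total == 0)
		return 0.0;
	return 100.0 * (double)samples / (double)total;
}
```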

Signed-off-by: Hemant Kumar 
Cc: Alexander Yarygin 
Cc: David Ahern 
Cc: Michael Ellerman 
Cc: Naveen N. Rao 
Cc: Paul Mackerras 
Cc: Scott  Wood 
Cc: Srikar Dronamraju 
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1453962787-15376-3-git-send-email-hem...@linux.vnet.ibm.com
Signed-off-by: Srikar Dronamraju 
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/arch/powerpc/Makefile   |   2 +
 tools/perf/arch/powerpc/util/Build |   1 +
 tools/perf/arch/powerpc/util/book3s_hv_exits.h |  33 
 tools/perf/arch/powerpc/util/kvm-stat.c| 107 +
 tools/perf/builtin-kvm.c   |  18 +
 tools/perf/util/kvm-stat.h |   1 +
 6 files changed, 162 insertions(+)
 create mode 100644 tools/perf/arch/powerpc/util/book3s_hv_exits.h
 create mode 100644 tools/perf/arch/powerpc/util/kvm-stat.c

diff --git a/tools/perf/arch/powerpc/Makefile b/tools/perf/arch/powerpc/Makefile
index 7fbca175099e..9f9cea3478fd 100644
--- a/tools/perf/arch/powerpc/Makefile
+++ b/tools/perf/arch/powerpc/Makefile
@@ -1,3 +1,5 @@
 ifndef NO_DWARF
 PERF_HAVE_DWARF_REGS := 1
 endif
+
+HAVE_KVM_STAT_SUPPORT := 1
diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build
index 7b8b0d1a1b62..c8fe2074d217 100644
--- a/tools/perf/arch/powerpc/util/Build
+++ b/tools/perf/arch/powerpc/util/Build
@@ -1,5 +1,6 @@
 libperf-y += header.o
 libperf-y += sym-handling.o
+libperf-y += kvm-stat.o
 
 libperf-$(CONFIG_DWARF) += dwarf-regs.o
 libperf-$(CONFIG_DWARF) += skip-callchain-idx.o
diff --git a/tools/perf/arch/powerpc/util/book3s_hv_exits.h b/tools/perf/arch/powerpc/util/book3s_hv_exits.h
new file mode 100644
index ..e68ba2da8970
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/book3s_hv_exits.h
@@ -0,0 +1,33 @@
+#ifndef ARCH_PERF_BOOK3S_HV_EXITS_H
+#define ARCH_PERF_BOOK3S_HV_EXITS_H
+
+/*
+ * PowerPC Interrupt vectors : exit code to name mapping
+ */
+
+#define kvm_trace_symbol_exit \
+   {0x0,   "RETURN_TO_HOST"}, \
+   {0x100, "SYSTEM_RESET"}, \
+   {0x200, "MACHINE_CHECK"}, \
+   {0x300, "DATA_STORAGE"}, \
+   {0x380, "DATA_SEGMENT"}, \
+   {0x400, "INST_STORAGE"}, \
+   {0x480, "INST_SEGMENT"}, \
+   {0x500, "EXTERNAL"}, \
+   {0x501, "EXTERNAL_LEVEL"}, \
+   {0x502, "EXTERNAL_HV"}, \
+   {0x600, "ALIGNMENT"}, \
+   {0x700, "PROGRAM"}, \
+   {0x800, "FP_UNAVAIL"}, \
+   {0x900, "DECREMENTER"}, \
+   {0x980, "HV_DECREMENTER"}, \
+   {0xc00, "SYSCALL"}, \
+   {0xd00, "TRACE"}, \
+   {0xe00, "H_DATA_STORAGE"}, \
+   {0xe20, "H_INST_STORAGE"}, \
+   {0xe40, "H_EMUL_ASSIST"}, \
+   {0xf00, "PE

[GIT PULL 00/16] perf/core improvements and fixes

2016-01-29 Thread Arnaldo Carvalho de Melo

Hi Ingo,

This is on top of the previously submitted perf-core-for-mingo tag,
please consider applying,

- Arnaldo

The following changes since commit 5ac76283b32b116c58e362e99542182ddcfc8262:

  perf cpumap: Auto initialize cpu__max_{node,cpu} (2016-01-26 16:08:36 -0300)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-2

for you to fetch changes up to 814568db641f6587c1e98a3a85f214cb6a30fe10:

  perf build: Align the names of the build tests: (2016-01-29 17:51:04 -0300)


New features:

- Port 'perf kvm stat' to PowerPC (Hemant Kumar)

Infrastructure:

- Use the 'feature-dump' target to do the feature checks just once and then
  add code to reuse that in the tests/make makefile, speeding up the
  'make -C tools/perf build-test' target (Wang Nan)

- Reduce the number of tests the 'build-test' target does to those that don't
  pollute the source tree (Arnaldo Carvalho de Melo)

- Improve the output of the build tests a bit by aligning the name of the
  tests, more can be done to filter out uninteresting info in the output
  (Arnaldo Carvalho de Melo)

- Add perf_evlist pointer to *info_priv_size(), more prep work for
  supporting the coresight architecture (Mathieu Poirier)

- Improve the 'perf test bp_signal' test (Wang Nan)

- Check environment before starting the BPF 'perf test', so that we can just
  'Skip' older kernels instead of 'FAIL'ing them (Wang Nan)

- Fix cpumode of synthesized buildid event (Wang Nan)

Signed-off-by: Arnaldo Carvalho de Melo 


Arnaldo Carvalho de Melo (2):
  perf tools: Speed up build-tests by reducing the number of builds tested
  perf build: Align the names of the build tests:

Hemant Kumar (4):
  perf kvm/{x86,s390}: Remove dependency on uapi/kvm_perf.h
  perf kvm/{x86,s390}: Remove const from kvm_events_tp
  perf kvm/powerpc: Port perf kvm stat to powerpc
  perf kvm/powerpc: Add support for HCALL reasons

Jiri Olsa (1):
  perf build: Fix feature-dump checks, we need to test all features

Mathieu Poirier (1):
  perf auxtrace: Add perf_evlist pointer to *info_priv_size()

Wang Nan (8):
  tools build: Check basic headers for test-compile feature checker
  perf build: Remove all condition feature check {C,LD}FLAGS
  perf build: Use feature dump file for build-test
  perf buildid: Fix cpumode of buildid event
  perf test: Check environment before start real BPF test
  perf test: Improve bp_signal
  perf tools: Move timestamp creation to util
  perf record: Use OPT_BOOLEAN_SET for buildid cache related options

 tools/build/Makefile.feature   |   8 ++
 tools/build/feature/test-compile.c |   2 +
 tools/perf/Makefile|  11 +-
 tools/perf/arch/powerpc/Makefile   |   2 +
 tools/perf/arch/powerpc/util/Build |   1 +
 tools/perf/arch/powerpc/util/book3s_hcalls.h   | 123 ++
 tools/perf/arch/powerpc/util/book3s_hv_exits.h |  33 +
 tools/perf/arch/powerpc/util/kvm-stat.c| 170 +
 tools/perf/arch/s390/util/kvm-stat.c   |  10 +-
 tools/perf/arch/x86/util/intel-bts.c   |   4 +-
 tools/perf/arch/x86/util/intel-pt.c|   4 +-
 tools/perf/arch/x86/util/kvm-stat.c|  16 ++-
 tools/perf/builtin-buildid-cache.c |  14 +-
 tools/perf/builtin-kvm.c   |  38 --
 tools/perf/builtin-record.c|  12 +-
 tools/perf/config/Makefile | 101 +++
 tools/perf/tests/bp_signal.c   | 140 
 tools/perf/tests/bpf.c |  37 ++
 tools/perf/tests/make  |  39 +-
 tools/perf/util/auxtrace.c |   7 +-
 tools/perf/util/auxtrace.h |   6 +-
 tools/perf/util/build-id.c |   6 +-
 tools/perf/util/kvm-stat.h |   8 +-
 tools/perf/util/util.c |  17 +++
 tools/perf/util/util.h |   1 +
 25 files changed, 688 insertions(+), 122 deletions(-)
 create mode 100644 tools/perf/arch/powerpc/util/book3s_hcalls.h
 create mode 100644 tools/perf/arch/powerpc/util/book3s_hv_exits.h
 create mode 100644 tools/perf/arch/powerpc/util/kvm-stat.c

Re: [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported

2016-01-29 Thread Alex Williamson


- Original Message -
> On 2016/1/29 6:46, Alex Williamson wrote:
> > On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> >> MSI-X tables are not allowed to be mmapped in vfio-pci
> >> driver in case that user get to touch this directly.
> >> This will cause some performance issues when PCI
> >> adapters have critical registers in the same page as
> >> the MSI-X table.
> >>   
> >> However, some kind of PCI host bridge such as IODA bridge
> >> on Power support filtering of MSIs, which can ensure that a
> >> given pci device can only shoot the MSIs assigned for it.
> >> So we think it's safe to expose the MSI-X table to userspace
> >> if filtering of MSIs is supported because the exposed MSI-X
> >> table can't be used to do harm to other memory space.
> >>   
> >> To support this case, this patch adds a pci_host_bridge
> >> attribute to indicate if this PCI host bridge supports
> >> filtering of MSIs.
> >>   
> >> Signed-off-by: Yongji Xie 
> >> ---
> >>   drivers/pci/host-bridge.c |6 ++
> >>   include/linux/pci.h   |3 +++
> >>   2 files changed, 9 insertions(+)
> >>   
> >> diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
> >> index 5f4a2e0..c029267 100644
> >> --- a/drivers/pci/host-bridge.c
> >> +++ b/drivers/pci/host-bridge.c
> >> @@ -96,3 +96,9 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
> >>res->end = region->end + offset;
> >>   }
> >>   EXPORT_SYMBOL(pcibios_bus_to_resource);
> >> +
> >> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev)
> >> +{
> >> +  return pci_find_host_bridge(pdev->bus)->msi_filtered;
> >> +}
> >> +EXPORT_SYMBOL_GPL(pci_host_bridge_msi_filtered_enabled);
> >> diff --git a/include/linux/pci.h b/include/linux/pci.h
> >> index b640d65..b952b78 100644
> >> --- a/include/linux/pci.h
> >> +++ b/include/linux/pci.h
> >> @@ -412,6 +412,7 @@ struct pci_host_bridge {
> >>void (*release_fn)(struct pci_host_bridge *);
> >>void *release_data;
> >>unsigned int ignore_reset_delay:1;  /* for entire hierarchy */
> >> +  unsigned int msi_filtered:1;/* support filtering of MSIs */
> >>/* Resource alignment requirements */
> >>resource_size_t (*align_resource)(struct pci_dev *dev,
> >>const struct resource *res,
> >> @@ -430,6 +431,8 @@ void pci_set_host_bridge_release(struct pci_host_bridge *bridge,
> >>   
> >>   int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge);
> >>   
> >> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev);
> >> +
> >>   /*
> >>* The first PCI_BRIDGE_RESOURCE_NUM PCI bus resources (those that correspond
> >>* to P2P or CardBus bridge windows) go in a table.  Additional ones (for
> > Don't we already have a flag for this in the IOMMU space?
> >
> > enum iommu_cap {
> >  IOMMU_CAP_CACHE_COHERENCY,  /* IOMMU can enforce cache coherent DMA transactions */
> > --->IOMMU_CAP_INTR_REMAP,   /* IOMMU supports interrupt isolation */
> >  IOMMU_CAP_NOEXEC,   /* IOMMU_NOEXEC flag */
> > };
> >
> 
> I saw this flag had been enabled in x86 and ARM arch.
> 
> I'm not sure whether we can mmap MSI-X table in those archs. I just
> verify it on PPC64 arch.

Unfortunately that's not a very good excuse for creating an alternate 
implementation.  When x86 implements interrupt remapping, we get fine grained 
isolation of MSI vectors and we've always taken this flag to mean that the 
system is isolated from devices that may perform DoS attacks with MSI writes.  
I'm not entirely sure whether ARM really provides that degree of isolation, but 
they would be incorrect in exposing the capability if they do not.  Thanks,

Alex

Re: [RFC PATCH v3 1/5] PCI: Add support for enforcing all MMIO BARs to be page aligned

2016-01-29 Thread Alex Williamson
On Fri, 2016-01-29 at 18:37 +0800, Yongji Xie wrote:
> On 2016/1/29 6:46, Alex Williamson wrote:
> > On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> > > When vfio passthrough a PCI device of which MMIO BARs
> > > are smaller than PAGE_SIZE, guest will not handle the
> > > mmio accesses to the BARs which leads to mmio emulations
> > > in host.
> > >   
> > > This is because vfio will not allow to passthrough one
> > > BAR's mmio page which may be shared with other BARs.
> > >   
> > > To solve this performance issue, this patch adds a kernel
> > > parameter "pci=resource_page_aligned=on" to enforce
> > > the alignment of all MMIO BARs to be at least PAGE_SIZE,
> > > so that one BAR's mmio page would not be shared with other
> > > BARs. We can also disable it through kernel parameter
> > > "pci=resource_page_aligned=off".
> > >   
> > > For the default value of the parameter, we think it should be
> > > arch-independent, so we add a macro
> > > HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED to change it. And we
> > > define this macro to enable this parameter by default on PPC64
> > > platform which can easily hit this performance issue because
> > > its PAGE_SIZE is 64KB.
> > >   
> > > > Note that the kernel parameter won't work if the kernel doesn't do
> > > resources reallocation.
> > And where do you account for this so that we know whether it's really in
> > effect?
> 
> We can check the flag PCI_PROBE_ONLY to know whether kernel do
> resources reallocation. Then we know if the kernel parameter is really
> in effect.
> 
> enum {
>  /* Force re-assigning all resources (ignore firmware
>   * setup completely)
>   */
>  PCI_REASSIGN_ALL_RSRC= 0x0001,
> 
>  /* Re-assign all bus numbers */
>  PCI_REASSIGN_ALL_BUS= 0x0002,
> 
>  /* Do not try to assign, just use existing setup */
> --->PCI_PROBE_ONLY= 0x0004,
> 
> And I will add this to commit log.

We need more than a commit log entry for this, what's the purpose of the
pci_resources_share_page() function if we don't know if this is in
effect?
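
One way to make that dependency explicit, rather than only documenting it, is to fold the PCI_PROBE_ONLY test into the predicate that callers consult. The following is a minimal user-space sketch and not code from the patch: the flag values mirror the enum quoted above, while `struct demo_pci_state` and `page_alignment_in_effect()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as in the enum quoted above. */
enum {
	PCI_REASSIGN_ALL_RSRC = 0x00000001,
	PCI_REASSIGN_ALL_BUS  = 0x00000002,
	PCI_PROBE_ONLY        = 0x00000004,
};

/* Hypothetical stand-in for the kernel state involved. */
struct demo_pci_state {
	unsigned int flags;           /* models the global PCI flags */
	bool resources_page_aligned;  /* models pci=resource_page_aligned= */
};

/*
 * Alignment can only be enforced when the kernel is allowed to
 * (re)assign resources, i.e. when PCI_PROBE_ONLY is not set.
 */
static bool page_alignment_in_effect(const struct demo_pci_state *st)
{
	assert(st != NULL);
	return st->resources_page_aligned && !(st->flags & PCI_PROBE_ONLY);
}
```

A helper such as pci_resources_share_page() could then consult this combined predicate instead of the raw parameter value; whether that is the right place for the check is a design question for the patch author.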

> > > Signed-off-by: Yongji Xie 
> > > ---
> > >   Documentation/kernel-parameters.txt |5 +
> > >   arch/powerpc/include/asm/pci.h  |   11 +++
> > > >   drivers/pci/pci.c   |   35 +++
> > >   drivers/pci/pci.h   |8 +++-
> > >   include/linux/pci.h |4 
> > >   5 files changed, 62 insertions(+), 1 deletion(-)
> > >   
> > > > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> > > index 742f69d..3f2a7c9 100644
> > > --- a/Documentation/kernel-parameters.txt
> > > +++ b/Documentation/kernel-parameters.txt
> > > @@ -2857,6 +2857,11 @@ bytes respectively. Such letter suffixes can also 
> > > be entirely omitted.
> > >   PAGE_SIZE is used as alignment.
> > >   PCI-PCI bridge can be specified, if 
> > > resource
> > >   windows need to be expanded.
> > > + resource_page_aligned=  Enable/disable enforcing the alignment
> > > + of all PCI devices' memory resources to be
> > > + at least PAGE_SIZE if resources reallocation
> > > + is done by kernel.
> > > + Format: { "on" | "off" }
> > > >   ecrc=   Enable/disable PCIe ECRC (transaction layer
> > > >   end-to-end CRC checking).
> > > >   bios: Use BIOS/firmware settings. This is the
> > > > diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
> > > index 3453bd8..2d2b3ef 100644
> > > --- a/arch/powerpc/include/asm/pci.h
> > > +++ b/arch/powerpc/include/asm/pci.h
> > > @@ -136,6 +136,17 @@ extern pgprot_t  pci_phys_mem_access_prot(struct 
> > > file *file,
> > >    unsigned long pfn,
> > >    unsigned long size,
> > >    pgprot_t prot);
> > > +#ifdef CONFIG_PPC64
> > > +
> > > +/* For PPC64, We enforce all PCI MMIO BARs to be page aligned
> > > + * by default. This would be helpful to improve performance
> > > + * when we passthrough a PCI device of which BARs are smaller
> > > + * than PAGE_SIZE(64KB). And we can use kernel parameter
> > > + * "pci=resource_page_aligned=off" to disable it.
> > > + */
> > > +#define HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED  1
> > > +
> > > +#endif
> > >   
> > >   #define HAVE_ARCH_PCI_RESOURCE_TO_USER
> > >   extern void pci_resource_to_user(const struct pci_dev *dev, int bar,
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 314db8c..7b21238 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -99,6 +99,9 @@ u8 pci_cache_line_size;
> > >    */
> > 

[PATCH] powerpc/book3s_32: Fix build error with checkpoint restart

2016-01-29 Thread Aneesh Kumar K.V
In file included from mm/vmscan.c:54:0:
include/linux/swapops.h: In function ‘pte_to_swp_entry’:
include/linux/swapops.h:69:2: error: implicit declaration of function ‘pte_swp_soft_dirty’ [-Werror=implicit-function-declaration]
  if (pte_swp_soft_dirty(pte))
  ^
include/linux/swapops.h:70:3: error: implicit declaration of function ‘pte_swp_clear_soft_dirty’ [-Werror=implicit-function-declaration]
   pte = pte_swp_clear_soft_dirty(pte);

We support soft dirty tracking only with book3s 64 for now, so change
the Kconfig dependency accordingly. Also, the CHECKPOINT_RESTORE
feature is not really dependent on SOFT_DIRTY. We track the dependency
between MEM_SOFT_DIRTY and ARCH_SOFT_DIRTY through headers.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 09b94174d372..599329332613 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -558,7 +558,7 @@ choice
 
 config PPC_4K_PAGES
bool "4k page size"
-   select HAVE_ARCH_SOFT_DIRTY if CHECKPOINT_RESTORE && PPC_BOOK3S
+   select HAVE_ARCH_SOFT_DIRTY if PPC_BOOK3S_64
 
 config PPC_16K_PAGES
bool "16k page size"
@@ -567,7 +567,7 @@ config PPC_16K_PAGES
 config PPC_64K_PAGES
bool "64k page size"
depends on !PPC_FSL_BOOK3E && (44x || PPC_STD_MMU_64 || PPC_BOOK3E_64)
-   select HAVE_ARCH_SOFT_DIRTY if CHECKPOINT_RESTORE && PPC_BOOK3S
+   select HAVE_ARCH_SOFT_DIRTY if PPC_BOOK3S_64
 
 config PPC_256K_PAGES
bool "256k page size"
-- 
2.5.0
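
The header-level tracking mentioned in the commit message can be pictured as follows. This is a simplified user-space sketch, not the kernel headers: `pte_t` is reduced to a plain struct, `DEMO_SWP_SOFT_DIRTY` is a made-up bit, and `demo_pte_to_swp_val()` only imitates the shape of the caller named in the build error. The assumption illustrated is that, when the architecture does not select HAVE_ARCH_SOFT_DIRTY, generic no-op stubs stand in for the helpers so that the caller still compiles.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t val; } pte_t;         /* simplified stand-in */

#define DEMO_SWP_SOFT_DIRTY (UINT64_C(1) << 0)  /* hypothetical bit */

#ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
/* The architecture provides real helpers. */
static inline int pte_swp_soft_dirty(pte_t pte)
{
	return (pte.val & DEMO_SWP_SOFT_DIRTY) != 0;
}

static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
{
	pte.val &= ~DEMO_SWP_SOFT_DIRTY;
	return pte;
}
#else
/* Generic fallbacks: the bit is never set, clearing is a no-op. */
static inline int pte_swp_soft_dirty(pte_t pte)
{
	(void)pte;
	return 0;
}

static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
{
	return pte;
}
#endif

/* Imitates the caller from the build error above. */
static inline uint64_t demo_pte_to_swp_val(pte_t pte)
{
	if (pte_swp_soft_dirty(pte))
		pte = pte_swp_clear_soft_dirty(pte);
	assert(!pte_swp_soft_dirty(pte));  /* postcondition: bit not set */
	return pte.val;
}
```

The build error arises when the option is selected but neither branch supplies the helpers, which is why the patch narrows the select to PPC_BOOK3S_64.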

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

2016-01-29 Thread Maciej W. Rozycki
On Thu, 28 Jan 2016, Leonid Yegoshin wrote:

> In http://patchwork.linux-mips.org/patch/10505/ the very last mesg exchange
> is:
[...]
> ... and that stops forever...

 Thanks for the reminder -- last June was very hectic, I travelled a lot 
and I lost the discussion from my radar.  Apologies for that.  I replied 
in that thread now with my results.  I hope this helps.

  Maciej

RE: [PATCH V4 1/2] powerpc/fsl: Add PCI node in device tree of bsc9132qds

2016-01-29 Thread Zhiqiang Hou


> -----Original Message-----
> From: Scott Wood [mailto:o...@buserror.net]
> Sent: January 27, 2016 22:24
> To: Zhiqiang Hou ; Zhiqiang Hou
> ; linuxppc-dev@lists.ozlabs.org;
> ga...@kernel.crashing.org; b...@kernel.crashing.org; pau...@samba.org;
> m...@ellerman.id.au; devicet...@vger.kernel.org; robh...@kernel.org;
> pawel.m...@arm.com; mark.rutl...@arm.com; ijc+devicet...@hellion.org.uk;
> Harninder Rai ; r...@kernel.org
> Cc: Lian M.H. ; Hu Vincent
> ; Hou Zhiqiang 
> Subject: Re: [PATCH V4 1/2] powerpc/fsl: Add PCI node in device tree of
> bsc9132qds
> 
> On Wed, 2016-01-27 at 06:47 +, Zhiqiang Hou wrote:
> > Hi Herring and Kumar and Ian,
> >
> > Can you help to apply this patch?
> >
> > Thanks,
> > Zhiqiang
> 
> Can you check whether a patch has already been applied before pinging people
> about it?
> 

Sorry, I only checked the state of this patchset on the web.

Thanks,
Zhiqiang

> 
> >
> > > -----Original Message-----
> > > From: Zhiqiang Hou [mailto:zhiqiang@nxp.com]
> > > Sent: December 22, 2015 17:28
> > > To: Zhiqiang Hou ;
> > > linuxppc-dev@lists.ozlabs.org; Scott Wood ;
> > > ga...@kernel.crashing.org; b...@kernel.crashing.org;
> > > pau...@samba.org; m...@ellerman.id.au; devicet...@vger.kernel.org;
> > > robh...@kernel.org; pawel.m...@arm.com; mark.rutl...@arm.com;
> > > ijc+devicet...@hellion.org.uk; Harninder Rai
> > > 
> > > Cc: Lian M.H. ; Hu Vincent
> > > ; Hou Zhiqiang 
> > > Subject: RE: [PATCH V4 1/2] powerpc/fsl: Add PCI node in device tree
> > > of bsc9132qds
> > >
> > > Hi Rob,
> > >
> > > Could you please take this patch into account?
> > >
> > > Thanks,
> > > Zhiqiang
> > >
> > > > -----Original Message-----
> > > > From: Zhiqiang Hou [mailto:zhiqiang@freescale.com]
> > > > Sent: November 5, 2015 11:16
> > > > To: linuxppc-dev@lists.ozlabs.org; Scott Wood;
> > > > ga...@kernel.crashing.org; b...@kernel.crashing.org;
> > > > pau...@samba.org; m...@ellerman.id.au; devicet...@vger.kernel.org;
> > > > robh...@kernel.org; pawel.m...@arm.com; mark.rutl...@arm.com;
> > > > ijc+devicet...@hellion.org.uk; Harninder Rai
> > > > Cc: Minghuan Lian; Mingkai Hu; Zhiqiang Hou
> > > > Subject: [PATCH V4 1/2] powerpc/fsl: Add PCI node in device tree
> > > > of bsc9132qds
> > > >
> > > > From: Harninder Rai 
> > > >
> > > > Signed-off-by: Harninder Rai 
> > > > Signed-off-by: Minghuan Lian 
> > > > Signed-off-by: Hou Zhiqiang 
> > > > ---
> > > > V4: V3:
> > > >  - Remove gerrit stuff.
> > > > V2:
> > > >  - Remove property clock-frequency.
> > > >
> > > > >  arch/powerpc/boot/dts/bsc9132qds.dts          | 15 ++
> > > > >  arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi | 28 +++
> > > > >  arch/powerpc/boot/dts/fsl/bsc9132si-pre.dtsi  |  1 +
> > > > >  3 files changed, 44 insertions(+)
> > > >
> > > > diff --git a/arch/powerpc/boot/dts/bsc9132qds.dts
> > > > b/arch/powerpc/boot/dts/bsc9132qds.dts
> > > > index 6cab106..940d719 100644
> > > > --- a/arch/powerpc/boot/dts/bsc9132qds.dts
> > > > +++ b/arch/powerpc/boot/dts/bsc9132qds.dts
> > > > @@ -29,6 +29,21 @@
> > > >  soc: soc@ff70 {
> > > >  ranges = <0x0 0x0 0xff70 0x10>;  };
> > > > +
> > > > +pci0: pcie@ff70a000 {
> > > > +reg = <0 0xff70a000 0 0x1000>;
> > > > +ranges = <0x200 0x0 0x9000 0 0x9000 0x0
> > > > 0x2000
> > > > +  0x100 0x0 0x 0 0xc001 0x0 0x1>;
> > > > +pcie@0 {
> > > > +ranges = <0x200 0x0 0x9000
> > > > +  0x200 0x0 0x9000
> > > > +  0x0 0x2000
> > > > +
> > > > +  0x100 0x0 0x0
> > > > +  0x100 0x0 0x0
> > > > +  0x0 0x10>;
> > > > +};
> > > > +};
> > > >  };
> > > >
> > > >  /include/ "bsc9132qds.dtsi"
> > > > diff --git a/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi
> > > > b/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi
> > > > index c723071..b5f0715 100644
> > > > --- a/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi
> > > > +++ b/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi
> > > > @@ -40,6 +40,34 @@
> > > >  interrupts = <16 2 0 0 20 2 0 0>;  };
> > > >
> > > > +/* controller at 0xa000 */
> > > > +&pci0 {
> > > > +compatible = "fsl,bsc9132-pcie", "fsl,qoriq-pcie-v2.2";
> > > > +device_type = "pci"; #size-cells = <2>; #address-cells = <3>;
> > > > +bus-range = <0 255>; interrupts = <16 2 0 0>;
> > > > +
> > > > +pcie@0 {
> > > > +reg = <0 0 0 0 0>;
> > > > +#interrupt-cells = <1>;
> > > > +#size-cells = <2>;
> > > > +#address-cells = <3>;
> > > > +device_type = "pci";
> > > > +interrupts = <16 2 0 0>;
> > > > +interrupt-map-mask = <0xf800 0 0 7>;
> > > > +
> > > > +interrupt-map = <
> > > > +/* IDSEL 0x0 */
> > > > + 0x0 0x0 0x1 &mpic 0x0 0x2 0x0 0x0
> > > > + 0x0 0x0 0x2 &mpic 0x1 0x2 0x0 0x0
> > > > + 0x0 0x0 0x3 &mpic 0x2 0x2 0x0 0x0
> > > > + 0x0 0x0 0x4 &mpic 0x3 0x2 0x0 0x0
> > > > +>;
> > > > +};
> > > > +};
> > > > +
> > > >  &soc {
> > > >  #address-cells = <1>;
> > > >  #size-cells 

Re: [RFC PATCH v3 5/5] vfio-pci: Allow to mmap MSI-X table if host bridge supports filtering of MSIs

2016-01-29 Thread Yongji Xie

On 2016/1/29 6:46, Alex Williamson wrote:
> On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> > Current vfio-pci implementation disallows to mmap MSI-X
> > table in case that user get to touch this directly.
> >
> > But we should allow to mmap these MSI-X tables if the PCI
> > host bridge supports filtering of MSIs.
> >
> > Signed-off-by: Yongji Xie
> > ---
> >   drivers/vfio/pci/vfio_pci.c |6 --
> >   1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index 11fd0f0..4d68f6a 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -555,7 +555,8 @@ static long vfio_pci_ioctl(void *device_data,
> > IORESOURCE_MEM && !pci_resources_share_page(pdev,
> > info.index)) {
> > info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
> > -   if (info.index == vdev->msix_bar) {
> > +   if (!pci_host_bridge_msi_filtered_enabled(pdev) &&
> > +   info.index == vdev->msix_bar) {
> > ret = msix_sparse_mmap_cap(vdev, &caps);
> > if (ret)
> > return ret;
> > @@ -967,7 +968,8 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
> > if (phys_len < PAGE_SIZE || req_start + req_len > phys_len)
> > return -EINVAL;
> >
> > -   if (index == vdev->msix_bar) {
> > +   if (!pci_host_bridge_msi_filtered_enabled(pdev) &&
> > +   index == vdev->msix_bar) {
> > /*
> >  * Disallow mmaps overlapping the MSI-X table; users don't
> >  * get to touch this directly.  We could find somewhere
>
> What about read()/write() access, why would we allow mmap() but not
> those?



Yes, you are right! I missed the MSI-X table check in vfio_pci_bar_rw().
I will fix it in the next version. Thanks.

Regards,
Yongji Xie
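
The review point is that one policy has to gate every access path, not only mmap(). A minimal sketch of that idea follows, under stated assumptions: `struct demo_vdev` and the `demo_*`/`msix_*` helpers are hypothetical and only model the decision, not the vfio-pci code, and the page rounding that a real mmap path would need is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-device state the check depends on. */
struct demo_vdev {
	int msix_bar;             /* BAR index holding the MSI-X table */
	size_t msix_offset;       /* table offset within that BAR */
	size_t msix_size;         /* table size in bytes */
	bool bridge_filters_msi;  /* models pci_host_bridge_msi_filtered_enabled() */
};

/* The table needs protecting only if the host bridge cannot filter MSIs. */
static bool msix_table_protected(const struct demo_vdev *v, int bar)
{
	assert(v != NULL);
	return !v->bridge_filters_msi && bar == v->msix_bar;
}

/* True if [start, start + len) overlaps the MSI-X table. */
static bool overlaps_msix(const struct demo_vdev *v, size_t start, size_t len)
{
	return start < v->msix_offset + v->msix_size &&
	       v->msix_offset < start + len;
}

/* One shared predicate for the mmap() and read()/write() paths alike. */
static bool demo_access_allowed(const struct demo_vdev *v, int bar,
				size_t start, size_t len)
{
	if (!msix_table_protected(v, bar))
		return true;
	return !overlaps_msix(v, start, len);
}
```

Routing both paths through a single predicate keeps them from drifting apart, which is the kind of omission the review caught.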


Re: [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported

2016-01-29 Thread Yongji Xie

On 2016/1/29 6:46, Alex Williamson wrote:
> On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> > MSI-X tables are not allowed to be mmapped in vfio-pci
> > driver in case that user get to touch this directly.
> > This will cause some performance issues when PCI
> > adapters have critical registers in the same page as
> > the MSI-X table.
> >
> > However, some kind of PCI host bridge such as IODA bridge
> > on Power support filtering of MSIs, which can ensure that a
> > given pci device can only shoot the MSIs assigned for it.
> > So we think it's safe to expose the MSI-X table to userspace
> > if filtering of MSIs is supported because the exposed MSI-X
> > table can't be used to do harm to other memory space.
> >
> > To support this case, this patch adds a pci_host_bridge
> > attribute to indicate if this PCI host bridge supports
> > filtering of MSIs.
> >
> > Signed-off-by: Yongji Xie
> > ---
> >   drivers/pci/host-bridge.c |6 ++
> >   include/linux/pci.h   |3 +++
> >   2 files changed, 9 insertions(+)
> >
> > diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
> > index 5f4a2e0..c029267 100644
> > --- a/drivers/pci/host-bridge.c
> > +++ b/drivers/pci/host-bridge.c
> > @@ -96,3 +96,9 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
> > res->end = region->end + offset;
> >   }
> >   EXPORT_SYMBOL(pcibios_bus_to_resource);
> > +
> > +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev)
> > +{
> > +   return pci_find_host_bridge(pdev->bus)->msi_filtered;
> > +}
> > +EXPORT_SYMBOL_GPL(pci_host_bridge_msi_filtered_enabled);
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index b640d65..b952b78 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -412,6 +412,7 @@ struct pci_host_bridge {
> > void (*release_fn)(struct pci_host_bridge *);
> > void *release_data;
> > unsigned int ignore_reset_delay:1;  /* for entire hierarchy */
> > +   unsigned int msi_filtered:1;/* support filtering of MSIs */
> > /* Resource alignment requirements */
> > resource_size_t (*align_resource)(struct pci_dev *dev,
> > const struct resource *res,
> > @@ -430,6 +431,8 @@ void pci_set_host_bridge_release(struct pci_host_bridge *bridge,
> >
> >   int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge);
> >
> > +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev);
> > +
> >   /*
> >    * The first PCI_BRIDGE_RESOURCE_NUM PCI bus resources (those that correspond
> >    * to P2P or CardBus bridge windows) go in a table.  Additional ones (for
>
> Don't we already have a flag for this in the IOMMU space?
>
> enum iommu_cap {
>  IOMMU_CAP_CACHE_COHERENCY,  /* IOMMU can enforce cache coherent DMA
> transactions */
> --->IOMMU_CAP_INTR_REMAP,   /* IOMMU supports interrupt isolation */
>  IOMMU_CAP_NOEXEC,   /* IOMMU_NOEXEC flag */
> };



I see that this flag is enabled on the x86 and ARM architectures.

I'm not sure whether we can mmap the MSI-X table on those architectures.
I have only verified it on PPC64.


Regards.
Yongji Xie
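
For reference, the accessor in the quoted patch resolves the flag through the device's root bus. A self-contained model of that lookup is sketched below with simplified stand-in structures: `demo_host_bridge`, `demo_bus` and `demo_dev` are not the kernel types, and the NULL check is an extra defensive step that the quoted patch does not have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_host_bridge {
	unsigned int msi_filtered:1;  /* bridge filters MSIs per device */
};

struct demo_bus {
	struct demo_bus *parent;          /* NULL on the root bus */
	struct demo_host_bridge *bridge;  /* set on the root bus only */
};

struct demo_dev {
	struct demo_bus *bus;
};

/* Walk up to the root bus, which owns the host bridge. */
static struct demo_host_bridge *demo_find_host_bridge(struct demo_bus *bus)
{
	assert(bus != NULL);
	while (bus->parent)
		bus = bus->parent;
	return bus->bridge;
}

static bool demo_msi_filtered(const struct demo_dev *dev)
{
	struct demo_host_bridge *hb = demo_find_host_bridge(dev->bus);

	/* Defensive: treat a missing bridge as "no filtering". */
	return hb && hb->msi_filtered;
}
```

Because the attribute lives on the host bridge, every device below the same root bus reports the same answer in this model.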


Re: [RFC PATCH v3 1/5] PCI: Add support for enforcing all MMIO BARs to be page aligned

2016-01-29 Thread Yongji Xie

On 2016/1/29 6:46, Alex Williamson wrote:
> On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> > When vfio passthrough a PCI device of which MMIO BARs
> > are smaller than PAGE_SIZE, guest will not handle the
> > mmio accesses to the BARs which leads to mmio emulations
> > in host.
> >
> > This is because vfio will not allow to passthrough one
> > BAR's mmio page which may be shared with other BARs.
> >
> > To solve this performance issue, this patch adds a kernel
> > parameter "pci=resource_page_aligned=on" to enforce
> > the alignment of all MMIO BARs to be at least PAGE_SIZE,
> > so that one BAR's mmio page would not be shared with other
> > BARs. We can also disable it through kernel parameter
> > "pci=resource_page_aligned=off".
> >
> > For the default value of the parameter, we think it should be
> > arch-independent, so we add a macro
> > HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED to change it. And we
> > define this macro to enable this parameter by default on PPC64
> > platform which can easily hit this performance issue because
> > its PAGE_SIZE is 64KB.
> >
> > Note that the kernel parameter won't works if kernel doesn't do
> > resources reallocation.
>
> And where do you account for this so that we know whether it's really in
> effect?


We can check the flag PCI_PROBE_ONLY to know whether kernel do
resources reallocation. Then we know if the kernel parameter is really
in effect.

enum {
/* Force re-assigning all resources (ignore firmware
 * setup completely)
 */
PCI_REASSIGN_ALL_RSRC= 0x0001,

/* Re-assign all bus numbers */
PCI_REASSIGN_ALL_BUS= 0x0002,

/* Do not try to assign, just use existing setup */
--->PCI_PROBE_ONLY= 0x0004,

And I will add this to commit log.


> > Signed-off-by: Yongji Xie
> > ---
> >   Documentation/kernel-parameters.txt |5 +
> >   arch/powerpc/include/asm/pci.h  |   11 +++
> >   drivers/pci/pci.c   |   35 +++
> >   drivers/pci/pci.h   |8 +++-
> >   include/linux/pci.h |4
> >   5 files changed, 62 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> > index 742f69d..3f2a7c9 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -2857,6 +2857,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> > PAGE_SIZE is used as alignment.
> > PCI-PCI bridge can be specified, if resource
> > windows need to be expanded.
> > +   resource_page_aligned=  Enable/disable enforcing the alignment
> > +   of all PCI devices' memory resources to be
> > +   at least PAGE_SIZE if resources reallocation
> > +   is done by kernel.
> > +   Format: { "on" | "off" }
> > ecrc=   Enable/disable PCIe ECRC (transaction layer
> > end-to-end CRC checking).
> > bios: Use BIOS/firmware settings. This is the
> > diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
> > index 3453bd8..2d2b3ef 100644
> > --- a/arch/powerpc/include/asm/pci.h
> > +++ b/arch/powerpc/include/asm/pci.h
> > @@ -136,6 +136,17 @@ extern pgprot_t pci_phys_mem_access_prot(struct file *file,
> >  unsigned long pfn,
> >  unsigned long size,
> >  pgprot_t prot);
> > +#ifdef CONFIG_PPC64
> > +
> > +/* For PPC64, We enforce all PCI MMIO BARs to be page aligned
> > + * by default. This would be helpful to improve performance
> > + * when we passthrough a PCI device of which BARs are smaller
> > + * than PAGE_SIZE(64KB). And we can use kernel parameter
> > + * "pci=resource_page_aligned=off" to disable it.
> > + */
> > +#define HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED  1
> > +
> > +#endif
> >
> >   #define HAVE_ARCH_PCI_RESOURCE_TO_USER
> >   extern void pci_resource_to_user(const struct pci_dev *dev, int bar,
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 314db8c..7b21238 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -99,6 +99,9 @@ u8 pci_cache_line_size;
> >    */
> >   unsigned int pcibios_max_latency = 255;
> >
> > +bool pci_resources_page_aligned =
> > +   IS_ENABLED(HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED);
>
> I don't think this is proper use of IS_ENABLED, which seems to be
> targeted at CONFIG_ type options.  You could define this as that in an
> arch Kconfig.


Is it better that we define this as a pci Kconfig and select it in arch 
Kconfig?
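
For context on the IS_ENABLED() remark: the macro works by token pasting and yields 1 only when its argument is a macro defined to exactly 1, which is what generated config headers provide for CONFIG_ symbols set to y (with a _MODULE twin for m). The sketch below is a local re-creation of that placeholder trick for illustration only. All DEMO_/demo_ names are specific to this example, and one extra trailing argument is added so the variadic part is never empty; it is not the kernel header itself.

```c
#include <assert.h>

/* Pasting the option's value onto this prefix yields "0," only for 1. */
#define DEMO_ARG_PLACEHOLDER_1 0,

#define demo_is_set(opt)           demo_is_set_1(opt)
#define demo_is_set_1(value)       demo_is_set_2(DEMO_ARG_PLACEHOLDER_##value)
#define demo_is_set_2(arg_or_junk) demo_is_set_3(arg_or_junk 1, 0, 0)
#define demo_is_set_3(ignored, val, ...) val

/* True for "built in" (FOO == 1) or "module" (FOO_MODULE == 1). */
#define DEMO_IS_ENABLED(opt) \
	(demo_is_set(opt) || demo_is_set(opt##_MODULE))

/* Hypothetical symbols, defined the way generated config headers would. */
#define CONFIG_DEMO_BUILTIN 1
#define CONFIG_DEMO_TRISTATE_MODULE 1
/* Plain header macros, in the style of the patch under review. */
#define HAVE_DEMO_DEFAULT 1
#define HAVE_DEMO_OTHER 2

static const int demo_builtin   = DEMO_IS_ENABLED(CONFIG_DEMO_BUILTIN);
static const int demo_module    = DEMO_IS_ENABLED(CONFIG_DEMO_TRISTATE);
static const int demo_header    = DEMO_IS_ENABLED(HAVE_DEMO_DEFAULT);
static const int demo_other     = DEMO_IS_ENABLED(HAVE_DEMO_OTHER);
static const int demo_undefined = DEMO_IS_ENABLED(CONFIG_DEMO_ABSENT);

/* Records what each case evaluates to. */
void demo_self_check(void)
{
	assert(demo_builtin == 1);
	assert(demo_module == 1);
	assert(demo_header == 1);     /* works mechanically for any "1" macro */
	assert(demo_other == 0);      /* but silently 0 for any other value */
	assert(demo_undefined == 0);
}
```

So a header macro defined to 1 does evaluate as enabled, but the result depends on that exact value, which is one reason to prefer a real Kconfig symbol as suggested in the review.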



> > +
> >   /* If set, the PCIe ARI capability will not be used. */
> >   static bool pcie_ari_disabled;
> >
> > @@ -4746,6 +4749,35 @@ static ssize_t pci_resource_alignment_store(struct bus_type *bus,
> >   BUS_ATTR(resource_alignment, 0644, pci_resource_alignment_show,
> > pci_resource_alignment_store);
> >
> > +static void pci_resources_get_page_aligned(

[GIT PULL] Please pull powerpc/linux.git powerpc-4.5-2 tag

2016-01-29 Thread Michael Ellerman
Hi Linus,

Please pull powerpc fixes for 4.5:

The following changes since commit 9fa686068a32ddf256df03982b3e3967c18654a8:

  Merge tag 'dmaengine-fix-4.5-rc1' of git://git.infradead.org/users/vkoul/slave-dma (2016-01-20 10:15:21 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.5-2

for you to fetch changes up to 2d19fc639516dc7b4184450b315c931d38549e61:

  powerpc/mm: Fixup _HPAGE_CHG_MASK (2016-01-28 23:49:43 +1100)


powerpc fixes for 4.5

 - Wire up copy_file_range() syscall from Chandan Rajendra
 - Simplify module TOC handling from Alan Modra
 - Remove newly added extra definition of pmd_dirty from Stephen Rothwell
 - Allow user space to map rtas_rmo_buf from Vasant Hegde
 - Fix PE location code from Gavin Shan
 - Remove PPMU_HAS_SSLOT flag for Power8 from Madhavan Srinivasan
 - Fixup _HPAGE_CHG_MASK from Aneesh Kumar K.V


Alan Modra (1):
  powerpc: Simplify module TOC handling

Aneesh Kumar K.V (1):
  powerpc/mm: Fixup _HPAGE_CHG_MASK

Chandan Rajendra (1):
  powerpc: Wire up copy_file_range() syscall

Gavin Shan (1):
  powerpc/eeh: Fix PE location code

Madhavan Srinivasan (1):
  powerpc/perf: Remove PPMU_HAS_SSLOT flag for Power8

Stephen Rothwell (1):
  powerpc: Remove newly added extra definition of pmd_dirty

Vasant Hegde (1):
  powerpc/mm: Allow user space to map rtas_rmo_buf

 arch/powerpc/include/asm/book3s/64/hash.h|  4 +++-
 arch/powerpc/include/asm/book3s/64/pgtable.h |  1 -
 arch/powerpc/include/asm/systbl.h|  1 +
 arch/powerpc/include/asm/unistd.h|  2 +-
 arch/powerpc/include/uapi/asm/unistd.h   |  1 +
 arch/powerpc/kernel/eeh_pe.c | 33 +---
 arch/powerpc/kernel/misc_64.S| 28 ---
 arch/powerpc/kernel/module_64.c  | 12 +++---
 arch/powerpc/mm/mem.c|  4 ++--
 arch/powerpc/perf/power8-pmu.c   |  2 +-
 scripts/mod/modpost.c|  3 ++-
 11 files changed, 35 insertions(+), 56 deletions(-)



Re: [PATCH 1/2] powerpc/mm: Enable HugeTLB page migration

2016-01-29 Thread Anshuman Khandual
On 01/28/2016 08:14 PM, Aneesh Kumar K.V wrote:
> Anshuman Khandual  writes:
> 
>> This enables HugeTLB page migration for PPC64_BOOK3S systems which implement
>> HugeTLB page at the PMD level. It enables the kernel configuration option
>> CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION by default which turns on the function
>> hugepage_migration_supported() during migration. After the recent changes
>> to the PTE format, HugeTLB page migration happens successfully.
>>
>> Signed-off-by: Anshuman Khandual 
>> ---
>>  arch/powerpc/Kconfig | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index e4824fd..65d52a0 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -82,6 +82,10 @@ config GENERIC_HWEIGHT
>>  config ARCH_HAS_DMA_SET_COHERENT_MASK
>>  bool
>>
>> +config ARCH_ENABLE_HUGEPAGE_MIGRATION
>> +def_bool y
>> +depends on PPC_BOOK3S_64 && HUGETLB_PAGE && MIGRATION
>> +
>>  config PPC
>>  bool
>>  default y
> 
> 
> Are you sure this is all that is needed ? We will get a FOLL_GET with hugetlb
> migration and our follow_huge_addr will BUG_ON on that. Look at
> e66f17ff71772b209eed39de35aaa99ba819c93d (" mm/hugetlb: take page table
> lock in follow_huge_pmd()").

HugeTLB page migration was successful without any error and data integrity
check passed on them as well. But yes there might be some corner cases which
trigger the race condition we have not faced yet. Will try to understand the
situation there and get back.

> 
> Again this doesn't work with 4K page size. So if you are taking this
> route, we will need that restriction here.
> 

Agreed, I had already put a comment on the thread pointing out the same.
But yes, the restriction needs to be there in the enabling config option
here as well.

> I would suggest we switch 64K page size hugetlb to generic
> hugetlb and then do hugetlb migration on top of that.

Will explore it and get back.

> 
> Till you help me understand why that FOLL_GET issue is not valid for
> powerpc,

Sure will get back.
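
The 4K restriction discussed above can be illustrated with a small model. It assumes that the generic hugepage_migration_supported() helper compares the huge page shift with the PMD shift once the architecture option is enabled. `struct demo_config` and `demo_hugepage_migration_supported()` are hypothetical names, and the code is a user-space sketch rather than kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one kernel configuration. */
struct demo_config {
	bool arch_enables_migration;  /* models ARCH_ENABLE_HUGEPAGE_MIGRATION */
	unsigned int pmd_shift;       /* log2 of the range one PMD entry maps */
};

/* Models the assumed check: only PMD-level huge pages are migratable. */
static bool demo_hugepage_migration_supported(const struct demo_config *cfg,
					      unsigned int huge_page_shift)
{
	assert(cfg != NULL);
	if (!cfg->arch_enables_migration)
		return false;
	return huge_page_shift == cfg->pmd_shift;
}
```

With illustrative values (a 16MB huge page, shift 24), a configuration whose PMD shift is 24 passes the check and one whose PMD shift is 21 does not. That is the kind of mismatch a dependency on the 64K page size in the Kconfig entry would exclude up front.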
