Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-27 Thread Qian Cai



> On Jan 28, 2020, at 1:13 AM, Christophe Leroy  wrote:
> 
> ppc32 an indecent / legacy platform ? Are you kidding ?
> 
> Powerquicc II PRO for instance is fully supported by the manufacturer and 
> widely used in many small networking devices.

Of course, I forgot about embedded devices. The problem is: how many
developers are actually going to run this debug option on embedded devices?

Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-27 Thread Qian Cai



> On Jan 28, 2020, at 2:03 AM, Anshuman Khandual  
> wrote:
> 
> 'allyesconfig' makes 'DEBUG_VM = y' which in turn will enable 
> 'DEBUG_VM_PGTABLE = y'
> on platforms that subscribe ARCH_HAS_DEBUG_VM_PGTABLE.

Isn’t that only for compile testing? Who is booting such a beast and making
sure everything works as expected?

Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-27 Thread Anshuman Khandual



On 01/28/2020 12:06 PM, Qian Cai wrote:
> 
> 
>> On Jan 28, 2020, at 1:17 AM, Christophe Leroy  
>> wrote:
>>
>> It is 'default y' so there is not much risk that it is forgotten; at least
>> all test suites run with 'allyesconfig' will trigger the test, so I
>> think it is really a good feature.
> 
> This thing depends on DEBUG_VM, which I don’t see selected by any
> defconfig. Am I missing anything?
> 

'allyesconfig' makes 'DEBUG_VM = y', which in turn will enable
'DEBUG_VM_PGTABLE = y' on platforms that subscribe ARCH_HAS_DEBUG_VM_PGTABLE.
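
In Kconfig terms, the dependency chain being discussed boils down to roughly
the following (a simplified sketch for illustration, not the literal hunk
from the patch):

config ARCH_HAS_DEBUG_VM_PGTABLE
	bool
	help
	  An architecture should select this when it can successfully
	  build and run DEBUG_VM_PGTABLE.

config DEBUG_VM_PGTABLE
	bool "Debug arch page table for semantics compliance"
	depends on ARCH_HAS_DEBUG_VM_PGTABLE
	default y if DEBUG_VM

So 'allyesconfig' (or any config with DEBUG_VM=y) picks the test up
automatically on every architecture that selects ARCH_HAS_DEBUG_VM_PGTABLE.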


Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-27 Thread Qian Cai



> On Jan 28, 2020, at 1:17 AM, Christophe Leroy  wrote:
> 
> It is 'default y' so there is not much risk that it is forgotten; at least all
> test suites run with 'allyesconfig' will trigger the test, so I think it
> is really a good feature.

This thing depends on DEBUG_VM, which I don’t see selected by any
defconfig. Am I missing anything?

Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-27 Thread Christophe Leroy




On 28/01/2020 at 06:48, Qian Cai wrote:




On Jan 27, 2020, at 11:58 PM, Anshuman Khandual  
wrote:

As I had mentioned before, the test attempts to formalize page table helper
semantics as expected from generic MM code paths and intends to catch
deviations when enabled on a given platform. How else should we test for
semantics errors? There are past examples of usefulness for this procedure
on arm64 and on s390. I am wondering how else to prove the usefulness of a
debug feature if these references are not enough.


Not saying it will not be useful. As you mentioned, it actually found a bug or
two in the past. The problem is that there is always a cost to maintain
something like this, and nobody knows how things could be broken in the future,
even for the isolated code you mentioned, given how complicated the kernel code
base is. I am not so positive that many developers would enable this debug
feature and use it on a regular basis, from the information you gave so far.

On the other hand, it might just be better to maintain this thing out of tree
by yourself anyway, because if it is not going to be used by many developers,
few people are going to contribute to it or even notice when it is broken.
What’s the point of getting this merged apart from getting some meaningless
credit?



It is 'default y' so there is not much risk that it is forgotten; at
least all test suites run with 'allyesconfig' will trigger the test,
so I think it is really a good feature.


Christophe


[PATCH v6 10/10] drivers/oprofile: open access for CAP_PERFMON privileged process

2020-01-27 Thread Alexey Budankov


Open access to monitoring for CAP_PERFMON privileged process. Providing
the access under CAP_PERFMON capability singly, without the rest of
CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
makes operation more secure.

CAP_PERFMON implements the principle of least privilege for performance
monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle
of least privilege: A security design principle that states that a process
or program be granted only those privileges (e.g., capabilities) necessary
to accomplish its legitimate function, and only for the time that such
privileges are actually required)

For backward compatibility reasons access to the monitoring remains open
for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure
monitoring is discouraged with respect to CAP_PERFMON capability.

Signed-off-by: Alexey Budankov 
---
 drivers/oprofile/event_buffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/oprofile/event_buffer.c b/drivers/oprofile/event_buffer.c
index 12ea4a4ad607..6c9edc8bbc95 100644
--- a/drivers/oprofile/event_buffer.c
+++ b/drivers/oprofile/event_buffer.c
@@ -113,7 +113,7 @@ static int event_buffer_open(struct inode *inode, struct 
file *file)
 {
int err = -EPERM;
 
-   if (!capable(CAP_SYS_ADMIN))
+   if (!perfmon_capable())
return -EPERM;
 
if (test_and_set_bit_lock(0, &buffer_opened))
-- 
2.20.1




[PATCH v6 09/10] drivers/perf: open access for CAP_PERFMON privileged process

2020-01-27 Thread Alexey Budankov


Open access to monitoring for CAP_PERFMON privileged process.
Providing the access under CAP_PERFMON capability singly, without the
rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the
credentials and makes operation more secure.

CAP_PERFMON implements the principle of least privilege for performance
monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle
of least privilege: A security design principle that states that a process
or program be granted only those privileges (e.g., capabilities) necessary
to accomplish its legitimate function, and only for the time that such
privileges are actually required)

For backward compatibility reasons access to the monitoring remains open
for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure
monitoring is discouraged with respect to CAP_PERFMON capability.

Signed-off-by: Alexey Budankov 
---
 drivers/perf/arm_spe_pmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 4e4984a55cd1..5dff81bc3324 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -274,7 +274,7 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event)
if (!attr->exclude_kernel)
reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT);
 
-   if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && capable(CAP_SYS_ADMIN))
+   if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable())
reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT);
 
return reg;
@@ -700,7 +700,7 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
return -EOPNOTSUPP;
 
reg = arm_spe_event_to_pmscr(event);
-   if (!capable(CAP_SYS_ADMIN) &&
+   if (!perfmon_capable() &&
(reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) |
BIT(SYS_PMSCR_EL1_CX_SHIFT) |
BIT(SYS_PMSCR_EL1_PCT_SHIFT
-- 
2.20.1




Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-27 Thread Christophe Leroy




On 28/01/2020 at 04:33, Qian Cai wrote:




On Jan 27, 2020, at 10:06 PM, Anshuman Khandual  
wrote:



On 01/28/2020 07:41 AM, Qian Cai wrote:




On Jan 27, 2020, at 8:28 PM, Anshuman Khandual  
wrote:

This adds tests which will validate architecture page table helpers and
other accessors in their compliance with expected generic MM semantics.
This will help various architectures in validating changes to existing
page table helpers or addition of new ones.

This test covers basic page table entry transformations including but not
limited to old, young, dirty, clean, write, write protect etc at various
level along with populating intermediate entries with next page table page
and validating them.

Test page table pages are allocated from system memory with required size
and alignments. The mapped pfns at page table levels are derived from a
real pfn representing a valid kernel text symbol. This test gets called
right after page_alloc_init_late().

This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
arm64. Going forward, other architectures too can enable this after fixing
build or runtime problems (if any) with their page table helpers.


Hello Qian,



What’s the value of this block of new code? It only supports x86 and arm64
which are supposed to be good now.


We have been over the usefulness of this code many times before as the patch is
already in its V12. Currently it is enabled on arm64, x86 (except PAE), arc and
ppc32. There are build time or runtime problems with other archs which prevent


I am not sure if I care too much about arc and ppc32 which are pretty much
legacy platforms.


enablement of this test (for the moment) but then the goal is to integrate all
of them going forward. The test not only validates platform's adherence to the
expected semantics from generic MM but also helps in keeping it that way during
code changes in future as well.


Another option may be to get some decent arches on board first before merging
this thing, so it has more chances to catch regressions for developers who
might run this.



ppc32 an indecent / legacy platform ? Are you kidding ?

Powerquicc II PRO for instance is fully supported by the manufacturer 
and widely used in many small networking devices.


Christophe


[PATCH v6 08/10] parisc/perf: open access for CAP_PERFMON privileged process

2020-01-27 Thread Alexey Budankov


Open access to monitoring for CAP_PERFMON privileged process.
Providing the access under CAP_PERFMON capability singly, without the
rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the
credentials and makes operation more secure.

CAP_PERFMON implements the principle of least privilege for performance
monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle
of least privilege: A security design principle that states that a process
or program be granted only those privileges (e.g., capabilities) necessary
to accomplish its legitimate function, and only for the time that such
privileges are actually required)

For backward compatibility reasons access to the monitoring remains open
for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure
monitoring is discouraged with respect to CAP_PERFMON capability.

Signed-off-by: Alexey Budankov 
---
 arch/parisc/kernel/perf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/parisc/kernel/perf.c b/arch/parisc/kernel/perf.c
index 676683641d00..c4208d027794 100644
--- a/arch/parisc/kernel/perf.c
+++ b/arch/parisc/kernel/perf.c
@@ -300,7 +300,7 @@ static ssize_t perf_write(struct file *file, const char 
__user *buf,
else
return -EFAULT;
 
-   if (!capable(CAP_SYS_ADMIN))
+   if (!perfmon_capable())
return -EACCES;
 
if (count != sizeof(uint32_t))
-- 
2.20.1




[PATCH v6 07/10] powerpc/perf: open access for CAP_PERFMON privileged process

2020-01-27 Thread Alexey Budankov


Open access to monitoring for CAP_PERFMON privileged process.
Providing the access under CAP_PERFMON capability singly, without the
rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the
credentials and makes operation more secure.

CAP_PERFMON implements the principle of least privilege for performance
monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle
of least privilege: A security design principle that states that a process
or program be granted only those privileges (e.g., capabilities) necessary
to accomplish its legitimate function, and only for the time that such
privileges are actually required)

For backward compatibility reasons access to the monitoring remains open
for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure
monitoring is discouraged with respect to CAP_PERFMON capability.

Signed-off-by: Alexey Budankov 
---
 arch/powerpc/perf/imc-pmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index cb50a9e1fd2d..e837717492e4 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -898,7 +898,7 @@ static int thread_imc_event_init(struct perf_event *event)
if (event->attr.type != event->pmu->type)
return -ENOENT;
 
-   if (!capable(CAP_SYS_ADMIN))
+   if (!perfmon_capable())
return -EACCES;
 
/* Sampling not supported */
@@ -1307,7 +1307,7 @@ static int trace_imc_event_init(struct perf_event *event)
if (event->attr.type != event->pmu->type)
return -ENOENT;
 
-   if (!capable(CAP_SYS_ADMIN))
+   if (!perfmon_capable())
return -EACCES;
 
/* Return if this is a couting event */
-- 
2.20.1




[PATCH v6 06/10] trace/bpf_trace: open access for CAP_PERFMON privileged process

2020-01-27 Thread Alexey Budankov


Open access to bpf_trace monitoring for CAP_PERFMON privileged process.
Providing the access under CAP_PERFMON capability singly, without the
rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the
credentials and makes operation more secure.

CAP_PERFMON implements the principle of least privilege for performance
monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle
of least privilege: A security design principle that states that a process
or program be granted only those privileges (e.g., capabilities) necessary
to accomplish its legitimate function, and only for the time that such
privileges are actually required)

For backward compatibility reasons access to bpf_trace monitoring remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
secure bpf_trace monitoring is discouraged with respect to CAP_PERFMON
capability.

Signed-off-by: Alexey Budankov 
---
 kernel/trace/bpf_trace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index e5ef4ae9edb5..334f1d71ebb1 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1395,7 +1395,7 @@ int perf_event_query_prog_array(struct perf_event *event, 
void __user *info)
u32 *ids, prog_cnt, ids_len;
int ret;
 
-   if (!capable(CAP_SYS_ADMIN))
+   if (!perfmon_capable())
return -EPERM;
if (event->attr.type != PERF_TYPE_TRACEPOINT)
return -EINVAL;
-- 
2.20.1




[PATCH v6 05/10] drm/i915/perf: open access for CAP_PERFMON privileged process

2020-01-27 Thread Alexey Budankov


Open access to i915_perf monitoring for CAP_PERFMON privileged process.
Providing the access under CAP_PERFMON capability singly, without the
rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the
credentials and makes operation more secure.

CAP_PERFMON implements the principle of least privilege for performance
monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle
of least privilege: A security design principle that states that a process
or program be granted only those privileges (e.g., capabilities) necessary
to accomplish its legitimate function, and only for the time that such
privileges are actually required)

For backward compatibility reasons access to i915_events subsystem remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
secure i915_events monitoring is discouraged with respect to CAP_PERFMON
capability.

Signed-off-by: Alexey Budankov 
---
 drivers/gpu/drm/i915/i915_perf.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 2ae14bc14931..d89347861b7d 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -3375,10 +3375,10 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf,
/* Similar to perf's kernel.perf_paranoid_cpu sysctl option
 * we check a dev.i915.perf_stream_paranoid sysctl option
 * to determine if it's ok to access system wide OA counters
-* without CAP_SYS_ADMIN privileges.
+* without CAP_PERFMON or CAP_SYS_ADMIN privileges.
 */
if (privileged_op &&
-   i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
+   i915_perf_stream_paranoid && !perfmon_capable()) {
DRM_DEBUG("Insufficient privileges to open i915 perf stream\n");
ret = -EACCES;
goto err_ctx;
@@ -3571,9 +3571,8 @@ static int read_properties_unlocked(struct i915_perf 
*perf,
} else
oa_freq_hz = 0;
 
-   if (oa_freq_hz > i915_oa_max_sample_rate &&
-   !capable(CAP_SYS_ADMIN)) {
-   DRM_DEBUG("OA exponent would exceed the max 
sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without root 
privileges\n",
+   if (oa_freq_hz > i915_oa_max_sample_rate && 
!perfmon_capable()) {
+   DRM_DEBUG("OA exponent would exceed the max 
sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without 
CAP_PERFMON or CAP_SYS_ADMIN privileges\n",
  i915_oa_max_sample_rate);
return -EACCES;
}
@@ -3994,7 +3993,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, 
void *data,
return -EINVAL;
}
 
-   if (i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
+   if (i915_perf_stream_paranoid && !perfmon_capable()) {
DRM_DEBUG("Insufficient privileges to add i915 OA config\n");
return -EACCES;
}
@@ -4141,7 +4140,7 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, 
void *data,
return -ENOTSUPP;
}
 
-   if (i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
+   if (i915_perf_stream_paranoid && !perfmon_capable()) {
DRM_DEBUG("Insufficient privileges to remove i915 OA config\n");
return -EACCES;
}
-- 
2.20.1




[PATCH v6 04/10] perf tool: extend Perf tool with CAP_PERFMON capability support

2020-01-27 Thread Alexey Budankov


Extend error messages to mention CAP_PERFMON capability as an option
to substitute CAP_SYS_ADMIN capability for secure system performance
monitoring and observability operations. Make perf_event_paranoid_check()
and __cmd_ftrace() aware of the CAP_PERFMON capability.

CAP_PERFMON implements the principle of least privilege for performance
monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle
of least privilege: A security design principle that states that a process
or program be granted only those privileges (e.g., capabilities) necessary
to accomplish its legitimate function, and only for the time that such
privileges are actually required)

For backward compatibility reasons access to perf_events subsystem remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
secure perf_events monitoring is discouraged with respect to CAP_PERFMON
capability.

Signed-off-by: Alexey Budankov 
---
 tools/perf/builtin-ftrace.c |  5 +++--
 tools/perf/design.txt   |  3 ++-
 tools/perf/util/cap.h   |  4 
 tools/perf/util/evsel.c | 10 +-
 tools/perf/util/util.c  |  1 +
 5 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index d5adc417a4ca..55eda54240fb 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -284,10 +284,11 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int 
argc, const char **argv)
.events = POLLIN,
};
 
-   if (!perf_cap__capable(CAP_SYS_ADMIN)) {
+   if (!(perf_cap__capable(CAP_PERFMON) ||
+ perf_cap__capable(CAP_SYS_ADMIN))) {
pr_err("ftrace only works for %s!\n",
 #ifdef HAVE_LIBCAP_SUPPORT
-   "users with the SYS_ADMIN capability"
+   "users with the CAP_PERFMON or CAP_SYS_ADMIN capability"
 #else
"root"
 #endif
diff --git a/tools/perf/design.txt b/tools/perf/design.txt
index 0453ba26cdbd..a42fab308ff6 100644
--- a/tools/perf/design.txt
+++ b/tools/perf/design.txt
@@ -258,7 +258,8 @@ gets schedule to. Per task counters can be created by any 
user, for
 their own tasks.
 
 A 'pid == -1' and 'cpu == x' counter is a per CPU counter that counts
-all events on CPU-x. Per CPU counters need CAP_SYS_ADMIN privilege.
+all events on CPU-x. Per CPU counters need CAP_PERFMON or CAP_SYS_ADMIN
+privilege.
 
 The 'flags' parameter is currently unused and must be zero.
 
diff --git a/tools/perf/util/cap.h b/tools/perf/util/cap.h
index 051dc590ceee..ae52878c0b2e 100644
--- a/tools/perf/util/cap.h
+++ b/tools/perf/util/cap.h
@@ -29,4 +29,8 @@ static inline bool perf_cap__capable(int cap __maybe_unused)
 #define CAP_SYSLOG 34
 #endif
 
+#ifndef CAP_PERFMON
+#define CAP_PERFMON38
+#endif
+
 #endif /* __PERF_CAP_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index a69e64236120..a35f17723dd3 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2491,14 +2491,14 @@ int perf_evsel__open_strerror(struct evsel *evsel, 
struct target *target,
 "You may not have permission to collect %sstats.\n\n"
 "Consider tweaking /proc/sys/kernel/perf_event_paranoid,\n"
 "which controls use of the performance events system by\n"
-"unprivileged users (without CAP_SYS_ADMIN).\n\n"
+"unprivileged users (without CAP_PERFMON or 
CAP_SYS_ADMIN).\n\n"
 "The current value is %d:\n\n"
 "  -1: Allow use of (almost) all events by all users\n"
 "  Ignore mlock limit after perf_event_mlock_kb without 
CAP_IPC_LOCK\n"
-">= 0: Disallow ftrace function tracepoint by users without 
CAP_SYS_ADMIN\n"
-"  Disallow raw tracepoint access by users without 
CAP_SYS_ADMIN\n"
-">= 1: Disallow CPU event access by users without 
CAP_SYS_ADMIN\n"
-">= 2: Disallow kernel profiling by users without 
CAP_SYS_ADMIN\n\n"
+">= 0: Disallow ftrace function tracepoint by users without 
CAP_PERFMON or CAP_SYS_ADMIN\n"
+"  Disallow raw tracepoint access by users without 
CAP_SYS_PERFMON or CAP_SYS_ADMIN\n"
+">= 1: Disallow CPU event access by users without CAP_PERFMON 
or CAP_SYS_ADMIN\n"
+">= 2: Disallow kernel profiling by users without CAP_PERFMON 
or CAP_SYS_ADMIN\n\n"
 "To make this setting permanent, edit /etc/sysctl.conf too, 
e.g.:\n\n"
 "  kernel.perf_event_paranoid = -1\n" ,
 target->system_wide ? "system-wide " : "",
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 969ae560dad9..51cf3071db74 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -272,6 +272,7 @@ int perf_event_paranoid(void)
 bool perf_event_paranoid_check(int max_level)
 {
return perf_cap__capable(CAP_SYS_ADMIN) ||

[PATCH v6 03/10] perf/core: open access to probes for CAP_PERFMON privileged process

2020-01-27 Thread Alexey Budankov


Open access to monitoring via kprobes and uprobes and eBPF tracing for
CAP_PERFMON privileged process. Providing the access under CAP_PERFMON
capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes
chances to misuse the credentials and makes operation more secure.

perf kprobes and uprobes are used by ftrace and eBPF. perf probe uses
ftrace to define new kprobe events, and those events are treated as
tracepoint events. eBPF defines new probes via perf_event_open interface
and then the probes are used in eBPF tracing.

CAP_PERFMON implements the principle of least privilege for performance
monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle
of least privilege: A security design principle that states that a process or
program be granted only those privileges (e.g., capabilities) necessary to
accomplish its legitimate function, and only for the time that such privileges
are actually required)

For backward compatibility reasons access to perf_events subsystem remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
secure perf_events monitoring is discouraged with respect to CAP_PERFMON
capability.

Signed-off-by: Alexey Budankov 
---
 kernel/events/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index d956c81bd310..c6453320ffea 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9088,7 +9088,7 @@ static int perf_kprobe_event_init(struct perf_event 
*event)
if (event->attr.type != perf_kprobe.type)
return -ENOENT;
 
-   if (!capable(CAP_SYS_ADMIN))
+   if (!perfmon_capable())
return -EACCES;
 
/*
@@ -9148,7 +9148,7 @@ static int perf_uprobe_event_init(struct perf_event 
*event)
if (event->attr.type != perf_uprobe.type)
return -ENOENT;
 
-   if (!capable(CAP_SYS_ADMIN))
+   if (!perfmon_capable())
return -EACCES;
 
/*
-- 
2.20.1




[PATCH v6 02/10] perf/core: open access to the core for CAP_PERFMON privileged process

2020-01-27 Thread Alexey Budankov


Open access to monitoring of kernel code, cpus, tracepoints and namespaces
data for a CAP_PERFMON privileged process. Providing the access under
CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials,
excludes chances to misuse the credentials and makes operation more secure.

CAP_PERFMON implements the principle of least privilege for performance
monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle
of least privilege: A security design principle that states that a process or
program be granted only those privileges (e.g., capabilities) necessary to
accomplish its legitimate function, and only for the time that such privileges
are actually required)

For backward compatibility reasons access to perf_events subsystem remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure
perf_events monitoring is discouraged with respect to CAP_PERFMON capability.

Signed-off-by: Alexey Budankov 
---
 include/linux/perf_event.h | 6 +++---
 kernel/events/core.c   | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6d4c22aee384..730469babcc2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1285,7 +1285,7 @@ static inline int perf_is_paranoid(void)
 
 static inline int perf_allow_kernel(struct perf_event_attr *attr)
 {
-   if (sysctl_perf_event_paranoid > 1 && !capable(CAP_SYS_ADMIN))
+   if (sysctl_perf_event_paranoid > 1 && !perfmon_capable())
return -EACCES;
 
return security_perf_event_open(attr, PERF_SECURITY_KERNEL);
@@ -1293,7 +1293,7 @@ static inline int perf_allow_kernel(struct 
perf_event_attr *attr)
 
 static inline int perf_allow_cpu(struct perf_event_attr *attr)
 {
-   if (sysctl_perf_event_paranoid > 0 && !capable(CAP_SYS_ADMIN))
+   if (sysctl_perf_event_paranoid > 0 && !perfmon_capable())
return -EACCES;
 
return security_perf_event_open(attr, PERF_SECURITY_CPU);
@@ -1301,7 +1301,7 @@ static inline int perf_allow_cpu(struct perf_event_attr 
*attr)
 
 static inline int perf_allow_tracepoint(struct perf_event_attr *attr)
 {
-   if (sysctl_perf_event_paranoid > -1 && !capable(CAP_SYS_ADMIN))
+   if (sysctl_perf_event_paranoid > -1 && !perfmon_capable())
return -EPERM;
 
return security_perf_event_open(attr, PERF_SECURITY_TRACEPOINT);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2173c23c25b4..d956c81bd310 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11186,7 +11186,7 @@ SYSCALL_DEFINE5(perf_event_open,
}
 
if (attr.namespaces) {
-   if (!capable(CAP_SYS_ADMIN))
+   if (!perfmon_capable())
return -EACCES;
}
 
-- 
2.20.1




[PATCH v6 01/10] capabilities: introduce CAP_PERFMON to kernel and user space

2020-01-27 Thread Alexey Budankov


Introduce CAP_PERFMON capability designed to secure system performance
monitoring and observability operations so that CAP_PERFMON would assist
CAP_SYS_ADMIN capability in its governing role for performance monitoring
and observability subsystems.

CAP_PERFMON hardens system security and integrity during system performance
monitoring and observability operations by decreasing attack surface that
is available to a CAP_SYS_ADMIN privileged process [2]. Providing the access
to system performance monitoring and observability operations under CAP_PERFMON
capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes
chances to misuse the credentials and makes the operation more secure.
Thus, CAP_PERFMON implements the principle of least privilege for performance
monitoring and observability operations (POSIX IEEE 1003.1e: 2.2.2.39 principle
of least privilege: A security design principle that states that a process
or program be granted only those privileges (e.g., capabilities) necessary
to accomplish its legitimate function, and only for the time that such
privileges are actually required)

CAP_PERFMON meets the demand to secure system performance monitoring and
observability operations for adoption in security sensitive, restricted,
multiuser production environments (e.g. HPC clusters, cloud and virtual compute
environments), where root or CAP_SYS_ADMIN credentials are not available to
mass users of a system, and securely unblocks applicability and scalability
of system performance monitoring and observability operations beyond root
and CAP_SYS_ADMIN process use cases.

CAP_PERFMON takes over CAP_SYS_ADMIN credentials related to system performance
monitoring and observability operations and balances the amount of CAP_SYS_ADMIN
credentials following the recommendations in the capabilities man page [1]
for CAP_SYS_ADMIN: "Note: this capability is overloaded; see Notes to kernel
developers, below." For backward compatibility reasons access to system
performance monitoring and observability subsystems of the kernel remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN capability
usage for secure system performance monitoring and observability operations
is discouraged with respect to the designed CAP_PERFMON capability.

Although the software running under CAP_PERFMON can not ensure avoidance
of related hardware issues, the software can still mitigate these issues
following the official embargoed hardware issues mitigation procedure [2].
The bugs in the software itself can be fixed following the standard kernel
development process [3] to maintain and harden security of system performance
monitoring and observability operations.

[1] http://man7.org/linux/man-pages/man7/capabilities.7.html
[2] 
https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
[3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html

Signed-off-by: Alexey Budankov 
---
 include/linux/capability.h  | 4 
 include/uapi/linux/capability.h | 8 +++-
 security/selinux/include/classmap.h | 4 ++--
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index ecce0f43c73a..027d7e4a853b 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -251,6 +251,10 @@ extern bool privileged_wrt_inode_uidgid(struct 
user_namespace *ns, const struct
 extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
 extern bool file_ns_capable(const struct file *file, struct user_namespace 
*ns, int cap);
 extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace 
*ns);
+static inline bool perfmon_capable(void)
+{
+   return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
+}
 
 /* audit system wants to get cap info from files as well */
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct 
cpu_vfs_cap_data *cpu_caps);
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 240fdb9a60f6..8b416e5f3afa 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -366,8 +366,14 @@ struct vfs_ns_cap_data {
 
 #define CAP_AUDIT_READ 37
 
+/*
+ * Allow system performance and observability privileged operations
+ * using perf_events, i915_perf and other kernel subsystems
+ */
+
+#define CAP_PERFMON38
 
-#define CAP_LAST_CAP CAP_AUDIT_READ
+#define CAP_LAST_CAP CAP_PERFMON
 
 #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
 
diff --git a/security/selinux/include/classmap.h 
b/security/selinux/include/classmap.h
index 7db24855e12d..c599b0c2b0e7 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -27,9 +27,9 @@
"audit_control", "setfcap"
 
 #define COMMON_CAP2_PERMS  "mac_override", "mac_admin", "syslog", \
-   "wake_alarm", "block_suspend", "audit_read"
+   "wake_alarm", "block_suspend", 

[PATCH v6 00/10] Introduce CAP_PERFMON to secure system performance monitoring and observability

2020-01-27 Thread Alexey Budankov


Currently access to perf_events, i915_perf and other performance monitoring and
observability subsystems of the kernel is open only for a privileged process [1]
with CAP_SYS_ADMIN capability enabled in the process effective set [2].

This patch set introduces CAP_PERFMON capability designed to secure system
performance monitoring and observability operations so that CAP_PERFMON would
assist CAP_SYS_ADMIN capability in its governing role for performance monitoring
and observability subsystems of the kernel.

CAP_PERFMON intends to harden system security and integrity during system
performance monitoring and observability operations by decreasing attack surface
that is available to a CAP_SYS_ADMIN privileged process [2]. Providing the
access to system performance monitoring and observability operations under
CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials,
excludes chances to misuse the credentials and makes the operation more secure.
Thus, CAP_PERFMON implements the principle of least privilege for performance
monitoring and observability operations (POSIX IEEE 1003.1e: 2.2.2.39 principle
of least privilege: A security design principle that states that a process or
program be granted only those privileges (e.g., capabilities) necessary to
accomplish its legitimate function, and only for the time that such privileges
are actually required).

CAP_PERFMON intends to meet the demand to secure system performance monitoring
and observability operations for adoption in security sensitive, restricted,
multiuser production environments (e.g. HPC clusters, cloud and virtual compute
environments), where root or CAP_SYS_ADMIN credentials are not available to mass
users of a system, and securely unblock applicability and scalability of system
performance monitoring and observability operations beyond root or CAP_SYS_ADMIN
process use cases.

CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to system
performance monitoring and observability operations and balance the amount of
CAP_SYS_ADMIN credentials following the recommendations in the capabilities man
page [2] for CAP_SYS_ADMIN: "Note: this capability is overloaded; see Notes to
kernel developers, below." For backward compatibility reasons access to system
performance monitoring and observability subsystems of the kernel remains open
for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN capability usage for
secure system performance monitoring and observability operations is discouraged
with respect to the designed CAP_PERFMON capability.
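
Concretely, patch 01/10 below reduces this aggregate check to a small inline
helper, which the rest of the series then substitutes for the bare
capable(CAP_SYS_ADMIN) tests in the affected subsystems:

static inline bool perfmon_capable(void)
{
	return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
}

A call site such as 'if (!capable(CAP_SYS_ADMIN))' therefore becomes
'if (!perfmon_capable())', keeping CAP_SYS_ADMIN working for backward
compatibility while letting CAP_PERFMON alone grant the access.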

A possible alternative solution to this system security hardening and
capabilities balancing task of making performance monitoring and observability
operations more accessible could be to use the existing CAP_SYS_PTRACE
capability to govern system performance monitoring and observability
subsystems. However, the CAP_SYS_PTRACE capability still provides users with
more credentials than are required for secure performance monitoring and
observability operations, and this excess is avoided by the designed
CAP_PERFMON capability.

Although software running under CAP_PERFMON cannot ensure avoidance of related
hardware issues, the software can still mitigate those issues following the
official embargoed hardware issues mitigation procedure [3]. The bugs in the
software itself can be fixed following the standard kernel development process
[4] to maintain and harden the security of system performance monitoring and
observability operations. Finally, the patch set is shaped in a way that
simplifies the procedure of backtracking possible induced issues [5] as much
as possible.

The patch set is for tip perf/core repository:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip perf/core
sha1: 56ee04aa63285d6bc8a995a26e2441ae3d419bcd

---
Changes in v6:
- avoided noaudit checks in perfmon_capable() to explicitly advertise 
CAP_PERFMON
  usage thru audit logs to secure system performance monitoring and 
observability
Changes in v5:
- renamed CAP_SYS_PERFMON to CAP_PERFMON
- extended perfmon_capable() with noaudit checks
Changes in v4:
- converted perfmon_capable() into an inline function
- made perf_events kprobes, uprobes, hw breakpoints and namespaces data 
available
  to CAP_SYS_PERFMON privileged processes
- applied perfmon_capable() to drivers/perf and drivers/oprofile
- extended __cmd_ftrace() with support of CAP_SYS_PERFMON
Changes in v3:
- implemented perfmon_capable() macros aggregating required capabilities checks
Changes in v2:
- made perf_events trace points available to CAP_SYS_PERFMON privileged 
processes
- made perf_event_paranoid_check() treat CAP_SYS_PERFMON equally to 
CAP_SYS_ADMIN
- applied CAP_SYS_PERFMON to i915_perf, bpf_trace, powerpc and parisc system
  performance monitoring and observability related subsystems

---
Alexey Budankov (10):
  capabilities: introduce CAP_PERFMON to kernel and user space
  perf/core: open access to the core for CAP_PERFMON privileged process
  perf/

Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-27 Thread Qian Cai



> On Jan 27, 2020, at 11:58 PM, Anshuman Khandual  
> wrote:
> 
> As I had mentioned before, the test attempts to formalize page table helper
> semantics as expected from generic MM code paths and intends to catch
> deviations when enabled on a given platform. How else should we test for
> semantics errors? There are past examples of usefulness for this procedure
> on arm64 and on s390. I am wondering how else to prove the usefulness of a
> debug feature if these references are not enough.

Not saying it will not be useful. As you mentioned, it actually found a bug or
two in the past. The problem is that there is always a cost to maintain
something like this, and nobody knows how things could be broken in the future,
even for the isolated code you mentioned, given how complicated the kernel code
base is. I am not so positive that many developers would enable this debug
feature and use it on a regular basis, from the information you gave so far.

On the other hand, it might just be better to maintain this thing out of tree
by yourself anyway, because if it is not going to be used by many developers,
few people are going to contribute to it or even notice when it is broken.
What’s the point of getting this merged apart from getting some meaningless
credit?

Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-27 Thread Anshuman Khandual



On 01/28/2020 09:03 AM, Qian Cai wrote:
> 
> 
>> On Jan 27, 2020, at 10:06 PM, Anshuman Khandual  
>> wrote:
>>
>>
>>
>> On 01/28/2020 07:41 AM, Qian Cai wrote:
>>>
>>>
 On Jan 27, 2020, at 8:28 PM, Anshuman Khandual  
 wrote:

 This adds tests which will validate architecture page table helpers and
 other accessors in their compliance with expected generic MM semantics.
 This will help various architectures in validating changes to existing
 page table helpers or addition of new ones.

 This test covers basic page table entry transformations including but not
 limited to old, young, dirty, clean, write, write protect etc at various
 level along with populating intermediate entries with next page table page
 and validating them.

 Test page table pages are allocated from system memory with required size
 and alignments. The mapped pfns at page table levels are derived from a
 real pfn representing a valid kernel text symbol. This test gets called
 right after page_alloc_init_late().

 This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
 CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
 select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
 arm64. Going forward, other architectures too can enable this after fixing
 build or runtime problems (if any) with their page table helpers.
>>
>> Hello Qian,
>>
>>>
>>> What’s the value of this block of new code? It only supports x86 and arm64
>>> which are supposed to be good now.
>>
>> We have been over the usefulness of this code many times before as the patch 
>> is
>> already in it's V12. Currently it is enabled on arm64, x86 (except PAE), arc 
>> and
>> ppc32. There are build time or runtime problems with other archs which 
>> prevent
> 
> I am not sure if I care too much about arc and ppc32 which are pretty much 
> legacy
> platforms.

Okay, but FWIW the maintainers for all these enabled platforms cared for this
test at the least and really helped in shaping the test to its current state.
Besides, I am still failing to understand your point here about evaluating a
particular feature's usefulness based on the relative and perceived importance
of the platforms it supports compared to others. Again, the idea is to
integrate all platforms eventually, but we had discovered build and runtime
issues which need to be resolved at the platform level first. Unless I am
mistaken, a debug feature like this, which puts down a framework while also
benefiting some initial platforms to start with, will be a potential candidate
for eventual inclusion in the mainline. Otherwise, please point to any other
agreed upon community criteria for a debug feature's mainline inclusion, which
I will try to adhere to. I wonder if all other similar debug features from the
past ever met the 'all inclusive at the beginning' criteria which you are
trying to propose here. This test also adds a feature file, enlisting all
supported archs, as suggested by Ingo for the exact same reason. This is not
the first time a feature is listing out archs which are supported and archs
which are not.

> 
>> enablement of this test (for the moment) but then the goal is to integrate 
>> all
>> of them going forward. The test not only validates platform's adherence to 
>> the
>> expected semantics from generic MM but also helps in keeping it that way 
>> during
>> code changes in future as well.
> 
> Another option maybe to get some decent arches on board first before merging 
> this
> thing, so it have more changes to catch regressions for developers who might 
> run this. 
> 
>>
>>> Did those tests ever find any regression or this is almost only useful for 
>>> new
>>
>> The test has already found problems with s390 page table helpers.
> 
> Hmm, that is pretty weak where s390 is not even official supported with this 
> version.

And there were valid reasons why s390 could not be enabled just yet, as
explained by the s390 folks during our previous discussions. I just pointed
out an example where this test was useful, as you had asked previously. Not
being officially supported in this version does not take away the fact that
it was indeed useful for that platform in discovering a bug.

> 
>>
>>> architectures which only happened once in a few years?
>>
>> Again, not only it validates what exist today but its also a tool to make
>> sure that all platforms continue adhere to a common agreed upon semantics
>> as reflected through the tests here.
>>
>>> The worry if not many people will use this config and code those that much 
>>> in
>>
>> Debug features or tests in the kernel are used when required. These are 
>> never or
>> should not be enabled by default. AFAICT this is true even for entire 
>> DEBUG_VM
>> packaged tests. Do you have any particular data or precedence to substantiate
>> the fact that this test will be used any less often than the other

Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-27 Thread Qian Cai



> On Jan 27, 2020, at 10:06 PM, Anshuman Khandual  
> wrote:
> 
> 
> 
> On 01/28/2020 07:41 AM, Qian Cai wrote:
>> 
>> 
>>> On Jan 27, 2020, at 8:28 PM, Anshuman Khandual  
>>> wrote:
>>> 
>>> This adds tests which will validate architecture page table helpers and
>>> other accessors in their compliance with expected generic MM semantics.
>>> This will help various architectures in validating changes to existing
>>> page table helpers or addition of new ones.
>>> 
>>> This test covers basic page table entry transformations including but not
>>> limited to old, young, dirty, clean, write, write protect etc at various
>>> level along with populating intermediate entries with next page table page
>>> and validating them.
>>> 
>>> Test page table pages are allocated from system memory with required size
>>> and alignments. The mapped pfns at page table levels are derived from a
>>> real pfn representing a valid kernel text symbol. This test gets called
>>> right after page_alloc_init_late().
>>> 
>>> This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
>>> CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
>>> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
>>> arm64. Going forward, other architectures too can enable this after fixing
>>> build or runtime problems (if any) with their page table helpers.
> 
> Hello Qian,
> 
>> 
>> What’s the value of this block of new code? It only supports x86 and arm64
>> which are supposed to be good now.
> 
> We have been over the usefulness of this code many times before as the patch 
> is
> already in it's V12. Currently it is enabled on arm64, x86 (except PAE), arc 
> and
> ppc32. There are build time or runtime problems with other archs which prevent

I am not sure if I care too much about arc and ppc32 which are pretty much
legacy platforms.

> enablement of this test (for the moment) but then the goal is to integrate all
> of them going forward. The test not only validates platform's adherence to the
> expected semantics from generic MM but also helps in keeping it that way 
> during
> code changes in future as well.

Another option may be to get some decent arches on board first before merging
this thing, so it has more chances to catch regressions for developers who
might run this.

> 
>> Did those tests ever find any regression or this is almost only useful for 
>> new
> 
> The test has already found problems with s390 page table helpers.

Hmm, that is pretty weak, given that s390 is not even officially supported with
this version.

> 
>> architectures which only happened once in a few years?
> 
> Again, not only it validates what exist today but its also a tool to make
> sure that all platforms continue adhere to a common agreed upon semantics
> as reflected through the tests here.
> 
>> The worry if not many people will use this config and code those that much in
> 
> Debug features or tests in the kernel are used when required. These are never 
> or
> should not be enabled by default. AFAICT this is true even for entire DEBUG_VM
> packaged tests. Do you have any particular data or precedence to substantiate
> the fact that this test will be used any less often than the other similar 
> ones
> in the tree ? I can only speak for arm64 platform but the very idea for this
> test came from Catalin when we were trying to understand the semantics for THP
> helpers while enabling THP migration without split. Apart from going over the
> commit messages from the past, there were no other way to figure out how any
> particular page table helper is suppose to change given page table entry. This
> test tries to formalize those semantics.

I am thinking about how we made so many mistakes before by merging too many of
those debugging options; many of them have been broken for many releases,
proving that nobody actually used them regularly. We don’t need to repeat the
same mistake again. I am actually thinking about removing things like
page_poisoning, which has hardly found any bug recently and only causes pain
when interacting with other new features that almost nobody will test together
with it to begin with. We even have some SLUB debugging code that has sat there
for almost 15 years, that almost nobody used, and that maintainers refused to
remove.

> 
>> the future because it is inefficient to find bugs, it will simply be rotten
> Could you be more specific here ? What parts of the test are inefficient ? I
> am happy to improve upon the test. Do let me know you if you have suggestions.
> 
>> like a few other debugging options out there we have in the mainline that
> will be a pain to remove later on.
>> 
> 
> Even though I am not agreeing to your assessment about the usefulness of the
> test without any substantial data backing up the claims, the test case in
> itself is very much compartmentalized, staying clear from generic MM and
> debug_vm_pgtable() is only function executing the test 

Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-27 Thread Anshuman Khandual



On 01/28/2020 07:41 AM, Qian Cai wrote:
> 
> 
>> On Jan 27, 2020, at 8:28 PM, Anshuman Khandual  
>> wrote:
>>
>> This adds tests which will validate architecture page table helpers and
>> other accessors in their compliance with expected generic MM semantics.
>> This will help various architectures in validating changes to existing
>> page table helpers or addition of new ones.
>>
>> This test covers basic page table entry transformations including but not
>> limited to old, young, dirty, clean, write, write protect etc at various
>> level along with populating intermediate entries with next page table page
>> and validating them.
>>
>> Test page table pages are allocated from system memory with required size
>> and alignments. The mapped pfns at page table levels are derived from a
>> real pfn representing a valid kernel text symbol. This test gets called
>> right after page_alloc_init_late().
>>
>> This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
>> CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
>> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
>> arm64. Going forward, other architectures too can enable this after fixing
>> build or runtime problems (if any) with their page table helpers.

Hello Qian,

> 
> What’s the value of this block of new code? It only supports x86 and arm64
> which are supposed to be good now.

We have been over the usefulness of this code many times before, as the patch
is already in its V12. Currently it is enabled on arm64, x86 (except PAE), arc
and ppc32. There are build time or runtime problems with other archs which
prevent enablement of this test (for the moment), but the goal is to integrate
all of them going forward. The test not only validates a platform's adherence
to the expected semantics from generic MM but also helps in keeping it that
way during code changes in the future as well.

> Did those tests ever find any regression or this is almost only useful for new

The test has already found problems with s390 page table helpers.

> architectures which only happened once in a few years?

Again, not only does it validate what exists today, but it is also a tool to
make sure that all platforms continue to adhere to a common agreed upon
semantics, as reflected through the tests here.

> The worry if not many people will use this config and code those that much in

Debug features or tests in the kernel are used when required. These are never,
or at least should not be, enabled by default. AFAICT this is true even for the
entire set of DEBUG_VM packaged tests. Do you have any particular data or
precedent to substantiate the claim that this test will be used any less often
than the other similar ones in the tree? I can only speak for the arm64
platform, but the very idea for this test came from Catalin when we were trying
to understand the semantics of the THP helpers while enabling THP migration
without split. Apart from going over the commit messages from the past, there
was no other way to figure out how any particular page table helper is supposed
to change a given page table entry. This test tries to formalize those
semantics.

> the future because it is inefficient to find bugs, it will simply be rotten
Could you be more specific here? What parts of the test are inefficient? I
am happy to improve upon the test. Do let me know if you have suggestions.

> like a few other debugging options out there we have in the mainline that
> will be a pain to remove later on.
>

Even though I do not agree with your assessment about the usefulness of the
test, made without any substantial data backing up the claims, the test case
in itself is very much compartmentalized, staying clear of generic MM, and
debug_vm_pgtable() is the only function executing the test, which gets
called from the kernel_init_freeable() path.
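
For reference, a minimal sketch of the call placement described above (the
surrounding code is illustrative; only the two calls shown are the point):

static noinline void __init kernel_init_freeable(void)
{
	/* ... earlier init work ... */
	page_alloc_init_late();
	/* validate arch page table helpers once the allocators are fully up */
	debug_vm_pgtable();
	/* ... remaining init, eventually exec'ing init ... */
}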

- Anshuman


Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-27 Thread Qian Cai



> On Jan 27, 2020, at 8:28 PM, Anshuman Khandual  
> wrote:
> 
> This adds tests which will validate architecture page table helpers and
> other accessors in their compliance with expected generic MM semantics.
> This will help various architectures in validating changes to existing
> page table helpers or addition of new ones.
> 
> This test covers basic page table entry transformations including but not
> limited to old, young, dirty, clean, write, write protect etc at various
> level along with populating intermediate entries with next page table page
> and validating them.
> 
> Test page table pages are allocated from system memory with required size
> and alignments. The mapped pfns at page table levels are derived from a
> real pfn representing a valid kernel text symbol. This test gets called
> right after page_alloc_init_late().
> 
> This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
> CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
> arm64. Going forward, other architectures too can enable this after fixing
> build or runtime problems (if any) with their page table helpers.

What’s the value of this block of new code? It only supports x86 and arm64,
which are supposed to be good now. Did those tests ever find any regression,
or is this almost only useful for new architectures, which only happen once in
a few years? The worry is that if not many people use this config and exercise
the code that much in the future, because it is inefficient at finding bugs,
it will simply rot like a few other debugging options out there we have in the
mainline that will be a pain to remove later on.

[PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-27 Thread Anshuman Khandual
This adds tests which will validate architecture page table helpers and
other accessors in their compliance with expected generic MM semantics.
This will help various architectures in validating changes to existing
page table helpers or addition of new ones.

This test covers basic page table entry transformations including but not
limited to old, young, dirty, clean, write, write protect etc. at various
levels, along with populating intermediate entries with the next page table
page and validating them.
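
As a rough illustration of what such a transformation check looks like (a
minimal sketch in the spirit of the patch, not the exact code it adds):

static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
{
	pte_t pte = pfn_pte(pfn, prot);

	WARN_ON(!pte_same(pte, pte));
	WARN_ON(!pte_young(pte_mkyoung(pte)));
	WARN_ON(!pte_dirty(pte_mkdirty(pte)));
	WARN_ON(!pte_write(pte_mkwrite(pte)));
	WARN_ON(pte_young(pte_mkold(pte)));
	WARN_ON(pte_dirty(pte_mkclean(pte)));
	WARN_ON(pte_write(pte_wrprotect(pte)));
}

Equivalent checks are performed at the higher page table levels where the
architecture provides the corresponding helpers.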

Test page table pages are allocated from system memory with required size
and alignments. The mapped pfns at page table levels are derived from a
real pfn representing a valid kernel text symbol. This test gets called
right after page_alloc_init_late().
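
For example, the pfn used to build the test entries can be derived from a
kernel text symbol along these lines (an illustrative sketch; symbol choice
and variable names are not taken from the patch):

	phys_addr_t paddr = __pa_symbol(&start_kernel);
	unsigned long pte_aligned = (paddr & PAGE_MASK) >> PAGE_SHIFT;
	unsigned long pmd_aligned = (paddr & PMD_MASK) >> PAGE_SHIFT;

so every mapped pfn points at memory that is known to be valid.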

This gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
CONFIG_DEBUG_VM. Architectures willing to subscribe to this test also need to
select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE, which for now is limited to x86 and
arm64. Going forward, other architectures too can enable this after fixing
build or runtime problems (if any) with their page table helpers.

Folks interested in making sure that a given platform's page table helpers
conform to expected generic MM semantics should enable the above config
which will just trigger this test during boot. Any non-conformity here will
be reported as a warning which would need to be fixed. This test will help
catch any changes to the agreed upon semantics expected from generic MM and
enable platforms to accommodate it thereafter.

Cc: Andrew Morton 
Cc: Vlastimil Babka 
Cc: Greg Kroah-Hartman 
Cc: Thomas Gleixner 
Cc: Mike Rapoport 
Cc: Jason Gunthorpe 
Cc: Dan Williams 
Cc: Peter Zijlstra 
Cc: Michal Hocko 
Cc: Mark Rutland 
Cc: Mark Brown 
Cc: Steven Price 
Cc: Ard Biesheuvel 
Cc: Masahiro Yamada 
Cc: Kees Cook 
Cc: Tetsuo Handa 
Cc: Matthew Wilcox 
Cc: Sri Krishna chowdary 
Cc: Dave Hansen 
Cc: Russell King - ARM Linux 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: "David S. Miller" 
Cc: Vineet Gupta 
Cc: James Hogan 
Cc: Paul Burton 
Cc: Ralf Baechle 
Cc: Kirill A. Shutemov 
Cc: Gerald Schaefer 
Cc: Christophe Leroy 
Cc: Ingo Molnar 
Cc: linux-snps-...@lists.infradead.org
Cc: linux-m...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-i...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org

Tested-by: Christophe Leroy  #PPC32
Reviewed-by: Ingo Molnar 
Suggested-by: Catalin Marinas 
Signed-off-by: Andrew Morton 
Signed-off-by: Christophe Leroy 
Signed-off-by: Anshuman Khandual 
---
This adds a test validation for architecture exported page table helpers.
Patch adds basic transformation tests at various levels of the page table.

This test was originally suggested by Catalin during arm64 THP migration
RFC discussion earlier. Going forward it can include more specific tests
with respect to various generic MM functions like THP, HugeTLB etc and
platform specific tests.

https://lore.kernel.org/linux-mm/20190628102003.ga56...@arrakis.emea.arm.com/

Needs to be applied on linux V5.5-rc7

Changes in V12:

- Replaced __mmdrop() with mmdrop()
- Enable ARCH_HAS_DEBUG_VM_PGTABLE on X86 for non CONFIG_X86_PAE platforms as the
  test procedure interferes with pre-allocated PMDs attached to the PGD resulting
  in runtime failures with VM_BUG_ON()

Changes in V11: 
(https://patchwork.kernel.org/project/linux-mm/list/?series=221135)

- Rebased the patch on V5.4

Changes in V10: 
(https://patchwork.kernel.org/project/linux-mm/list/?series=205529)

- Always enable DEBUG_VM_PGTABLE when DEBUG_VM is enabled per Ingo
- Added tags from Ingo

Changes in V9: 
(https://patchwork.kernel.org/project/linux-mm/list/?series=201429)

- Changed feature support enumeration for powerpc platforms per Christophe
- Changed config wrapper for basic_[pmd|pud]_tests() to enable ARC platform
- Enabled the test on ARC platform

Changes in V8: 
(https://patchwork.kernel.org/project/linux-mm/list/?series=194297)

- Enabled ARCH_HAS_DEBUG_VM_PGTABLE on PPC32 platform per Christophe
- Updated feature documentation as DEBUG_VM_PGTABLE is now enabled on the PPC32
  platform
- Moved ARCH_HAS_DEBUG_VM_PGTABLE earlier to indent it with DEBUG_VM per
  Christophe
- Added an information message in debug_vm_pgtable() per Christophe
- Dropped random_vaddr boundary condition checks per Christophe and Qian
- Replaced virt_addr_valid() check with pfn_valid() check in debug_vm_pgtable()
- Slightly changed pr_fmt(fmt) information

Changes in V7: 
(https://patchwork.kernel.org/project/linux-mm/list/?series=193051)

- Memory allocation and free routines for mapped pages have been droped
- Mapped pfns are derived from standard kernel text symbol per Matthew
- Moved debug_vm_pgtable() after page_alloc_init_late() per Michal and Qian
- Updated the commit message per

Re: [PATCH] powerpc/64: system call implement the bulk of the logic in C fix

2020-01-27 Thread Nicholas Piggin
Michal Suchánek's on January 28, 2020 4:08 am:
> On Tue, Jan 28, 2020 at 12:17:12AM +1000, Nicholas Piggin wrote:
>> This incremental patch fixes several soft-mask debug and unsafe
>> smp_processor_id messages due to tracing and false positives in
>> "unreconciled" code.
>> 
>> It also fixes a bug where syscall tracing functions that set registers
>> (e.g., PTRACE_SETREG) did not set GPRs properly.
>> 
>> There was a bug reported with the TM selftests, I haven't been able
>> to reproduce that one.
>> 
>> I can squash this into the main patch and resend the series if it
>> helps but the incremental helps to see the bug fixes.
> 
> There are some whitespace differences between this and the series I have
> applied locally. What does it apply to?
> 
> Is there some revision of the patchset I missed?

No I may have just missed some of your whitespace cleanups, or maybe I got
some that Michael made which you don't have in his next-test branch.

Thanks,
Nick


[PATCH] powerpc/64: system call implement the bulk of the logic in C fix 2 (tabort_syscall)

2020-01-27 Thread Nicholas Piggin
Another incremental patch which fixes a silly tabort_syscall bug that
causes kernel crashes when making system calls in transactional state.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/entry_64.S   | 9 +++--
 arch/powerpc/kernel/syscall_64.c | 4 ++--
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index d0bb238805e6..94b3db203ec3 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -165,16 +165,13 @@ syscall_restore_regs:
b   .Lsyscall_restore_regs_cont
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-_GLOBAL(tabort_syscall)
+_GLOBAL(tabort_syscall) /* (unsigned long nip, unsigned long msr) */
/* Firstly we need to enable TM in the kernel */
mfmsr   r10
li  r9, 1
rldimi  r10, r9, MSR_TM_LG, 63-MSR_TM_LG
mtmsrd  r10, 0
 
-   ld  r11,_NIP(r13)
-   ld  r12,_MSR(r13)
-
/* tabort, this dooms the transaction, nothing else */
li  r9, (TM_CAUSE_SYSCALL|TM_CAUSE_PERSISTENT)
TABORT(R9)
@@ -188,8 +185,8 @@ _GLOBAL(tabort_syscall)
li  r9, MSR_RI
andcr10, r10, r9
mtmsrd  r10, 1
-   mtspr   SPRN_SRR0, r11
-   mtspr   SPRN_SRR1, r12
+   mtspr   SPRN_SRR0, r3
+   mtspr   SPRN_SRR1, r4
RFI_TO_USER
b   .   /* prevent speculative execution */
 #endif
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index cfe458adde07..69a4ef13973b 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -15,7 +15,7 @@
 #include 
 #include 
 
-extern void __noreturn tabort_syscall(void);
+extern void __noreturn tabort_syscall(unsigned long nip, unsigned long msr);
 
 typedef long (*syscall_fn)(long, long, long, long, long, long);
 
@@ -30,7 +30,7 @@ notrace long system_call_exception(long r3, long r4, long r5, 
long r6, long r7,
 
if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
unlikely(regs->msr & MSR_TS_T))
-   tabort_syscall();
+   tabort_syscall(regs->nip, regs->msr);
 
account_cpu_user_entry();
 
-- 
2.23.0



[PATCH v2] powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits.

2020-01-27 Thread Christophe Leroy
Reorder Linux PTE bits to (almost) match Hash PTE bits.

RW Kernel : PP = 00
RO Kernel : PP = 00
RW User   : PP = 01
RO User   : PP = 11

So naturally, we should have
_PAGE_USER = 0x001
_PAGE_RW   = 0x002

Today 0x001 and 0x002 are _PAGE_PRESENT and _PAGE_HASHPTE, which
are both software-only bits.

Switch _PAGE_USER and _PAGE_PRESENT
Switch _PAGE_RW and _PAGE_HASHPTE

This allows removing a few insns.
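
For illustration only, a hypothetical C rendering of the resulting PP
derivation (the real conversion stays in the assembly TLB miss handlers
changed below):

/*
 * Hypothetical C sketch, for illustration only: with _PAGE_USER = 0x001
 * and _PAGE_RW = 0x002, the hash PP bits follow almost directly from the
 * low PTE bits (PP = user ? (rw ? 01 : 11) : 00).
 */
static inline unsigned int hash_pp_sketch(unsigned long pte_bits)
{
	bool user = pte_bits & 0x001;	/* _PAGE_USER */
	bool rw   = pte_bits & 0x002;	/* _PAGE_RW   */

	return user ? (rw ? 1 : 3) : 0;
}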

Signed-off-by: Christophe Leroy 
---
v2: rebased on today's powerpc/merge
---
 arch/powerpc/include/asm/book3s/32/hash.h |  8 
 arch/powerpc/kernel/head_32.S |  9 +++--
 arch/powerpc/mm/book3s32/hash_low.S   | 14 ++
 3 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/hash.h 
b/arch/powerpc/include/asm/book3s/32/hash.h
index 2a0a467d2985..34a7215ae81e 100644
--- a/arch/powerpc/include/asm/book3s/32/hash.h
+++ b/arch/powerpc/include/asm/book3s/32/hash.h
@@ -17,9 +17,9 @@
  * updating the accessed and modified bits in the page table tree.
  */
 
-#define _PAGE_PRESENT  0x001   /* software: pte contains a translation */
-#define _PAGE_HASHPTE  0x002   /* hash_page has made an HPTE for this pte */
-#define _PAGE_USER 0x004   /* usermode access allowed */
+#define _PAGE_USER 0x001   /* usermode access allowed */
+#define _PAGE_RW   0x002   /* software: user write access allowed */
+#define _PAGE_PRESENT  0x004   /* software: pte contains a translation */
 #define _PAGE_GUARDED  0x008   /* G: prohibit speculative access */
 #define _PAGE_COHERENT 0x010   /* M: enforce memory coherence (SMP systems) */
 #define _PAGE_NO_CACHE 0x020   /* I: cache inhibit */
@@ -27,7 +27,7 @@
 #define _PAGE_DIRTY0x080   /* C: page changed */
 #define _PAGE_ACCESSED 0x100   /* R: page referenced */
 #define _PAGE_EXEC 0x200   /* software: exec allowed */
-#define _PAGE_RW   0x400   /* software: user write access allowed */
+#define _PAGE_HASHPTE  0x400   /* hash_page has made an HPTE for this pte */
 #define _PAGE_SPECIAL  0x800   /* software: Special page */
 
 #ifdef CONFIG_PTE_64BIT
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 0493fcac6409..1587a754f061 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -310,7 +310,7 @@ BEGIN_MMU_FTR_SECTION
andis.  r0, r5, (DSISR_BAD_FAULT_32S | DSISR_DABRMATCH)@h
 #endif
bne handle_page_fault_tramp_2   /* if not, try to put a PTE */
-   rlwinm  r3, r5, 32 - 15, 21, 21 /* DSISR_STORE -> _PAGE_RW */
+   rlwinm  r3, r5, 32 - 24, 30, 30 /* DSISR_STORE -> _PAGE_RW */
bl  hash_page
b   handle_page_fault_tramp_1
 FTR_SECTION_ELSE
@@ -437,7 +437,6 @@ InstructionTLBMiss:
andc.   r1,r1,r0/* check access & ~permission */
bne-InstructionAddressInvalid /* return if access not permitted */
/* Convert linux-style PTE to low word of PPC-style PTE */
-   rlwimi  r0,r0,32-2,31,31/* _PAGE_USER -> PP lsb */
ori r1, r1, 0xe06   /* clear out reserved bits */
andcr1, r0, r1  /* PP = user? 1 : 0 */
 BEGIN_FTR_SECTION
@@ -505,9 +504,8 @@ DataLoadTLBMiss:
 * we would need to update the pte atomically with lwarx/stwcx.
 */
/* Convert linux-style PTE to low word of PPC-style PTE */
-   rlwinm  r1,r0,32-9,30,30/* _PAGE_RW -> PP msb */
-   rlwimi  r0,r0,32-1,30,30/* _PAGE_USER -> PP msb */
-   rlwimi  r0,r0,32-1,31,31/* _PAGE_USER -> PP lsb */
+   rlwinm  r1,r0,0,30,30   /* _PAGE_RW -> PP msb */
+   rlwimi  r0,r0,1,30,30   /* _PAGE_USER -> PP msb */
ori r1,r1,0xe04 /* clear out reserved bits */
andcr1,r0,r1/* PP = user? rw? 1: 3: 0 */
 BEGIN_FTR_SECTION
@@ -585,7 +583,6 @@ DataStoreTLBMiss:
 * we would need to update the pte atomically with lwarx/stwcx.
 */
/* Convert linux-style PTE to low word of PPC-style PTE */
-   rlwimi  r0,r0,32-2,31,31/* _PAGE_USER -> PP lsb */
li  r1,0xe06/* clear out reserved bits & PP msb */
andcr1,r0,r1/* PP = user? 1: 0 */
 BEGIN_FTR_SECTION
diff --git a/arch/powerpc/mm/book3s32/hash_low.S 
b/arch/powerpc/mm/book3s32/hash_low.S
index c11b0a005196..18fdabde5007 100644
--- a/arch/powerpc/mm/book3s32/hash_low.S
+++ b/arch/powerpc/mm/book3s32/hash_low.S
@@ -41,7 +41,7 @@ mmu_hash_lock:
 /*
  * Load a PTE into the hash table, if possible.
  * The address is in r4, and r3 contains an access flag:
- * _PAGE_RW (0x400) if a write.
+ * _PAGE_RW (0x002) if a write.
  * r9 contains the SRR1 value, from which we use the MSR_PR bit.
  * SPRG_THREAD contains the physical address of the current task's thread.
  *
@@ -78,7 +78,7 @@ _GLOBAL(hash_page)
blt+112f/* assume user more likely */
lis r5, (swap

Re: [PATCH] powerpc/64: system call implement the bulk of the logic in C fix

2020-01-27 Thread Michal Suchánek
On Tue, Jan 28, 2020 at 12:17:12AM +1000, Nicholas Piggin wrote:
> This incremental patch fixes several soft-mask debug and unsafe
> smp_processor_id messages due to tracing and false positives in
> "unreconciled" code.
> 
> It also fixes a bug where syscall tracing functions that set registers
> (e.g., PTRACE_SETREG) did not set GPRs properly.
> 
> There was a bug reported with the TM selftests, I haven't been able
> to reproduce that one.
> 
> I can squash this into the main patch and resend the series if it
> helps but the incremental helps to see the bug fixes.

There are some whitespace differences between this and the series I have
applied locally. What does it apply to?

Is there some revision of the patchset I missed?

Thanks

Michal
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/include/asm/cputime.h | 39 +-
>  arch/powerpc/kernel/syscall_64.c   | 26 ++--
>  2 files changed, 41 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/cputime.h 
> b/arch/powerpc/include/asm/cputime.h
> index c43614cffaac..6639a6847cc0 100644
> --- a/arch/powerpc/include/asm/cputime.h
> +++ b/arch/powerpc/include/asm/cputime.h
> @@ -44,6 +44,28 @@ static inline unsigned long cputime_to_usecs(const 
> cputime_t ct)
>  #ifdef CONFIG_PPC64
>  #define get_accounting(tsk)  (&get_paca()->accounting)
>  static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
> +
> +/*
> + * account_cpu_user_entry/exit runs "unreconciled", so can't trace,
> + * can't use use get_paca()
> + */
> +static notrace inline void account_cpu_user_entry(void)
> +{
> + unsigned long tb = mftb();
> + struct cpu_accounting_data *acct = &local_paca->accounting;
> +
> + acct->utime += (tb - acct->starttime_user);
> + acct->starttime = tb;
> +}
> +static notrace inline void account_cpu_user_exit(void)
> +{
> + unsigned long tb = mftb();
> + struct cpu_accounting_data *acct = &local_paca->accounting;
> +
> + acct->stime += (tb - acct->starttime);
> + acct->starttime_user = tb;
> +}
> +
>  #else
>  #define get_accounting(tsk)  (&task_thread_info(tsk)->accounting)
>  /*
> @@ -60,23 +82,6 @@ static inline void arch_vtime_task_switch(struct 
> task_struct *prev)
>  }
>  #endif
>  
> -static inline void account_cpu_user_entry(void)
> -{
> - unsigned long tb = mftb();
> - struct cpu_accounting_data *acct = get_accounting(current);
> -
> - acct->utime += (tb - acct->starttime_user);
> - acct->starttime = tb;
> -}
> -static inline void account_cpu_user_exit(void)
> -{
> - unsigned long tb = mftb();
> - struct cpu_accounting_data *acct = get_accounting(current);
> -
> - acct->stime += (tb - acct->starttime);
> - acct->starttime_user = tb;
> -}
> -
>  #endif /* __KERNEL__ */
>  #else /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
>  static inline void account_cpu_user_entry(void)
> diff --git a/arch/powerpc/kernel/syscall_64.c 
> b/arch/powerpc/kernel/syscall_64.c
> index 529393a1ff1e..cfe458adde07 100644
> --- a/arch/powerpc/kernel/syscall_64.c
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -19,7 +19,8 @@ extern void __noreturn tabort_syscall(void);
>  
>  typedef long (*syscall_fn)(long, long, long, long, long, long);
>  
> -long system_call_exception(long r3, long r4, long r5, long r6, long r7, long 
> r8,
> +/* Has to run notrace because it is entered "unreconciled" */
> +notrace long system_call_exception(long r3, long r4, long r5, long r6, long 
> r7, long r8,
>  unsigned long r0, struct pt_regs *regs)
>  {
>   unsigned long ti_flags;
> @@ -36,7 +37,7 @@ long system_call_exception(long r3, long r4, long r5, long 
> r6, long r7, long r8,
>  #ifdef CONFIG_PPC_SPLPAR
>   if (IS_ENABLED(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) &&
>   firmware_has_feature(FW_FEATURE_SPLPAR)) {
> - struct lppaca *lp = get_lppaca();
> + struct lppaca *lp = local_paca->lppaca_ptr;
>  
>   if (unlikely(local_paca->dtl_ridx != be64_to_cpu(lp->dtl_idx)))
>   accumulate_stolen_time();
> @@ -71,13 +72,22 @@ long system_call_exception(long r3, long r4, long r5, 
> long r6, long r7, long r8,
>* We use the return value of do_syscall_trace_enter() as the
>* syscall number. If the syscall was rejected for any reason
>* do_syscall_trace_enter() returns an invalid syscall number
> -  * and the test below against NR_syscalls will fail.
> +  * and the test against NR_syscalls will fail and the return
> +  * value to be used is in regs->gpr[3].
>*/
>   r0 = do_syscall_trace_enter(regs);
> - }
> -
> - if (unlikely(r0 >= NR_syscalls))
> + if (unlikely(r0 >= NR_syscalls))
> + return regs->gpr[3];
> + r3 = regs->gpr[3];
> + r4 = regs->gpr[4];
> + r5 = regs->gpr[5];
> + r6 = regs

Re: [PATCH] of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc

2020-01-27 Thread Frank Rowand


+ Frank (me)

On 1/26/20 5:52 AM, Michael Ellerman wrote:
> There's an OF helper called of_dma_is_coherent(), which checks if a
> device has a "dma-coherent" property to see if the device is coherent
> for DMA.
> 
> But on some platforms devices are coherent by default, and on some
> platforms it's not possible to update existing device trees to add the
> "dma-coherent" property.
> 
> So add a Kconfig symbol to allow arch code to tell
> of_dma_is_coherent() that devices are coherent by default, regardless
> of the presence of the property.
> 
> Select that symbol on powerpc when NOT_COHERENT_CACHE is not set, ie.
> when the system has a coherent cache.
> 
> Fixes: 92ea637edea3 ("of: introduce of_dma_is_coherent() helper")
> Cc: sta...@vger.kernel.org # v3.16+
> Reported-by: Christian Zigotzky 
> Tested-by: Christian Zigotzky 
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/Kconfig | 1 +
>  drivers/of/Kconfig   | 4 
>  drivers/of/address.c | 6 +-
>  3 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 1ec34e16ed65..19f5aa8ac9a3 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -238,6 +238,7 @@ config PPC
>   select NEED_DMA_MAP_STATE   if PPC64 || NOT_COHERENT_CACHE
>   select NEED_SG_DMA_LENGTH
>   select OF
> + select OF_DMA_DEFAULT_COHERENT  if !NOT_COHERENT_CACHE
>   select OF_EARLY_FLATTREE
>   select OLD_SIGACTIONif PPC32
>   select OLD_SIGSUSPEND
> diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
> index 37c2ccbefecd..d91618641be6 100644
> --- a/drivers/of/Kconfig
> +++ b/drivers/of/Kconfig
> @@ -103,4 +103,8 @@ config OF_OVERLAY
>  config OF_NUMA
>   bool
>  
> +config OF_DMA_DEFAULT_COHERENT
> + # arches should select this if DMA is coherent by default for OF devices
> + bool
> +
>  endif # OF
> diff --git a/drivers/of/address.c b/drivers/of/address.c
> index 99c1b8058559..e8a39c3ec4d4 100644
> --- a/drivers/of/address.c
> +++ b/drivers/of/address.c
> @@ -995,12 +995,16 @@ int of_dma_get_range(struct device_node *np, u64 
> *dma_addr, u64 *paddr, u64 *siz
>   * @np:  device node
>   *
>   * It returns true if "dma-coherent" property was found
> - * for this device in DT.
> + * for this device in the DT, or if DMA is coherent by
> + * default for OF devices on the current platform.
>   */
>  bool of_dma_is_coherent(struct device_node *np)
>  {
>   struct device_node *node = of_node_get(np);
>  
> + if (IS_ENABLED(CONFIG_OF_DMA_DEFAULT_COHERENT))
> + return true;
> +
>   while (node) {
>   if (of_property_read_bool(node, "dma-coherent")) {
>   of_node_put(node);
> 



Re: [PATCH v16 00/23] selftests, powerpc, x86: Memory Protection Keys

2020-01-27 Thread Dave Hansen
On 1/27/20 2:11 AM, Sandipan Das wrote:
> Hi Dave,
> 
> On 23/01/20 12:15 am, Dave Hansen wrote:
>> Still doesn't build for me:
>>
> I have this patch that hopefully fixes this. My understanding was
> that the vm tests are supposed to be generic but this has quite a
> bit of x86-specific conditional code which complicates things even
> though it is not used by any of the other tests.
> 
> I'm not sure if we should keep x86 multilib build support for these
> selftests but I'll let the maintainers take a call.

How have you tested this patch (and the whole series for that matter)?


[RFC] per-CPU usage in perf core-book3s

2020-01-27 Thread Sebastian Andrzej Siewior
I've been looking at the usage of the per-CPU variable cpu_hw_events in
arch/powerpc/perf/core-book3s.c.

power_pmu_enable() and power_pmu_disable() (pmu::pmu_enable() and
pmu::pmu_disable()) access the variable, and these callbacks are always
invoked with interrupts disabled.

power_pmu_event_init() (pmu::event_init()) is invoked from preemptible
context and uses get_cpu_var() to obtain a stable pointer (by disabling
preemption).
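
For reference, a simplified sketch of the pattern in question (not the
exact core-book3s code):

/*
 * Simplified sketch, not the exact core-book3s code: get_cpu_var()
 * only disables preemption, it does not disable interrupts.
 */
static int power_pmu_event_init_sketch(struct perf_event *event)
{
	struct cpu_hw_events *cpuhw;
	int err = 0;

	cpuhw = &get_cpu_var(cpu_hw_events);	/* preemption off, IRQs still on */
	/* ... check event constraints against cpuhw state ... */
	put_cpu_var(cpu_hw_events);		/* preemption back on */

	return err;
}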

pmu::pmu_enable() and pmu::pmu_disable() can also be invoked via an hrtimer:
perf_mux_hrtimer_handler() invokes them as part of its callback.

Is there anything that prevents the timer callback from interrupting
pmu::event_init() while it is accessing per-CPU data?

Sebastian


[PATCH] powerpc/64: system call implement the bulk of the logic in C fix

2020-01-27 Thread Nicholas Piggin
This incremental patch fixes several soft-mask debug and unsafe
smp_processor_id messages due to tracing and false positives in
"unreconciled" code.

It also fixes a bug where syscall tracing functions that set registers
(e.g., PTRACE_SETREG) did not set GPRs properly.

There was a bug reported with the TM selftests, I haven't been able
to reproduce that one.

I can squash this into the main patch and resend the series if it
helps but the incremental helps to see the bug fixes.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/cputime.h | 39 +-
 arch/powerpc/kernel/syscall_64.c   | 26 ++--
 2 files changed, 41 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/cputime.h 
b/arch/powerpc/include/asm/cputime.h
index c43614cffaac..6639a6847cc0 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -44,6 +44,28 @@ static inline unsigned long cputime_to_usecs(const cputime_t 
ct)
 #ifdef CONFIG_PPC64
 #define get_accounting(tsk)(&get_paca()->accounting)
 static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
+
+/*
+ * account_cpu_user_entry/exit runs "unreconciled", so can't trace,
+ * can't use use get_paca()
+ */
+static notrace inline void account_cpu_user_entry(void)
+{
+   unsigned long tb = mftb();
+   struct cpu_accounting_data *acct = &local_paca->accounting;
+
+   acct->utime += (tb - acct->starttime_user);
+   acct->starttime = tb;
+}
+static notrace inline void account_cpu_user_exit(void)
+{
+   unsigned long tb = mftb();
+   struct cpu_accounting_data *acct = &local_paca->accounting;
+
+   acct->stime += (tb - acct->starttime);
+   acct->starttime_user = tb;
+}
+
 #else
 #define get_accounting(tsk)(&task_thread_info(tsk)->accounting)
 /*
@@ -60,23 +82,6 @@ static inline void arch_vtime_task_switch(struct task_struct 
*prev)
 }
 #endif
 
-static inline void account_cpu_user_entry(void)
-{
-   unsigned long tb = mftb();
-   struct cpu_accounting_data *acct = get_accounting(current);
-
-   acct->utime += (tb - acct->starttime_user);
-   acct->starttime = tb;
-}
-static inline void account_cpu_user_exit(void)
-{
-   unsigned long tb = mftb();
-   struct cpu_accounting_data *acct = get_accounting(current);
-
-   acct->stime += (tb - acct->starttime);
-   acct->starttime_user = tb;
-}
-
 #endif /* __KERNEL__ */
 #else /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 static inline void account_cpu_user_entry(void)
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index 529393a1ff1e..cfe458adde07 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -19,7 +19,8 @@ extern void __noreturn tabort_syscall(void);
 
 typedef long (*syscall_fn)(long, long, long, long, long, long);
 
-long system_call_exception(long r3, long r4, long r5, long r6, long r7, long 
r8,
+/* Has to run notrace because it is entered "unreconciled" */
+notrace long system_call_exception(long r3, long r4, long r5, long r6, long 
r7, long r8,
   unsigned long r0, struct pt_regs *regs)
 {
unsigned long ti_flags;
@@ -36,7 +37,7 @@ long system_call_exception(long r3, long r4, long r5, long 
r6, long r7, long r8,
 #ifdef CONFIG_PPC_SPLPAR
if (IS_ENABLED(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) &&
firmware_has_feature(FW_FEATURE_SPLPAR)) {
-   struct lppaca *lp = get_lppaca();
+   struct lppaca *lp = local_paca->lppaca_ptr;
 
if (unlikely(local_paca->dtl_ridx != be64_to_cpu(lp->dtl_idx)))
accumulate_stolen_time();
@@ -71,13 +72,22 @@ long system_call_exception(long r3, long r4, long r5, long 
r6, long r7, long r8,
 * We use the return value of do_syscall_trace_enter() as the
 * syscall number. If the syscall was rejected for any reason
 * do_syscall_trace_enter() returns an invalid syscall number
-* and the test below against NR_syscalls will fail.
+* and the test against NR_syscalls will fail and the return
+* value to be used is in regs->gpr[3].
 */
r0 = do_syscall_trace_enter(regs);
-   }
-
-   if (unlikely(r0 >= NR_syscalls))
+   if (unlikely(r0 >= NR_syscalls))
+   return regs->gpr[3];
+   r3 = regs->gpr[3];
+   r4 = regs->gpr[4];
+   r5 = regs->gpr[5];
+   r6 = regs->gpr[6];
+   r7 = regs->gpr[7];
+   r8 = regs->gpr[8];
+
+   } else if (unlikely(r0 >= NR_syscalls)) {
return -ENOSYS;
+   }
 
/* May be faster to do array_index_nospec? */
barrier_nospec();
@@ -139,8 +149,10 @@ notrace unsigned long syscall_exit_prepare(unsigned long 
r3,
regs->gpr[3] = r3;
}
 
-   if (unlikely(ti_fl

RE: [PATCH] bus: fsl-mc: add api to retrieve mc version

2020-01-27 Thread Laurentiu Tudor


> -Original Message-
> From: Andrei Botila 
> Sent: Monday, January 27, 2020 1:16 PM
> 
> Add a new API that returns the Management Complex firmware version
> and make the required structure public. The API's first user will be
> the caam driver, for setting prediction resistance bits.
> 
> Signed-off-by: Andrei Botila 

Acked-by: Laurentiu Tudor 

> ---
>  drivers/bus/fsl-mc/fsl-mc-bus.c | 33 +
>  include/linux/fsl/mc.h  | 16 
>  2 files changed, 33 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-
> bus.c
> index a07cc19becdb..330c76181604 100644
> --- a/drivers/bus/fsl-mc/fsl-mc-bus.c
> +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
> @@ -26,6 +26,8 @@
>   */
>  #define FSL_MC_DEFAULT_DMA_MASK  (~0ULL)
> 
> +static struct fsl_mc_version mc_version;
> +
>  /**
>   * struct fsl_mc - Private data of a "fsl,qoriq-mc" platform device
>   * @root_mc_bus_dev: fsl-mc device representing the root DPRC
> @@ -54,20 +56,6 @@ struct fsl_mc_addr_translation_range {
>   phys_addr_t start_phys_addr;
>  };
> 
> -/**
> - * struct mc_version
> - * @major: Major version number: incremented on API compatibility changes
> - * @minor: Minor version number: incremented on API additions (that are
> - *   backward compatible); reset when major version is incremented
> - * @revision: Internal revision number: incremented on implementation
> changes
> - *   and/or bug fixes that have no impact on API
> - */
> -struct mc_version {
> - u32 major;
> - u32 minor;
> - u32 revision;
> -};
> -
>  /**
>   * fsl_mc_bus_match - device to driver matching callback
>   * @dev: the fsl-mc device to match against
> @@ -338,7 +326,7 @@ EXPORT_SYMBOL_GPL(fsl_mc_driver_unregister);
>   */
>  static int mc_get_version(struct fsl_mc_io *mc_io,
> u32 cmd_flags,
> -   struct mc_version *mc_ver_info)
> +   struct fsl_mc_version *mc_ver_info)
>  {
>   struct fsl_mc_command cmd = { 0 };
>   struct dpmng_rsp_get_version *rsp_params;
> @@ -363,6 +351,20 @@ static int mc_get_version(struct fsl_mc_io *mc_io,
>   return 0;
>  }
> 
> +/**
> + * fsl_mc_get_version - function to retrieve the MC f/w version
> information
> + *
> + * Return:   mc version when called after fsl-mc-bus probe; NULL otherwise.
> + */
> +struct fsl_mc_version *fsl_mc_get_version(void)
> +{
> + if (mc_version.major)
> + return &mc_version;
> +
> + return NULL;
> +}
> +EXPORT_SYMBOL_GPL(fsl_mc_get_version);
> +
>  /**
>   * fsl_mc_get_root_dprc - function to traverse to the root dprc
>   */
> @@ -862,7 +864,6 @@ static int fsl_mc_bus_probe(struct platform_device
> *pdev)
>   int container_id;
>   phys_addr_t mc_portal_phys_addr;
>   u32 mc_portal_size;
> - struct mc_version mc_version;
>   struct resource res;
> 
>   mc = devm_kzalloc(&pdev->dev, sizeof(*mc), GFP_KERNEL);
> diff --git a/include/linux/fsl/mc.h b/include/linux/fsl/mc.h
> index 54d9436600c7..2b5f8366dbe1 100644
> --- a/include/linux/fsl/mc.h
> +++ b/include/linux/fsl/mc.h
> @@ -381,6 +381,22 @@ int __must_check __fsl_mc_driver_register(struct
> fsl_mc_driver *fsl_mc_driver,
> 
>  void fsl_mc_driver_unregister(struct fsl_mc_driver *driver);
> 
> +/**
> + * struct fsl_mc_version
> + * @major: Major version number: incremented on API compatibility changes
> + * @minor: Minor version number: incremented on API additions (that are
> + *   backward compatible); reset when major version is incremented
> + * @revision: Internal revision number: incremented on implementation
> changes
> + *   and/or bug fixes that have no impact on API
> + */
> +struct fsl_mc_version {
> + u32 major;
> + u32 minor;
> + u32 revision;
> +};
> +
> +struct fsl_mc_version *fsl_mc_get_version(void);
> +
>  int __must_check fsl_mc_portal_allocate(struct fsl_mc_device *mc_dev,
>   u16 mc_io_flags,
>   struct fsl_mc_io **new_mc_io);
> --
> 2.17.1



[PATCH] bus: fsl-mc: add api to retrieve mc version

2020-01-27 Thread Andrei Botila
Add a new API that returns the Management Complex firmware version
and make the required structure public. The API's first user will be
the caam driver, for setting prediction resistance bits.
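
A minimal usage sketch for the new API (hypothetical consumer and version
numbers, not part of this patch):

/* Hypothetical consumer: gate a feature on the MC firmware version */
#include <linux/fsl/mc.h>

static bool mc_is_at_least(u32 major, u32 minor)
{
	struct fsl_mc_version *ver = fsl_mc_get_version();

	/* NULL until the fsl-mc bus has probed */
	if (!ver)
		return false;

	return ver->major > major ||
	       (ver->major == major && ver->minor >= minor);
}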

Signed-off-by: Andrei Botila 
---
 drivers/bus/fsl-mc/fsl-mc-bus.c | 33 +
 include/linux/fsl/mc.h  | 16 
 2 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
index a07cc19becdb..330c76181604 100644
--- a/drivers/bus/fsl-mc/fsl-mc-bus.c
+++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
@@ -26,6 +26,8 @@
  */
 #define FSL_MC_DEFAULT_DMA_MASK(~0ULL)
 
+static struct fsl_mc_version mc_version;
+
 /**
  * struct fsl_mc - Private data of a "fsl,qoriq-mc" platform device
  * @root_mc_bus_dev: fsl-mc device representing the root DPRC
@@ -54,20 +56,6 @@ struct fsl_mc_addr_translation_range {
phys_addr_t start_phys_addr;
 };
 
-/**
- * struct mc_version
- * @major: Major version number: incremented on API compatibility changes
- * @minor: Minor version number: incremented on API additions (that are
- * backward compatible); reset when major version is incremented
- * @revision: Internal revision number: incremented on implementation changes
- * and/or bug fixes that have no impact on API
- */
-struct mc_version {
-   u32 major;
-   u32 minor;
-   u32 revision;
-};
-
 /**
  * fsl_mc_bus_match - device to driver matching callback
  * @dev: the fsl-mc device to match against
@@ -338,7 +326,7 @@ EXPORT_SYMBOL_GPL(fsl_mc_driver_unregister);
  */
 static int mc_get_version(struct fsl_mc_io *mc_io,
  u32 cmd_flags,
- struct mc_version *mc_ver_info)
+ struct fsl_mc_version *mc_ver_info)
 {
struct fsl_mc_command cmd = { 0 };
struct dpmng_rsp_get_version *rsp_params;
@@ -363,6 +351,20 @@ static int mc_get_version(struct fsl_mc_io *mc_io,
return 0;
 }
 
+/**
+ * fsl_mc_get_version - function to retrieve the MC f/w version information
+ *
+ * Return: mc version when called after fsl-mc-bus probe; NULL otherwise.
+ */
+struct fsl_mc_version *fsl_mc_get_version(void)
+{
+   if (mc_version.major)
+   return &mc_version;
+
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(fsl_mc_get_version);
+
 /**
  * fsl_mc_get_root_dprc - function to traverse to the root dprc
  */
@@ -862,7 +864,6 @@ static int fsl_mc_bus_probe(struct platform_device *pdev)
int container_id;
phys_addr_t mc_portal_phys_addr;
u32 mc_portal_size;
-   struct mc_version mc_version;
struct resource res;
 
mc = devm_kzalloc(&pdev->dev, sizeof(*mc), GFP_KERNEL);
diff --git a/include/linux/fsl/mc.h b/include/linux/fsl/mc.h
index 54d9436600c7..2b5f8366dbe1 100644
--- a/include/linux/fsl/mc.h
+++ b/include/linux/fsl/mc.h
@@ -381,6 +381,22 @@ int __must_check __fsl_mc_driver_register(struct 
fsl_mc_driver *fsl_mc_driver,
 
 void fsl_mc_driver_unregister(struct fsl_mc_driver *driver);
 
+/**
+ * struct fsl_mc_version
+ * @major: Major version number: incremented on API compatibility changes
+ * @minor: Minor version number: incremented on API additions (that are
+ * backward compatible); reset when major version is incremented
+ * @revision: Internal revision number: incremented on implementation changes
+ * and/or bug fixes that have no impact on API
+ */
+struct fsl_mc_version {
+   u32 major;
+   u32 minor;
+   u32 revision;
+};
+
+struct fsl_mc_version *fsl_mc_get_version(void);
+
 int __must_check fsl_mc_portal_allocate(struct fsl_mc_device *mc_dev,
u16 mc_io_flags,
struct fsl_mc_io **new_mc_io);
-- 
2.17.1



[PATCH] powerpc/32s: Fix CPU wake-up from sleep mode

2020-01-27 Thread Christophe Leroy
Commit f7354ccac844 ("powerpc/32: Remove CURRENT_THREAD_INFO and
rename TI_CPU") broke the CPU wake-up from sleep mode (i.e. when
_TLF_SLEEPING is set) by delaying the tovirt(r2, r2).

This is because r2 is not restored by fast_exception_return. It used
to work (by chance?) because the CPU wake-up interrupt never comes from
user mode, so r2 is expected to point to 'current' on return.

Commit e2fb9f544431 ("powerpc/32: Prepare for Kernel Userspace Access
Protection") broke it even more by clobbering r0 which is not
restored by fast_exception_return either.

Use r6 instead of r0. This is possible because r3-r6 are restored by
fast_exception_return and only r3-r5 are used for exception arguments.

For r2, it could be converted back to a virtual address, but stay on the
safe side and restore it from the stack instead. It should still be live
in the cache at that moment, so loading it from the stack should make
no difference compared to converting it from phys to virt.

Fixes: f7354ccac844 ("powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU")
Fixes: e2fb9f544431 ("powerpc/32: Prepare for Kernel Userspace Access 
Protection")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/entry_32.S | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 73b80143ffac..27e2afce8b78 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -180,7 +180,7 @@ transfer_to_handler:
 2: /* if from kernel, check interrupted DOZE/NAP mode and
  * check for stack overflow
  */
-   kuap_save_and_lock r11, r12, r9, r2, r0
+   kuap_save_and_lock r11, r12, r9, r2, r6
addir2, r12, -THREAD
 #ifndef CONFIG_VMAP_STACK
lwz r9,KSP_LIMIT(r12)
@@ -288,6 +288,7 @@ reenable_mmu:
rlwinm  r9,r9,0,~MSR_EE
lwz r12,_LINK(r11)  /* and return to address in LR */
kuap_restore r11, r2, r3, r4, r5
+   lwz r2, GPR2(r11)
b   fast_exception_return
 #endif
 
-- 
2.25.0



Re: [PATCH v16 00/23] selftests, powerpc, x86: Memory Protection Keys

2020-01-27 Thread Sandipan Das
Hi Dave,

On 23/01/20 12:15 am, Dave Hansen wrote:
> Still doesn't build for me:
> 

I have this patch that hopefully fixes this. My understanding was
that the vm tests are supposed to be generic but this has quite a
bit of x86-specific conditional code which complicates things even
though it is not used by any of the other tests.

I'm not sure if we should keep x86 multilib build support for these
selftests but I'll let the maintainers take a call.

>From a5609f79e5d5164c99c3cb599e14ca620de9c8d4 Mon Sep 17 00:00:00 2001
From: Sandipan Das 
Date: Sat, 18 Jan 2020 15:59:04 +0530
Subject: [PATCH] selftests: vm: pkeys: Fix multilib builds for x86

This ensures that both 32-bit and 64-bit binaries are generated
when this is built on an x86_64 system that has both 32-bit and
64-bit libraries. Most of the changes have been borrowed from
tools/testing/selftests/x86/Makefile.

Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/Makefile | 72 +
 1 file changed, 72 insertions(+)

diff --git a/tools/testing/selftests/vm/Makefile 
b/tools/testing/selftests/vm/Makefile
index 4e9c741be6af..82031f84af21 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -18,7 +18,30 @@ TEST_GEN_FILES += on-fault-limit
 TEST_GEN_FILES += thuge-gen
 TEST_GEN_FILES += transhuge-stress
 TEST_GEN_FILES += userfaultfd
+
+ifeq ($(ARCH),x86_64)
+CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh $(CC) 
../x86/trivial_32bit_program.c -m32)
+CAN_BUILD_X86_64 := $(shell ./../x86/check_cc.sh $(CC) 
../x86/trivial_64bit_program.c)
+CAN_BUILD_WITH_NOPIE := $(shell ./../x86/check_cc.sh $(CC) 
../x86/trivial_program.c -no-pie)
+
+TARGETS := protection_keys
+BINARIES_32 := $(TARGETS:%=%_32)
+BINARIES_64 := $(TARGETS:%=%_64)
+
+ifeq ($(CAN_BUILD_WITH_NOPIE),1)
+CFLAGS += -no-pie
+endif
+
+ifeq ($(CAN_BUILD_I386),1)
+TEST_GEN_FILES += $(BINARIES_32)
+endif
+
+ifeq ($(CAN_BUILD_X86_64),1)
+TEST_GEN_FILES += $(BINARIES_64)
+endif
+else
 TEST_GEN_FILES += protection_keys
+endif
 
 ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 
sparc64 x86_64))
 TEST_GEN_FILES += va_128TBswitch
@@ -32,6 +55,55 @@ TEST_FILES := test_vmalloc.sh
 KSFT_KHDR_INSTALL := 1
 include ../lib.mk
 
+ifeq ($(ARCH),x86_64)
+BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32))
+BINARIES_64 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_64))
+
+define gen-target-rule-32
+$(1) $(1)_32: $(OUTPUT)/$(1)_32
+.PHONY: $(1) $(1)_32
+endef
+
+define gen-target-rule-64
+$(1) $(1)_64: $(OUTPUT)/$(1)_64
+.PHONY: $(1) $(1)_64
+endef
+
+ifeq ($(CAN_BUILD_I386),1)
+$(BINARIES_32): CFLAGS += -m32
+$(BINARIES_32): LDLIBS += -lrt -ldl -lm
+$(BINARIES_32): %_32: %.c
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
+$(foreach t,$(TARGETS),$(eval $(call gen-target-rule-32,$(t
+endif
+
+ifeq ($(CAN_BUILD_X86_64),1)
+$(BINARIES_64): CFLAGS += -m64
+$(BINARIES_64): LDLIBS += -lrt -ldl
+$(BINARIES_64): %_64: %.c
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
+$(foreach t,$(TARGETS),$(eval $(call gen-target-rule-64,$(t
+endif
+
+# x86_64 users should be encouraged to install 32-bit libraries
+ifeq ($(CAN_BUILD_I386)$(CAN_BUILD_X86_64),01)
+all: warn_32bit_failure
+
+warn_32bit_failure:
+   @echo "Warning: you seem to have a broken 32-bit build" 2>&1;   
\
+   echo  "environment. This will reduce test coverage of 64-bit" 2>&1; 
\
+   echo  "kernels. If you are using a Debian-like distribution," 2>&1; 
\
+   echo  "try:"; 2>&1; 
\
+   echo  "";   
\
+   echo  "  apt-get install gcc-multilib libc6-i386 libc6-dev-i386";   
\
+   echo  "";   
\
+   echo  "If you are using a Fedora-like distribution, try:";  
\
+   echo  "";   
\
+   echo  "  yum install glibc-devel.*i686";
\
+   exit 0;
+endif
+endif
+
 $(OUTPUT)/userfaultfd: LDLIBS += -lpthread
 
 $(OUTPUT)/mlock-random-test: LDLIBS += -lcap
-- 
2.24.1

This can also be viewed at:
https://github.com/sandip4n/linux/commit/a5609f79e5d5164c99c3cb599e14ca620de9c8d4

- Sandipan



Re: [PATCH v4 7/9] parisc/perf: open access for CAP_SYS_PERFMON privileged process

2020-01-27 Thread Helge Deller
On 18.12.19 10:29, Alexey Budankov wrote:
>
> Open access to monitoring for CAP_SYS_PERFMON privileged processes.
> For backward compatibility reasons, access to monitoring remains open
> for CAP_SYS_ADMIN privileged processes, but CAP_SYS_ADMIN usage for secure
> monitoring is discouraged in favour of the CAP_SYS_PERFMON capability.
>
> Signed-off-by: Alexey Budankov 

Acked-by: Helge Deller 

> ---
>  arch/parisc/kernel/perf.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/parisc/kernel/perf.c b/arch/parisc/kernel/perf.c
> index 676683641d00..c4208d027794 100644
> --- a/arch/parisc/kernel/perf.c
> +++ b/arch/parisc/kernel/perf.c
> @@ -300,7 +300,7 @@ static ssize_t perf_write(struct file *file, const char 
> __user *buf,
>   else
>   return -EFAULT;
>
> - if (!capable(CAP_SYS_ADMIN))
> + if (!perfmon_capable())
>   return -EACCES;
>
>   if (count != sizeof(uint32_t))
>