Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process

2020-01-11 Thread Masami Hiramatsu
On Fri, 10 Jan 2020 21:35:12 -0300
arnaldo.m...@gmail.com wrote:

> From: Arnaldo Carvalho de Melo 
> 
> On January 10, 2020 9:23:27 PM GMT-03:00, Song Liu  
> wrote:
> >
> >
> >> On Jan 10, 2020, at 3:47 PM, Masami Hiramatsu 
> >wrote:
> >> 
> >> On Fri, 10 Jan 2020 13:45:31 -0300
> >> Arnaldo Carvalho de Melo  wrote:
> >> 
> >>> On Sat, Jan 11, 2020 at 12:52:13AM +0900, Masami Hiramatsu wrote:
>  On Fri, 10 Jan 2020 15:02:34 +0100 Peter Zijlstra
> > wrote:
> > Again, this only allows attaching to previously created kprobes, it does
> > not allow creating kprobes, right?
> >>> 
> > That is; I don't think CAP_SYS_PERFMON should be allowed to create
> > kprobes.
> >>> 
> > As might be clear; I don't actually know what the user-ABI is for
> > creating kprobes.
> >>> 
>  There are 2 ABIs nowadays, ftrace and ebpf. perf-probe uses the ftrace
>  interface to define new kprobe events, and those events are treated
>  exactly the same as tracepoint events. On the other hand, ebpf tries to
>  define a new probe event via the perf_event interface. The above one is
>  that interface. IOW, it creates a new kprobe.
> >>> 
> >>> Masami, any plans to make 'perf probe' use the perf_event_open()
> >>> interface for creating kprobes/uprobes?
> >> 
> >> Do you mean that perf probe should switch to perf_event_open()?
> >> No, perf probe is for setting up the ftrace probe events. I think we
> >> can add an option to use perf_event_open(). But the current kprobe
> >> creation from perf_event_open() is separated from ftrace by design.
> >
> >I guess we can extend the event parser to understand kprobes directly.
> >Instead of
> >
> > perf probe kernel_func
> > perf stat/record -e probe:kernel_func ...
> >
> >We can just do 
> >
> > perf stat/record -e kprobe:kernel_func ...
> 
> 
> You took the words right out of my mouth. That is a perfect use case: an 
> alternative to the 'perf probe' approach of making a disabled event that then 
> gets activated via record/stat/trace. In many cases it's better, since it 
> removes the explicit probe setup step.

Ah, I got it. If the perf event parser just kicks perf's kprobe creation
interface, it will be easy. In that case, there would be the following
differences:

- perf * -e "kprobe":kernel_func will put a local (hidden) kprobe
  event, so ftrace users can not access it.
- perf * -e "kprobe":kernel_func may not support probing inline functions
  or function bodies, nor tracing local variables, etc.

Hm, if we support inline functions via the -e "kprobe" interface, we have
to expand perf_event_open() to support multi-probe events.

Thanks,

> 
> Regards, 
> 
> - Arnaldo
> 
> >
> >Thanks,
> >Song
> 


-- 
Masami Hiramatsu 


Re: [PATCH] lib: vdso: mark __cvdso_clock_getres() as static

2020-01-11 Thread Thomas Gleixner
Christophe Leroy  writes:
> When __cvdso_clock_getres() became __cvdso_clock_getres_common()
> and a new __cvdso_clock_getres() was added, the static qualifier was
> forgotten.
>
> Fixes: 502a590a170b ("lib/vdso: Move fallback invocation to the callers")
> Signed-off-by: Christophe Leroy 

I've already queued:

 https://lore.kernel.org/r/20191128111719.8282-1-vincenzo.frasc...@arm.com

but thanks for caring!

Thanks,

tglx


[PATCH] lib: vdso: mark __cvdso_clock_getres() as static

2020-01-11 Thread Christophe Leroy
When __cvdso_clock_getres() became __cvdso_clock_getres_common()
and a new __cvdso_clock_getres() was added, the static qualifier was
forgotten.

Fixes: 502a590a170b ("lib/vdso: Move fallback invocation to the callers")
Signed-off-by: Christophe Leroy 
---
 lib/vdso/gettimeofday.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index 9ecfd3b547ba..42bd8ab955fa 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -221,6 +221,7 @@ int __cvdso_clock_getres_common(clockid_t clock, struct __kernel_timespec *res)
return 0;
 }
 
+static __maybe_unused
 int __cvdso_clock_getres(clockid_t clock, struct __kernel_timespec *res)
 {
int ret = __cvdso_clock_getres_common(clock, res);
-- 
2.13.3



Re: [PATCH v15 00/24] selftests, powerpc, x86: Memory Protection Keys

2020-01-11 Thread Sandipan Das
Hi Dave,

On 10/01/20 11:27 pm, Dave Hansen wrote:
> 
> Could you dump these in a git tree, please?  It will make it a wee bit
> easier for me to ship the resulting tree around to a couple different
> systems.
> 

I have pushed a version of this series that uses u64 for all references
to the pkey register irrespective of architecture. This is available at:

https://github.com/sandip4n/linux/tree/pkey-selftests


- Sandipan



Re: linux-next: build warning after merge of the bpf-next tree

2020-01-11 Thread Alexandre Ghiti



On 1/10/20 7:20 PM, Palmer Dabbelt wrote:

On Fri, 10 Jan 2020 14:28:17 PST (-0800), alexan...@ghiti.fr wrote:

Hi guys,

On 10/27/19 8:02 PM, Stephen Rothwell wrote:

Hi all,

On Fri, 18 Oct 2019 10:56:57 +1100 Stephen Rothwell wrote:

Hi all,

After merging the bpf-next tree, today's linux-next build (powerpc
ppc64_defconfig) produced this warning:

WARNING: 2 bad relocations
c1998a48 R_PPC64_ADDR64 _binary__btf_vmlinux_bin_start
c1998a50 R_PPC64_ADDR64 _binary__btf_vmlinux_bin_end

Introduced by commit

   8580ac9404f6 ("bpf: Process in-kernel BTF")

This warning now appears in the net-next tree build.


I bump that thread up because Zong also noticed that 2 new relocations for
those symbols appeared in my riscv relocatable kernel branch following
that commit.

I also noticed 2 new relocations R_AARCH64_ABS64 appearing in the arm64
kernel.


Those 2 weak undefined symbols have existed since commit
341dfcf8d78e ("btf: expose BTF info through sysfs"), but it is the
declaration of those symbols in btf.c that produced those relocations.

I'm not sure what this all means, but this is not something I expected
for riscv for a kernel linked with -shared/-fpie. Maybe we should just
leave them at zero?

I think that deserves a deeper look if someone understands all this
better than I do.


Can you give me a pointer to your tree and how to build a relocatable
kernel?

Weak undefined symbols have the absolute value 0,



So according to you the 2 new relocations R_RISCV_64 are normal and
should not be modified at runtime, right?



but the kernel is linked at an address such that 0 can't be reached by
normal means.  When I added support to binutils for this I did it in a way
that required almost no code -- essentially I just stopped disallowing x0
as a possible base register for PCREL relocations, which results in 0
always being accessible.  I just wanted to get the kernel to build again,
so I didn't worry about chasing around all the addressing modes.  The
PIC/PIE support generates different relocations and I wouldn't be
surprised if I just missed one (or more likely all) of them.

It's probably a simple fix, though I feel like every time I say that about
the linker I end up spending a month in there...


You can find it here:

https://github.com/AlexGhiti/riscv-linux/tree/int/alex/riscv_relocatable_v1

Zong fixed the bug introduced by those 2 new relocations and everything
works like a charm, so I'm not sure you have to dig into the linker :)

Alex



Re: linux-next: build warning after merge of the bpf-next tree

2020-01-11 Thread Alexandre Ghiti



On 1/10/20 6:18 PM, Alexei Starovoitov wrote:

On Fri, Jan 10, 2020 at 2:28 PM Alexandre Ghiti  wrote:

Hi guys,

On 10/27/19 8:02 PM, Stephen Rothwell wrote:

Hi all,

On Fri, 18 Oct 2019 10:56:57 +1100 Stephen Rothwell wrote:

Hi all,

After merging the bpf-next tree, today's linux-next build (powerpc
ppc64_defconfig) produced this warning:

WARNING: 2 bad relocations
c1998a48 R_PPC64_ADDR64 _binary__btf_vmlinux_bin_start
c1998a50 R_PPC64_ADDR64 _binary__btf_vmlinux_bin_end

Introduced by commit

8580ac9404f6 ("bpf: Process in-kernel BTF")

This warning now appears in the net-next tree build.



I bump that thread up because Zong also noticed that 2 new relocations for
those symbols appeared in my riscv relocatable kernel branch following
that commit.

I also noticed 2 new relocations R_AARCH64_ABS64 appearing in arm64 kernel.

Those 2 weak undefined symbols have existed since commit
341dfcf8d78e ("btf: expose BTF info through sysfs"), but it is the
declaration of those symbols in btf.c that produced those relocations.

I'm not sure what this all means, but this is not something I expected
for riscv for a kernel linked with -shared/-fpie. Maybe we should just
leave them at zero?

I think that deserves a deeper look if someone understands all this
better than I do.

Are you saying there is a warning for arm64 as well?



Nope.



Can ppc folks explain the above warning?
What does it mean "2 bad relocations"?



This is what I'd like to understand too; it is not clear from the ppc
tool that outputs this message why it is considered 'bad'.



The code is doing:
extern char __weak _binary__btf_vmlinux_bin_start[];
extern char __weak _binary__btf_vmlinux_bin_end[];
Since they are weak they should be zero when not defined.
What's the issue?



There likely is no issue; I just want to make sure those relocations
are legitimate and to understand what we should do with them.

At the moment arm64 does not relocate those at runtime and purely
ignores them: is this the right thing to do?




Re: Surprising code generated for vdso_read_begin()

2020-01-11 Thread Segher Boessenkool
On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:
> Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :
> >It looks like the compiler did loop peeling.  What GCC version is this?
> >Please try current trunk (to become GCC 10), or at least GCC 9?
> 
> It is with GCC 5.5
> 
> https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have anything
> more recent than 8.1

Arnd, can you update the tools?  We are at 8.3 and 9.2 now :-)  Or is
this hard and/or painful to do?

> With 8.1, the problem doesn't show up.

Good to hear that.  Hrm, I wonder when it was fixed...


Segher


Re: [RFC PATCH v2 07/10] lib: vdso: don't use READ_ONCE() in __c_kernel_time()

2020-01-11 Thread Thomas Gleixner
Christophe Leroy  writes:
>
> With READ_ONCE() the 64 bits are being read:
>
> Without the READ_ONCE() only 32 bits are read. That's the most optimal.
>
> Without READ_ONCE() but with a barrier() after the read, we should get
> the same result, but GCC (GCC 8.1) does a worse job:
>
> Assuming both parts of the 64-bit data fall into a single cacheline,
> the second read is in the noise.

They definitely are in the same cacheline.

> So agreed to drop this change.

We could be smart about this and force the compiler to issue a 32bit
read for 32bit builds. See below. Not sure whether it's worth it, but
OTOH it will take quite a while until the 32bit time interfaces die
completely.

Thanks,

tglx

8<
--- a/include/vdso/datapage.h
+++ b/include/vdso/datapage.h
@@ -21,6 +21,18 @@
 #define CS_RAW 1
 #define CS_BASES   (CS_RAW + 1)
 
+#ifdef __LITTLE_ENDIAN
+struct sec_hl {
+   u32 sec_l;
+   u32 sec_h;
+};
+#else
+struct sec_hl {
+   u32 sec_h;
+   u32 sec_l;
+};
+#endif
+
 /**
  * struct vdso_timestamp - basetime per clock_id
  * @sec:   seconds
@@ -35,7 +47,10 @@
  * vdso_data.cs[x].shift.
  */
 struct vdso_timestamp {
-   u64 sec;
+   union {
+   u64 sec;
+   struct sec_hl   sec_hl;
+   };
u64 nsec;
 };
 
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -165,8 +165,13 @@ static __maybe_unused int
 static __maybe_unused __kernel_old_time_t __cvdso_time(__kernel_old_time_t *time)
 {
const struct vdso_data *vd = __arch_get_vdso_data();
-   __kernel_old_time_t t = READ_ONCE(vd[CS_HRES_COARSE].basetime[CLOCK_REALTIME].sec);
+   __kernel_old_time_t t;
 
+#if BITS_PER_LONG == 32
+   t = READ_ONCE(vd[CS_HRES_COARSE].basetime[CLOCK_REALTIME].sec_hl.sec_l);
+#else
+   t = READ_ONCE(vd[CS_HRES_COARSE].basetime[CLOCK_REALTIME].sec);
+#endif
if (time)
*time = t;
 


Re: [RFC PATCH v2 05/10] lib: vdso: inline do_hres()

2020-01-11 Thread Christophe Leroy




On 01/10/2020 09:07 PM, Thomas Gleixner wrote:

Arnd Bergmann  writes:

On Mon, Dec 23, 2019 at 3:31 PM Christophe Leroy
 wrote:


do_hres() is called from several places, so GCC doesn't inline
it at first.

do_hres() takes a struct __kernel_timespec * parameter for
passing the result. In the 32 bits case, this parameter corresponds
to a local var in the caller. In order to provide a pointer
to this structure, the caller has to put it in its stack and
do_hres() has to write the result in the stack. This is suboptimal,
especially on RISC processor like powerpc.

By making GCC inline the function, the struct __kernel_timespec
remains a local var using registers, avoiding the need to write and
read stack.

The improvement is significant on powerpc.

Signed-off-by: Christophe Leroy 


Good idea, I can see how this ends up being an improvement
for most of the callers.

Acked-by: Arnd Bergmann 


   https://lore.kernel.org/r/20191112012724.250792-3-d...@arista.com

On the way to be applied.



Oh nice, I get even better results with the way Dmitry did it compared
to my own first patch.


On an mpc8xx at 132 MHz (32-bit powerpc), before the patch I have:
gettimeofday:vdso: 1256 nsec/call
clock-gettime-monotonic-raw:vdso: 1449 nsec/call
clock-gettime-monotonic-coarse:vdso: 768 nsec/call
clock-gettime-monotonic:vdso: 1390 nsec/call

With the patch I have:
gettimeofday:vdso: 947 nsec/call
clock-gettime-monotonic-raw:vdso: 1156 nsec/call
clock-gettime-monotonic-coarse:vdso: 638 nsec/call
clock-gettime-monotonic:vdso: 1094 nsec/call

So that's a 20-25% improvement.

I modified it slightly as follows:

diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index 9e474d54814f..b793f211bca8 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -37,8 +37,8 @@ u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)

 }
 #endif

-static int do_hres(const struct vdso_data *vd, clockid_t clk,
-  struct __kernel_timespec *ts)
+static __always_inline int do_hres(const struct vdso_data *vd, clockid_t clk,
+  struct __kernel_timespec *ts)
 {
const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
u64 cycles, last, sec, ns;
@@ -67,8 +67,8 @@ static int do_hres(const struct vdso_data *vd, clockid_t clk,
return 0;
 }

-static void do_coarse(const struct vdso_data *vd, clockid_t clk,
- struct __kernel_timespec *ts)
+static __always_inline int do_coarse(const struct vdso_data *vd, clockid_t clk,
+struct __kernel_timespec *ts)
 {
const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
u32 seq;
@@ -78,6 +78,8 @@ static void do_coarse(const struct vdso_data *vd, clockid_t clk,
ts->tv_sec = vdso_ts->sec;
ts->tv_nsec = vdso_ts->nsec;
} while (unlikely(vdso_read_retry(vd, seq)));
+
+   return 0;
 }

 static __maybe_unused int
@@ -95,15 +97,16 @@ __cvdso_clock_gettime_common(const struct vdso_data *vd, clockid_t clock,
 * clocks are handled in the VDSO directly.
 */
msk = 1U << clock;
-   if (likely(msk & VDSO_HRES)) {
-   return do_hres(&vd[CS_HRES_COARSE], clock, ts);
-   } else if (msk & VDSO_COARSE) {
-   do_coarse(&vd[CS_HRES_COARSE], clock, ts);
-   return 0;
-   } else if (msk & VDSO_RAW) {
-   return do_hres(&vd[CS_RAW], clock, ts);
-   }
-   return -1;
+   if (likely(msk & VDSO_HRES))
+   vd += CS_HRES_COARSE;
+   else if (msk & VDSO_COARSE)
+   return do_coarse(&vd[CS_HRES_COARSE], clock, ts);
+   else if (msk & VDSO_RAW)
+   vd += CS_RAW;
+   else
+   return -1;
+
+   return do_hres(vd, clock, ts);
 }

 static __maybe_unused int

---

Christophe


Re: [RFC PATCH v2 07/10] lib: vdso: don't use READ_ONCE() in __c_kernel_time()

2020-01-11 Thread Christophe Leroy




On 01/10/2020 09:12 PM, Thomas Gleixner wrote:

Christophe Leroy  writes:


diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index 17b4cff6e5f0..5a17a9d2e6cd 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -144,7 +144,7 @@ __cvdso_gettimeofday(const struct vdso_data *vd, struct __kernel_old_timeval *tv
  static __maybe_unused __kernel_old_time_t
  __cvdso_time(const struct vdso_data *vd, __kernel_old_time_t *time)
  {
-   __kernel_old_time_t t = READ_ONCE(vd[CS_HRES_COARSE].basetime[CLOCK_REALTIME].sec);
+   __kernel_old_time_t t = vd[CS_HRES_COARSE].basetime[CLOCK_REALTIME].sec;
  
   	if (time)
   		*time = t;


That allows the compiler to load twice, i.e. the returned value might be
different from the stored value. So no.



With READ_ONCE() the 64 bits are being read:

0ac8 <__c_kernel_time>:
 ac8:   2c 03 00 00 cmpwi   r3,0
 acc:   81 44 00 20 lwz r10,32(r4)
 ad0:   81 64 00 24 lwz r11,36(r4)
 ad4:   41 82 00 08 beq adc <__c_kernel_time+0x14>
 ad8:   91 63 00 00 stw r11,0(r3)
 adc:   7d 63 5b 78 mr  r3,r11
 ae0:   4e 80 00 20 blr

Without the READ_ONCE() only 32 bits are read. That's the most optimal.

0ac8 <__c_kernel_time>:
 ac8:   7c 69 1b 79 mr. r9,r3
 acc:   80 64 00 24 lwz r3,36(r4)
 ad0:   4d 82 00 20 beqlr
 ad4:   90 69 00 00 stw r3,0(r9)
 ad8:   4e 80 00 20 blr

Without READ_ONCE() but with a barrier() after the read, we should get
the same result, but GCC (GCC 8.1) does a worse job:


0ac8 <__c_kernel_time>:
 ac8:   81 24 00 24 lwz r9,36(r4)
 acc:   2f 83 00 00 cmpwi   cr7,r3,0
 ad0:   41 9e 00 08 beq cr7,ad8 <__c_kernel_time+0x10>
 ad4:   91 23 00 00 stw r9,0(r3)
 ad8:   7d 23 4b 78 mr  r3,r9
 adc:   4e 80 00 20 blr

Assuming both parts of the 64-bit data fall into a single cacheline,
the second read is in the noise.


So agreed to drop this change.

Christophe