Re: [PATCH v6 4/4] powerpc/vdso: Switch VDSO to generic C implementation.

2020-04-07 Thread Naveen N. Rao

Christophe Leroy wrote:

powerpc is a bit special for the VDSO, as it is for system calls, in
that it requires setting the CR SO bit, which cannot be done in C.
Therefore, entry/exit needs to be performed in ASM.

Implementing __arch_get_vdso_data() would clobber the link register,
requiring the caller to save it. As the ASM calling function already
has to set up a stack frame and save the link register before calling
the C vdso function, retrieving the vdso data pointer there is lighter.

Implement __arch_vdso_capable() and:
- When the timebase is used, make it always return true.
- When the RTC clock is used, make it always return false.


Signed-off-by: Christophe Leroy 
---
v6:
- Added missing prototypes in asm/vdso/gettimeofday.h for __c_kernel_ functions.
- Using STACK_FRAME_OVERHEAD instead of INT_FRAME_SIZE
- Rebased on powerpc/merge as of 7 Apr 2020
- Fixed build failure with gcc 9
- Added a patch to create asm/vdso/processor.h and move cpu_relax() into it
---
 arch/powerpc/Kconfig                         |   2 +
 arch/powerpc/include/asm/clocksource.h       |   7 +
 arch/powerpc/include/asm/vdso/clocksource.h  |   7 +
 arch/powerpc/include/asm/vdso/gettimeofday.h | 175 +++
 arch/powerpc/include/asm/vdso/vsyscall.h     |  25 ++
 arch/powerpc/include/asm/vdso_datapage.h     |  40 +--
 arch/powerpc/kernel/asm-offsets.c            |  49 +---
 arch/powerpc/kernel/time.c                   |  91 +-
 arch/powerpc/kernel/vdso.c                   |   5 +-
 arch/powerpc/kernel/vdso32/Makefile          |  32 +-
 arch/powerpc/kernel/vdso32/config-fake32.h   |  34 +++
 arch/powerpc/kernel/vdso32/gettimeofday.S    | 291 +--
 arch/powerpc/kernel/vdso32/vgettimeofday.c   |  29 ++
 arch/powerpc/kernel/vdso64/Makefile          |  23 +-
 arch/powerpc/kernel/vdso64/gettimeofday.S    | 243 +---
 arch/powerpc/kernel/vdso64/vgettimeofday.c   |  29 ++
 16 files changed, 391 insertions(+), 691 deletions(-)
 create mode 100644 arch/powerpc/include/asm/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/powerpc/include/asm/vdso/vsyscall.h
 create mode 100644 arch/powerpc/kernel/vdso32/config-fake32.h
 create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c
 create mode 100644 arch/powerpc/kernel/vdso64/vgettimeofday.c


You should also consider adding -fasynchronous-unwind-tables to the
flags used to build the C VDSO. For background, please see:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ba96301ce9be7925cdaee677b1a2ff8eddba9fd4


- Naveen



[PATCH v6 4/4] powerpc/vdso: Switch VDSO to generic C implementation.

2020-04-07 Thread Christophe Leroy
powerpc is a bit special for the VDSO, as it is for system calls, in
that it requires setting the CR SO bit, which cannot be done in C.
Therefore, entry/exit needs to be performed in ASM.

Implementing __arch_get_vdso_data() would clobber the link register,
requiring the caller to save it. As the ASM calling function already
has to set up a stack frame and save the link register before calling
the C vdso function, retrieving the vdso data pointer there is lighter.

Implement __arch_vdso_capable() and:
- When the timebase is used, make it always return true.
- When the RTC clock is used, make it always return false.

Provide vdso_shift_ns(), as the generic x >> s produces the following
suboptimal code:

  18:   35 25 ff e0 addic.  r9,r5,-32
  1c:   41 80 00 10 blt 2c 
  20:   7c 64 4c 30 srw r4,r3,r9
  24:   38 60 00 00 li  r3,0
...
  2c:   54 69 08 3c rlwinm  r9,r3,1,0,30
  30:   21 45 00 1f subfic  r10,r5,31
  34:   7c 84 2c 30 srw r4,r4,r5
  38:   7d 29 50 30 slw r9,r9,r10
  3c:   7c 63 2c 30 srw r3,r3,r5
  40:   7d 24 23 78 or  r4,r9,r4

In our case, the shift is always <= 32. In addition, the upper 32 bits
of the result are likely zero. Letting GCC know this also allows it to
optimise the following calculations.

With the patch, we get:
   0:   21 25 00 20 subfic  r9,r5,32
   4:   7c 69 48 30 slw r9,r3,r9
   8:   7c 84 2c 30 srw r4,r4,r5
   c:   7d 24 23 78 or  r4,r9,r4
  10:   7c 63 2c 30 srw r3,r3,r5

For VDSO32 on PPC64, we create a fake 32-bit config, following the same
principle as the MIPS architecture, in order to get the correct parts of
the various asm header files.

With the C VDSO, the performance is slightly lower, but it is worth
it as it eases maintenance and evolution, and also brings support for
clocks that the ASM VDSO does not handle.

On an 8xx at 132 MHz, vdsotest with the ASM VDSO:
gettimeofday:vdso: 828 nsec/call
clock-getres-realtime-coarse:vdso: 391 nsec/call
clock-gettime-realtime-coarse:vdso: 614 nsec/call
clock-getres-realtime:vdso: 460 nsec/call
clock-gettime-realtime:vdso: 876 nsec/call
clock-getres-monotonic-coarse:vdso: 399 nsec/call
clock-gettime-monotonic-coarse:vdso: 691 nsec/call
clock-getres-monotonic:vdso: 460 nsec/call
clock-gettime-monotonic:vdso: 1026 nsec/call

On an 8xx at 132 MHz, vdsotest with the C VDSO:
gettimeofday:vdso: 955 nsec/call
clock-getres-realtime-coarse:vdso: 545 nsec/call
clock-gettime-realtime-coarse:vdso: 592 nsec/call
clock-getres-realtime:vdso: 545 nsec/call
clock-gettime-realtime:vdso: 941 nsec/call
clock-getres-monotonic-coarse:vdso: 545 nsec/call
clock-gettime-monotonic-coarse:vdso: 591 nsec/call
clock-getres-monotonic:vdso: 545 nsec/call
clock-gettime-monotonic:vdso: 940 nsec/call

For clock_gettime() with the monotonic clocks, the C VDSO is even faster
than the ASM one.

Unsupported clocks with ASM VDSO:
clock-gettime-boottime:vdso: 3851 nsec/call
clock-gettime-tai:vdso: 3852 nsec/call
clock-gettime-monotonic-raw:vdso: 3396 nsec/call

Same clocks with C VDSO:
clock-gettime-tai:vdso: 941 nsec/call
clock-gettime-monotonic-raw:vdso: 1001 nsec/call
clock-gettime-monotonic-coarse:vdso: 591 nsec/call

On an 8321E at 333 MHz, vdsotest with the ASM VDSO:
gettimeofday:vdso: 220 nsec/call
clock-getres-realtime-coarse:vdso: 102 nsec/call
clock-gettime-realtime-coarse:vdso: 178 nsec/call
clock-getres-realtime:vdso: 129 nsec/call
clock-gettime-realtime:vdso: 235 nsec/call
clock-getres-monotonic-coarse:vdso: 105 nsec/call
clock-gettime-monotonic-coarse:vdso: 208 nsec/call
clock-getres-monotonic:vdso: 129 nsec/call
clock-gettime-monotonic:vdso: 274 nsec/call

On an 8321E at 333 MHz, vdsotest with the C VDSO:
gettimeofday:vdso: 272 nsec/call
clock-getres-realtime-coarse:vdso: 160 nsec/call
clock-gettime-realtime-coarse:vdso: 184 nsec/call
clock-getres-realtime:vdso: 166 nsec/call
clock-gettime-realtime:vdso: 281 nsec/call
clock-getres-monotonic-coarse:vdso: 160 nsec/call
clock-gettime-monotonic-coarse:vdso: 184 nsec/call
clock-getres-monotonic:vdso: 169 nsec/call
clock-gettime-monotonic:vdso: 275 nsec/call

Signed-off-by: Christophe Leroy 
---
v6:
- Added missing prototypes in asm/vdso/gettimeofday.h for __c_kernel_ functions.
- Using STACK_FRAME_OVERHEAD instead of INT_FRAME_SIZE
- Rebased on powerpc/merge as of 7 Apr 2020
- Fixed build failure with gcc 9
- Added a patch to create asm/vdso/processor.h and move cpu_relax() into it
---
 arch/powerpc/Kconfig                         |   2 +
 arch/powerpc/include/asm/clocksource.h       |   7 +
 arch/powerpc/include/asm/vdso/clocksource.h  |   7 +
 arch/powerpc/include/asm/vdso/gettimeofday.h | 175 +++
 arch/powerpc/include/asm/vdso/vsyscall.h     |  25 ++
 arch/powerpc/include/asm/vdso_datapage.h     |  40 +--
 arch/powerpc/kernel/asm-offsets.c            |  49 +---
 arch/powerpc/kernel/time.c