Re: [RFC][PATCH] powerpc/64s: optimise syscall entry with relon hypercalls

2017-02-13 Thread Nicholas Piggin
On Mon, 13 Feb 2017 11:04:06 +
David Laight wrote:

> From: Nicholas Piggin
> > Sent: 10 February 2017 18:23
> > After bc3551257a ("powerpc/64: Allow for relocation-on interrupts from
> > guest to host"), a getppid() system call goes from 307 cycles to 358
> > cycles (+17%). This is due significantly to the scratch SPR used by the
> > hypercall.
> > 
> > It turns out there are some volatile registers common to both system
> > call and hypercall (in particular, r12, cr0, ctr), which can be used to
> > avoid the SPR and some other overheads for the system call case. This
> > brings getppid to 320 cycles (+4%).  
> ...
> > + * syscall register convention is in Documentation/powerpc/syscall64-abi.txt
> > + *
> > + * For hypercalls, the register convention is as follows:
> > + * r0 volatile
> > + * r1-2 nonvolatile
> > + * r3 volatile parameter and return value for status
> > + * r4-r10 volatile input and output value
> > + * r11 volatile hypercall number and output value
> > + * r12 volatile
> > + * r13-r31 nonvolatile
> > + * LR nonvolatile
> > + * CTR volatile
> > + * XER volatile
> > + * CR0-1 CR5-7 volatile
> > + * CR2-4 nonvolatile
> > + * Other registers nonvolatile
> > + *
> > + * The intersection of volatile registers that don't contain possible
> > + * inputs is: r12, cr0, xer, ctr. We may use these as scratch regs
> > + * upon entry without saving.  
> 
> Except that they must surely be set to some known value on exit in order
> to avoid leaking information to the guest.

True. I don't see why that's a problem for the entry code though.

Thanks,
Nick


RE: [RFC][PATCH] powerpc/64s: optimise syscall entry with relon hypercalls

2017-02-13 Thread David Laight
From: Nicholas Piggin
> Sent: 10 February 2017 18:23
> After bc3551257a ("powerpc/64: Allow for relocation-on interrupts from
> guest to host"), a getppid() system call goes from 307 cycles to 358
> cycles (+17%). This is due significantly to the scratch SPR used by the
> hypercall.
> 
> It turns out there are some volatile registers common to both system
> call and hypercall (in particular, r12, cr0, ctr), which can be used to
> avoid the SPR and some other overheads for the system call case. This
> brings getppid to 320 cycles (+4%).
...
> + * syscall register convention is in Documentation/powerpc/syscall64-abi.txt
> + *
> + * For hypercalls, the register convention is as follows:
> + * r0 volatile
> + * r1-2 nonvolatile
> + * r3 volatile parameter and return value for status
> + * r4-r10 volatile input and output value
> + * r11 volatile hypercall number and output value
> + * r12 volatile
> + * r13-r31 nonvolatile
> + * LR nonvolatile
> + * CTR volatile
> + * XER volatile
> + * CR0-1 CR5-7 volatile
> + * CR2-4 nonvolatile
> + * Other registers nonvolatile
> + *
> + * The intersection of volatile registers that don't contain possible
> + * inputs is: r12, cr0, xer, ctr. We may use these as scratch regs
> + * upon entry without saving.

Except that they must surely be set to some known value on exit in order
to avoid leaking information to the guest.

David



[RFC][PATCH] powerpc/64s: optimise syscall entry with relon hypercalls

2017-02-10 Thread Nicholas Piggin
After bc3551257a ("powerpc/64: Allow for relocation-on interrupts from
guest to host"), a getppid() system call goes from 307 cycles to 358
cycles (+17%). This is due significantly to the scratch SPR used by the
hypercall.

It turns out there are some volatile registers common to both system
call and hypercall (in particular, r12, cr0, ctr), which can be used to
avoid the SPR and some other overheads for the system call case. This
brings getppid to 320 cycles (+4%).
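
The choice of scratch registers can be checked mechanically. The sketch
here is only an illustration and is not part of the patch: it encodes
the hypercall convention from the patch comment as bitmasks, and it
ASSUMES the syscall convention in Documentation/powerpc/syscall64-abi.txt
is "r0, r3-r12, cr0, xer, ctr volatile, with r0 and r3-r8 as inputs".
All macro and function names are hypothetical.

```c
#include <stdint.h>

/* Illustrative bit layout: bits 0-31 are r0-r31, then CTR, XER, CR0-CR7. */
#define REG_GPR(n)      (UINT64_C(1) << (n))
#define REG_GPRS(a, b)  (((UINT64_C(1) << ((b) - (a) + 1)) - 1) << (a))
#define REG_CTR         (UINT64_C(1) << 32)
#define REG_XER         (UINT64_C(1) << 33)
#define REG_CR(n)       (UINT64_C(1) << (34 + (n)))

/* Hypercall convention, as listed in the patch comment. */
#define HCALL_VOLATILE  (REG_GPR(0) | REG_GPRS(3, 12) | REG_CTR | REG_XER | \
                         REG_CR(0) | REG_CR(1) | REG_CR(5) | REG_CR(6) | REG_CR(7))
#define HCALL_INPUTS    REG_GPRS(3, 11)  /* r3-r10 arguments, r11 hcall number */

/* ASSUMPTION: syscall convention as recalled from syscall64-abi.txt. */
#define SYSCALL_VOLATILE (REG_GPR(0) | REG_GPRS(3, 12) | REG_CTR | REG_XER | \
                          REG_CR(0))
#define SYSCALL_INPUTS   (REG_GPR(0) | REG_GPRS(3, 8))  /* number + 6 args */

/* Registers that are volatile and carry no input in either convention. */
uint64_t entry_scratch_regs(void)
{
    uint64_t hcall_free = HCALL_VOLATILE & ~HCALL_INPUTS;
    uint64_t syscall_free = SYSCALL_VOLATILE & ~SYSCALL_INPUTS;

    return hcall_free & syscall_free;
}
```

Under those assumptions the result is
REG_GPR(12) | REG_CTR | REG_XER | REG_CR(0), i.e. r12, cr0, xer, ctr.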

Still a bit higher than I hoped; I'll do some more experiments.

Patch is not completely polished or tested yet, but I'd like to get
comments on the approach, or any ideas where these 13 cycles still are
(POWER8 LE NV non-relocatable).
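
For reference, the percentages and the 13-cycle gap follow directly from
the three getppid() measurements quoted above. A small sketch of the
arithmetic (the function name is made up for illustration):

```c
/* Relative overhead of `cycles` over `baseline`, in percent. */
double overhead_pct(double baseline, double cycles)
{
    return (cycles - baseline) / baseline * 100.0;
}
```

overhead_pct(307, 358) is about 16.6 (reported as +17%),
overhead_pct(307, 320) is about 4.2 (reported as +4%), and
320 - 307 gives the 13 cycles still unaccounted for.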

Thanks,
Nick

---
 arch/powerpc/kernel/exceptions-64s.S | 122 +--
 1 file changed, 87 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 9425e0ebcf7e..97f5eee412d8 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -832,46 +832,77 @@ EXC_VIRT(trap_0b, 0x4b00, 0x4c00, 0xb00)
 TRAMP_KVM(PACA_EXGEN, 0xb00)
 EXC_COMMON(trap_0b_common, 0xb00, unknown_exception)
 
+/*
+ * system call / hypercall (0xc00, 0x4c00)
+ *
+ * The system call exception is invoked with "sc 0" and does not alter HV bit.
+ * There is support for kernel code to invoke system calls but there are no
+ * in-tree users.
+ *
+ * The hypercall is invoked with "sc 1" and sets HV=1.
+ *
+ * In HPT, sc 1 always goes to 0xc00 real mode. In RADIX, sc 1 can go to
+ * 0x4c00 virtual mode.
+ *
+ * Call convention:
+ *
+ * syscall register convention is in Documentation/powerpc/syscall64-abi.txt
+ *
+ * For hypercalls, the register convention is as follows:
+ * r0 volatile
+ * r1-2 nonvolatile
+ * r3 volatile parameter and return value for status
+ * r4-r10 volatile input and output value
+ * r11 volatile hypercall number and output value
+ * r12 volatile
+ * r13-r31 nonvolatile
+ * LR nonvolatile
+ * CTR volatile
+ * XER volatile
+ * CR0-1 CR5-7 volatile
+ * CR2-4 nonvolatile
+ * Other registers nonvolatile
+ *
+ * The intersection of volatile registers that don't contain possible
+ * inputs is: r12, cr0, xer, ctr. We may use these as scratch regs
+ * upon entry without saving.
+ */
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
-/*
- * If CONFIG_KVM_BOOK3S_64_HANDLER is set, save the PPR (on systems
- * that support it) before changing to HMT_MEDIUM. That allows the KVM
- * code to save that value into the guest state (it is the guest's PPR
- * value). Otherwise just change to HMT_MEDIUM as userspace has
- * already saved the PPR.
- */
+   /*
+    * There is a little bit of juggling to get syscall and hcall
+    * working well. Save r10 in ctr to be restored in case it is a
+    * hcall.
+    */
 #define SYSCALL_KVMTEST \
-   SET_SCRATCH0(r13);  \
+   mr  r12,r13;\
    GET_PACA(r13);  \
-   std r9,PACA_EXGEN+EX_R9(r13);   \
-   OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR); \
+   mtctr   r10;\
+   KVMTEST_PR(0xc00); /* uses r10, branch to do_kvm_0xc00_system_call */ \
    HMT_MEDIUM; \
-   std r10,PACA_EXGEN+EX_R10(r13); \
-   OPT_SAVE_REG_TO_PACA(PACA_EXGEN+EX_PPR, r9, CPU_FTR_HAS_PPR);   \
-   mfcr    r9; \
-   KVMTEST_PR(0xc00);  \
-   GET_SCRATCH0(r13)
+   mr  r9,r12; \
 
 #else
 #define SYSCALL_KVMTEST \
-   HMT_MEDIUM
+   HMT_MEDIUM; \
+   mr  r9,r13; \
+   GET_PACA(r13);
 #endif

 #define LOAD_SYSCALL_HANDLER(reg)  \
    __LOAD_HANDLER(reg, system_call_common)
 
-/* Syscall routine is used twice, in reloc-off and reloc-on paths */
-#define SYSCALL_PSERIES_1  \
+#define SYSCALL_FASTENDIAN_TEST\
 BEGIN_FTR_SECTION  \
    cmpdi   r0,0x1ebe ; \
    beq-    1f ;\
 END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE) \
-   mr  r9,r13 ;\
-   GET_PACA(r13) ;