On Mon, 2023-03-06 at 15:33 -0600, Nathan Lynch via B4 Relay wrote:
> From: Nathan Lynch <nath...@linux.ibm.com>
>
> The kernel can handle retrying RTAS function calls in response to
> -2/990x in the sys_rtas() handler instead of relaying the intermediate
> status to user space.
>
> Justifications:
>
> * Currently it's nondeterministic and quite variable in practice
>   whether a retry status is returned for any given invocation of
>   sys_rtas(). Therefore user space code cannot be expecting a retry
>   result without already being broken.
>
> * This tends to significantly reduce the total number of system calls
>   issued by programs such as drmgr which make use of sys_rtas(),
>   improving the experience of tracing and debugging such
>   programs. This is the main motivation for me: I think this change
>   will make it easier for us to characterize current sys_rtas() use
>   cases as we move them to other interfaces over time.
>
> * It reduces the number of opportunities for user space to leave
>   complex operations, such as those associated with DLPAR, incomplete
>   and difficult to recover.
>
> * We can expect performance improvements for existing sys_rtas()
>   users, not only because of overall reduction in the number of system
>   calls issued, but also due to the better handling of -2/990x in the
>   kernel. For example, librtas still sleeps for 1ms on -2, which is
>   completely unnecessary.

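For anyone following along, here is a rough sketch of how I read the
-2/990x statuses mentioned above. It is not the kernel's or librtas's
code: the macro and function names are made up for illustration, and
it assumes that 990x (x from 0 to 5) suggests a wait of 10^x ms while
-2 carries no delay hint at all.

```c
#include <stdbool.h>

/* Status values referred to in the quoted text as -2 and 990x. */
#define SKETCH_RTAS_BUSY           (-2)
#define SKETCH_RTAS_EXT_DELAY_MIN  9900
#define SKETCH_RTAS_EXT_DELAY_MAX  9905

/* Hypothetical helper: does this status ask the caller to retry? */
bool sketch_status_is_retry(int status)
{
	return status == SKETCH_RTAS_BUSY ||
	       (status >= SKETCH_RTAS_EXT_DELAY_MIN &&
	        status <= SKETCH_RTAS_EXT_DELAY_MAX);
}

/*
 * Hypothetical helper: suggested wait in milliseconds before retrying.
 * Assumption: 990x hints at 10^x ms; -2 and all other statuses map to 0,
 * i.e. no suggested delay.
 */
unsigned int sketch_status_delay_ms(int status)
{
	unsigned int ms = 0;

	if (status >= SKETCH_RTAS_EXT_DELAY_MIN &&
	    status <= SKETCH_RTAS_EXT_DELAY_MAX) {
		ms = 1;
		for (int order = status - SKETCH_RTAS_EXT_DELAY_MIN;
		     order > 0; order--)
			ms *= 10;
	}

	return ms;
}
```

Under that reading, -2 maps to no suggested wait, so the fixed 1ms
sleep in librtas on -2 is avoidable.
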
Would be good to see this fixed on the librtas side.

> Performance differences for PHB add and remove on a small P10 PowerVM
> partition are included below. For add, elapsed time is slightly
> reduced. For remove, there are more significant improvements: the
> number of context switches is reduced by an order of magnitude, and
> elapsed time is reduced by over half.
>
> (- before, + after):
>
>   Performance counter stats for 'drmgr -c phb -a -s PHB 23' (5 runs):
>
> -          1,847.58 msec task-clock   #    0.135 CPUs utilized   ( +- 14.15% )
> -            10,867      cs           #    9.800 K/sec           ( +- 14.14% )
> +          1,901.15 msec task-clock   #    0.148 CPUs utilized   ( +- 14.13% )
> +            10,451      cs           #    9.158 K/sec           ( +- 14.14% )
>
> -         13.656557 +- 0.000124 seconds time elapsed  ( +-  0.00% )
> +          12.88080 +- 0.00404 seconds time elapsed  ( +-  0.03% )
>
>   Performance counter stats for 'drmgr -c phb -r -s PHB 23' (5 runs):
>
> -          1,473.75 msec task-clock   #    0.092 CPUs utilized   ( +- 14.15% )
> -             2,652      cs           #    3.000 K/sec           ( +- 14.16% )
> +          1,444.55 msec task-clock   #    0.221 CPUs utilized   ( +- 14.14% )
> +               104      cs           #  119.957 /sec            ( +- 14.63% )
>
> -          15.99718 +- 0.00801 seconds time elapsed  ( +-  0.05% )
> +           6.54256 +- 0.00830 seconds time elapsed  ( +-  0.13% )
>
> Move the existing rtas_lock-guarded critical section in sys_rtas()
> into a conventional rtas_busy_delay()-based loop, returning to user
> space only when a final success or failure result is available.
>
> Signed-off-by: Nathan Lynch <nath...@linux.ibm.com>

Should there be some kind of timeout? I'm a bit worried by sleeping in
a syscall for an extended period.

-- 
Andrew Donnellan    OzLabs, ADL Canberra
a...@linux.ibm.com   IBM Australia Limited