> If we're special-casing 64-bit architectures anyways - unrolling the
> 32B copy_from_user() for struct rseq_cs appears to be roughly 5-10%
> savings on x86-64 when I measured it (well, in a microbenchmark, not
> in rseq_get_rseq_cs() directly). Perhaps that could be an additional
> avenue for improvement here.

The killer is usually 'user copy hardening'.
It significantly slows down sendmsg() and recvmsg().
I've got measurable performance improvements by using __copy_from_user()
when the buffer size has already been checked - but isn't a compile-time
constant.

There is also scope for using __get_user() when reading iovec[] (instead
of copy_from_user()) and doing all the bounds checks (etc) in the loop.
That gives a measurable improvement for writev() to /dev/null.

I must sort those patches out again.

	David