On Tue, Mar 23, 2021 at 01:32:18PM +0000, Will Deacon wrote:
> On Tue, Mar 23, 2021 at 12:08:56PM +0000, Robin Murphy wrote:
> > On 2021-03-23 07:34, Yang Yingliang wrote:
> > > When copying over 128 bytes, src/dst is advanced after
> > > each ldp/stp instruction, which costs more time.
> > > To improve this, we only advance src/dst after loading
> > > or storing 64 bytes.
> >
> > This breaks the required behaviour for copy_*_user(), since the fault
> > handler expects the base address to be up-to-date at all times. Say you're
> > copying 128 bytes and fault on the 4th store: it should return 80 bytes not
> > copied; the code below would return 128 bytes not copied, even though 48
> > bytes have actually been written to the destination.
> >
> > We've had a couple of tries at updating this code (because the whole
> > template is frankly a bit terrible, and a long way from the well-optimised
> > code it was derived from), but getting the fault-handling behaviour right
> > without making the handler itself ludicrously complex has proven tricky. And
> > then it got bumped down the priority list while the uaccess behaviour in
> > general was in flux - now that the dust has largely settled on that I should
> > probably try to find time to pick this up again...
>
> I think the v5 from Oli was pretty close, but it didn't get any review:
>
> https://lore.kernel.org/r/20200914151800.2270-1-oli.sw...@arm.com
These are still unread in my inbox as I was planning to look at them
again. However, I think we discussed a few options on how to proceed but
no concrete plans:

1. Merge Oli's patches as they are, with some potential complexity
   issues, as fixing the user copy accuracy was non-trivial. I think the
   latest version uses a two-stage approach: when taking a fault, it
   falls back to byte-by-byte copying with the expectation that it
   faults again and we can then report the correct fault address.

2. Only use Cortex Strings for in-kernel memcpy(), while the uaccess
   routines are some simple loops that align the uaccess part only
   (unlike Cortex Strings, which usually aligns the source).

3. Similar to 2 but with Cortex Strings imported automatically with some
   script to make it easier to keep the routines up to date.

If having non-optimal (but good enough) uaccess routines is acceptable,
I'd go for (2) with a plan to move to (3) at the next Cortex Strings
update.

I also need to look again at option (1) to see how complex it is, but
given the time one spends on importing a new Cortex Strings library, I
don't think (1) scales well in the long term. We could, however, go for
(1) now and look at (3) with the next update.

-- 
Catalin