On Mon, Jun 01, 2020 at 10:26:09PM +0200, Michael Karcher wrote:
> Rich Felker wrote:
> >> >> Can I propose a different solution? For archs where there isn't
> >> >> actually any 64-bit load or store instruction, does it make sense to
> >> >> be writing asm just to do two 32-bit loads/stores, especially when
> >> >> this code is not in a hot path?
> >> > Yes, that's an option, too.
> >> That's the solution that Michael Karcher suggested to me as an
> >> alternative when I talked to him off-list.
>
> There is a functional argument against using get_user_32 twice, which I
> overlooked in my private reply to Adrian. If any of the loads fail, we do
> not only want err to be set to -EFAULT (which will happen), but we also
> want a 64-bit zero as result. If one 32-bit read faults, but the other one
> works, we would get -EFAULT together with 32 valid data bits, and 32 zero
> bits.

Indeed, if you do it that way you want to check the return value and
set the value to 0 if either faults.

BTW I'm not sure what's supposed to happen on write if half faults
after the other half already succeeded... Either a C approach or an
asm approach has to consider that.

> > I don't have an objection to doing it the way you've proposed, but I
> > don't think there's any performance distinction or issue with the two
> > invocations.
>
> Assuming we don't need two exception table entries (put_user_64 currently
> uses only one, maybe it's wrong), using put_user_32 twice creates an extra
> unneeded exception table entry, which will "bloat" the exception table.
> That table is most likely accessed by a binary search algorithm, so the
> performance loss is marginal, though. Also a bigger table size is
> cache-unfriendly. (Again, this is likely marginal, as binary search
> is already extremely cache-unfriendly).
>
> A similar argument can be made for the exception handler. Even if we need
> two entries in the exception table, so the first paragraph does not apply,
> the two entries in the exception table can share the same exception
> handler (clear the whole 64-bit destination to zero, set -EFAULT, jump
> past both load instructions), so that part of (admittedly cold) kernel
> code can get some instructions shorter.

Indeed. I don't think it's a significant difference but if kernel
folks do that's fine. In cases like this my personal preference is to
err on the side of less arch-specific asm.

Rich
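
For illustration, a minimal user-space sketch of the C approach discussed
above: two 32-bit reads, with the whole 64-bit result zeroed if either half
faults. This is not kernel code. The names get_user_32_sim and
get_user_64_sim are hypothetical stand-ins for the real accessors, the two
halves are passed as separate pointers, and a NULL pointer is used only to
simulate a faulting user address.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit user load.
 * Returns 0 on success, -EFAULT if src is NULL (simulated fault). */
static int get_user_32_sim(uint32_t *dst, const uint32_t *src)
{
    if (src == NULL) {
        *dst = 0;
        return -EFAULT;
    }
    *dst = *src;
    return 0;
}

/* Build a 64-bit value from two 32-bit loads. If either load faults,
 * the destination is cleared to a full 64-bit zero, so the caller never
 * sees 32 valid bits paired with 32 zero bits. */
static int get_user_64_sim(uint64_t *dst, const uint32_t *lo,
                           const uint32_t *hi)
{
    uint32_t l = 0, h = 0;
    int err;

    assert(dst != NULL);

    err = get_user_32_sim(&l, lo);
    if (!err)
        err = get_user_32_sim(&h, hi);

    if (err) {
        *dst = 0;       /* discard any half that was read successfully */
        return err;
    }

    *dst = ((uint64_t)h << 32) | l;
    return 0;
}
```

The write direction is not covered here: as noted above, what should happen
when one half of a store has already succeeded before the other faults is
an open question for either approach.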