Rich Felker schrieb: >> >> Can I propose a different solution? For archs where there isn't >> >> actually any 64-bit load or store instruction, does it make sense to >> >> be writing asm just to do two 32-bit loads/stores, especially when >> >> this code is not in a hot path? >> > Yes, that's an option, too. >> That's the solution that Michael Karcher suggested to me as an >> alternative when I talked to him off-list.
There is a functional argument agains using get_user_32 twice, which I overlooked in my private reply to Adrian. If any of the loads fail, we do not only want err to be set to -EFAULT (which will happen), but we also want a 64-bit zero as result. If one 32-bit read faults, but the other one works, we would get -EFAULT together with 32 valid data bits, and 32 zero bits. I continue to elaborate on my performance remark, ignoring this functional difference (which is a good reason to not just use two 32-bit accesses, much more so than the performance "issue"). > I don't have an objection to doing it the way you've proposed, but I > don't think there's any performance distinction or issue with the two > invocations. Assuming we don't need two exception table entries (put_user_64 currently uses only one, maybe it's wrong), using put_user_32 twice creates an extra unneeded exception table entry, which will "bloat" the exception table. That table is most likely accessed by a binary search algorithm, so the performance loss is marginal, though. Also a bigger table size is cache-unfriendly. (Again, this is likely marginal again, as binary search is already extremely cache-unfriendly). A similar argument can be made for the exception handler. Even if we need two entries in the exception table, so the first paragraph does not apply, the two entries in the exception table can share the same exception handler (clear the whole 64-bit destination to zero, set -EFAULT, jump past both load instructions), so that part of (admittedly cold) kernel code can get some instructios shorter. Regards, Michael Karcher