On Mon, 30 Nov 2020 at 18:52, Nicolas Pitre <n...@fluxnic.net> wrote:
>
> On Mon, 30 Nov 2020, Ard Biesheuvel wrote:
>
> > On Mon, 30 Nov 2020 at 16:51, Nicolas Pitre <n...@fluxnic.net> wrote:
> >
> > > Here's my version of the fix which should be correct. Warning: this
> > > is completely untested, but should in theory produce the same code on
> > > modern gcc.
> > >
> > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > index 898e9c78a7..595e538f5b 100644
> > > --- a/arch/arm/include/asm/div64.h
> > > +++ b/arch/arm/include/asm/div64.h
> > > @@ -21,29 +21,20 @@
> > >   * assembly implementation with completely non standard calling convention
> > >   * for arguments and results (beware).
> > >   */
> > > -
> > > -#ifdef __ARMEB__
> > > -#define __xh "r0"
> > > -#define __xl "r1"
> > > -#else
> > > -#define __xl "r0"
> > > -#define __xh "r1"
> > > -#endif
> > > -
> > >  static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> > >  {
> > >  	register unsigned int __base asm("r4") = base;
> > >  	register unsigned long long __n asm("r0") = *n;
> > >  	register unsigned long long __res asm("r2");
> > > -	register unsigned int __rem asm(__xh);
> > > -	asm(	__asmeq("%0", __xh)
> > > +	unsigned int __rem;
> > > +	asm(	__asmeq("%0", "r0")
> > >  		__asmeq("%1", "r2")
> > > -		__asmeq("%2", "r0")
> > > -		__asmeq("%3", "r4")
> > > +		__asmeq("%2", "r4")
> > >  		"bl	__do_div64"
> > > -		: "=r" (__rem), "=r" (__res)
> > > -		: "r" (__n), "r" (__base)
> > > +		: "+r" (__n), "=r" (__res)
> > > +		: "r" (__base)
> > >  		: "ip", "lr", "cc");
> > > +	__rem = __n >> 32;
> >
> > This treats {r0, r1} as a {low, high} pair, regardless of endianness,
> > and so it puts the value of r0 into r1. Doesn't that mean the shift
> > should only be done on little endian?
>
> Not quite. r0-r1 = low-high is for little endian. Then "__n >> 32" is
> actually translated into "mov r0, r1" to move it into __rem, which is
> returned through r0.
>
> On big endian it is r0-r1 = high-low. Here "__n >> 32" picks r0 and
> moves it to __rem, which is returned through r0, so no extra instruction
> is needed.
>
> Of course the function is inlined, so r0 can be anything, or optimized
> away if __rem is not used.
>
OK, you're right. I got myself confused there, but a quick test with GCC
confirms your explanation:

  $ arm-linux-gnueabihf-gcc -mbig-endian -O2 -S -o - \
      -xc - <<<"long f(long long l) { return l >> 32; }"

just produces

	bx	lr

whereas removing the -mbig-endian gives

	mov	r0, r1
	bx	lr

I tested the change and it builds and runs fine (although I am not sure
how much coverage this code gets on an ordinary boot):

Reviewed-by: Ard Biesheuvel <a...@kernel.org>
Tested-by: Ard Biesheuvel <a...@kernel.org>