On Mon, 30 Nov 2020 at 16:51, Nicolas Pitre <n...@fluxnic.net> wrote: > > On Mon, 30 Nov 2020, Ard Biesheuvel wrote: > > > (+ Nico) > > > > On Mon, 30 Nov 2020 at 11:11, Ard Biesheuvel <a...@kernel.org> wrote: > > > > > > On Mon, 23 Nov 2020 at 08:39, Antony Yu <swpe...@gmail.com> wrote: > > > > > > > > __do_div64 clobbers the input register r0 in little endian system. > > > > According to the inline assembly document, if an input operand is > > > > modified, it should be tied to a output operand. This patch can > > > > prevent compilers from reusing r0 register after asm statements. > > > > > > > > Signed-off-by: Antony Yu <swpe...@gmail.com> > > > > --- > > > > arch/arm/include/asm/div64.h | 5 +++-- > > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h > > > > index 898e9c78a7e7..809efc51e90f 100644 > > > > --- a/arch/arm/include/asm/div64.h > > > > +++ b/arch/arm/include/asm/div64.h > > > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, > > > > uint32_t base) > > > > asm( __asmeq("%0", __xh) > > > > __asmeq("%1", "r2") > > > > __asmeq("%2", "r0") > > > > - __asmeq("%3", "r4") > > > > + __asmeq("%3", "r0") > > > > + __asmeq("%4", "r4") > > > > "bl __do_div64" > > > > - : "=r" (__rem), "=r" (__res) > > > > + : "=r" (__rem), "=r" (__res), "=r" (__n) > > > > : "r" (__n), "r" (__base) > > > > : "ip", "lr", "cc"); > > > > *n = __res; > > > > -- > > > > 2.23.0 > > > > > > > > > > Agree that using r0 as an input operand only is incorrect, and not > > > only on Clang. The compiler might assume that r0 will retain its value > > > across the asm() block, which is obviously not the case. > > You're right. > > This was done like that most likely to work around some stupid code > generation with "__n >> 32" while using gcc from about 20 years ago. IOW > I don't exactly remember why I did it like that, but it is certainly > flawed. > > Here's my version of the fix which should be correct. Warning: this > is completely untested, but should in theory produce the same code on > modern gcc. > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h > index 898e9c78a7..595e538f5b 100644 > --- a/arch/arm/include/asm/div64.h > +++ b/arch/arm/include/asm/div64.h > @@ -21,29 +21,20 @@ > * assembly implementation with completely non standard calling convention > * for arguments and results (beware). > */ > - > -#ifdef __ARMEB__ > -#define __xh "r0" > -#define __xl "r1" > -#else > -#define __xl "r0" > -#define __xh "r1" > -#endif > - > static inline uint32_t __div64_32(uint64_t *n, uint32_t base) > { > register unsigned int __base asm("r4") = base; > register unsigned long long __n asm("r0") = *n; > register unsigned long long __res asm("r2"); > - register unsigned int __rem asm(__xh); > - asm( __asmeq("%0", __xh) > + unsigned int __rem; > + asm( __asmeq("%0", "r0") > __asmeq("%1", "r2") > - __asmeq("%2", "r0") > - __asmeq("%3", "r4") > + __asmeq("%2", "r4") > "bl __do_div64" > - : "=r" (__rem), "=r" (__res) > - : "r" (__n), "r" (__base) > + : "+r" (__n), "=r" (__res) > + : "r" (__base) > : "ip", "lr", "cc"); > + __rem = __n >> 32;
This treats {r0, r1} as a {low, high} pair, regardless of endianness, and so it puts the value of r0 into r1. Doesn't that mean the shift should only be done on little endian? > *n = __res; > return __rem; > } > > I'll submit it if someone confirms it works. > > > Nicolas > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-ker...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel