Re: [PATCH 5/5] ARM: asm/div64.h: adjust to generic code
Nicolas Pitre writes:

> On Thu, 19 Nov 2015, Måns Rullgård wrote:
>
>> Nicolas Pitre writes:
>>
>> > +static inline uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
>> > +{
>> > +        unsigned long long res;
>> > +        unsigned int tmp = 0;
>> > +
>> > +        if (!bias) {
>> > +                asm (   "umull %Q0, %R0, %Q1, %Q2\n\t"
>> > +                        "mov %Q0, #0"
>> > +                        : "=&r" (res)
>> > +                        : "r" (m), "r" (n)
>> > +                        : "cc");
>> > +        } else if (!(m & ((1ULL << 63) | (1ULL << 31)))) {
>> > +                res = m;
>> > +                asm (   "umlal %Q0, %R0, %Q1, %Q2\n\t"
>> > +                        "mov %Q0, #0"
>> > +                        : "+&r" (res)
>> > +                        : "r" (m), "r" (n)
>> > +                        : "cc");
>> > +        } else {
>> > +                asm (   "umull %Q0, %R0, %Q2, %Q3\n\t"
>> > +                        "cmn %Q0, %Q2\n\t"
>> > +                        "adcs %R0, %R0, %R2\n\t"
>> > +                        "adc %Q0, %1, #0"
>> > +                        : "=&r" (res), "+&r" (tmp)
>> > +                        : "r" (m), "r" (n)
>>
>> Why is tmp using a +r constraint here?  The register is not written, so
>> using an input-only operand could/should result in better code.  That is
>> also what the old code did.
>
> No, it is worse.  gcc allocates two registers because, somehow, it
> doesn't think that the first one still holds zero after the first usage.
> This way usage of only one temporary register is forced throughout,
> producing better code.

Makes sense.  Thanks for explaining.

-- 
Måns Rullgård
m...@mansr.com
Re: [PATCH 5/5] ARM: asm/div64.h: adjust to generic code
On Thu, 19 Nov 2015, Måns Rullgård wrote:

> Nicolas Pitre writes:
>
> > +static inline uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
> > +{
> > +        unsigned long long res;
> > +        unsigned int tmp = 0;
> > +
> > +        if (!bias) {
> > +                asm (   "umull %Q0, %R0, %Q1, %Q2\n\t"
> > +                        "mov %Q0, #0"
> > +                        : "=&r" (res)
> > +                        : "r" (m), "r" (n)
> > +                        : "cc");
> > +        } else if (!(m & ((1ULL << 63) | (1ULL << 31)))) {
> > +                res = m;
> > +                asm (   "umlal %Q0, %R0, %Q1, %Q2\n\t"
> > +                        "mov %Q0, #0"
> > +                        : "+&r" (res)
> > +                        : "r" (m), "r" (n)
> > +                        : "cc");
> > +        } else {
> > +                asm (   "umull %Q0, %R0, %Q2, %Q3\n\t"
> > +                        "cmn %Q0, %Q2\n\t"
> > +                        "adcs %R0, %R0, %R2\n\t"
> > +                        "adc %Q0, %1, #0"
> > +                        : "=&r" (res), "+&r" (tmp)
> > +                        : "r" (m), "r" (n)
>
> Why is tmp using a +r constraint here?  The register is not written, so
> using an input-only operand could/should result in better code.  That is
> also what the old code did.

No, it is worse.  gcc allocates two registers because, somehow, it
doesn't think that the first one still holds zero after the first usage.
This way usage of only one temporary register is forced throughout,
producing better code.

I meant to have this split out in a separate patch but messed it up
somehow.

> > +                        : "cc");
> > +        }
> > +
> > +        if (!(m & ((1ULL << 63) | (1ULL << 31)))) {
> > +                asm (   "umlal %R0, %Q0, %R1, %Q2\n\t"
> > +                        "umlal %R0, %Q0, %Q1, %R2\n\t"
> > +                        "mov %R0, #0\n\t"
> > +                        "umlal %Q0, %R0, %R1, %R2"
> > +                        : "+&r" (res)
> > +                        : "r" (m), "r" (n)
> > +                        : "cc");
> > +        } else {
> > +                asm (   "umlal %R0, %Q0, %R2, %Q3\n\t"
> > +                        "umlal %R0, %1, %Q2, %R3\n\t"
> > +                        "mov %R0, #0\n\t"
> > +                        "adds %Q0, %1, %Q0\n\t"
> > +                        "adc %R0, %R0, #0\n\t"
> > +                        "umlal %Q0, %R0, %R2, %R3"
> > +                        : "+&r" (res), "+&r" (tmp)
> > +                        : "r" (m), "r" (n)
> > +                        : "cc");
> > +        }
> > +
> > +        return res;
> > +}
>
> -- 
> Måns Rullgård
> m...@mansr.com
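For readers following the asm: as I understand the generic helper this function overrides, __arch_xprod_64() is meant to return the top 64 bits of the 128-bit value m * n, with m added in first when bias is set. Below is a minimal portable sketch of that semantic only. The name xprod_64_sketch and the limb-by-limb carry handling are assumptions made for illustration; they are not taken from the patch, and the sketch makes no attempt to match the register usage discussed above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical reference helper (not part of the patch): returns
 * ((bias ? m : 0) + m * n) >> 64 using only 32x32->64 multiplies,
 * the same building block umull/umlal provide on ARM.
 */
static uint64_t xprod_64_sketch(uint64_t m, uint64_t n, bool bias)
{
    const uint64_t mask = 0xffffffffu;
    uint32_t m_lo = (uint32_t)m, m_hi = (uint32_t)(m >> 32);
    uint32_t n_lo = (uint32_t)n, n_hi = (uint32_t)(n >> 32);

    /* Four partial products; each one fits in 64 bits. */
    uint64_t lo_lo = (uint64_t)m_lo * n_lo;
    uint64_t lo_hi = (uint64_t)m_lo * n_hi;
    uint64_t hi_lo = (uint64_t)m_hi * n_lo;
    uint64_t hi_hi = (uint64_t)m_hi * n_hi;

    /* Bits 0..31 of the sum; only the carry out of them matters. */
    uint64_t limb0 = (lo_lo & mask) + (bias ? m_lo : 0);

    /* Bits 32..63: a few 32-bit terms plus the carry from limb0. */
    uint64_t limb1 = (limb0 >> 32) + (lo_lo >> 32) +
                     (lo_hi & mask) + (hi_lo & mask) +
                     (bias ? m_hi : 0);

    /* Bits 64..127: the value the caller wants. */
    return hi_hi + (lo_hi >> 32) + (hi_lo >> 32) + (limb1 >> 32);
}
```

The test on bits 63 and 31 of m in the quoted code presumably selects a cheaper path when the intermediate sums cannot overflow; the sketch sidesteps that question by keeping every intermediate sum in a 64-bit accumulator.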
Re: [PATCH 5/5] ARM: asm/div64.h: adjust to generic code
Nicolas Pitre writes:

> +static inline uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
> +{
> +        unsigned long long res;
> +        unsigned int tmp = 0;
> +
> +        if (!bias) {
> +                asm (   "umull %Q0, %R0, %Q1, %Q2\n\t"
> +                        "mov %Q0, #0"
> +                        : "=&r" (res)
> +                        : "r" (m), "r" (n)
> +                        : "cc");
> +        } else if (!(m & ((1ULL << 63) | (1ULL << 31)))) {
> +                res = m;
> +                asm (   "umlal %Q0, %R0, %Q1, %Q2\n\t"
> +                        "mov %Q0, #0"
> +                        : "+&r" (res)
> +                        : "r" (m), "r" (n)
> +                        : "cc");
> +        } else {
> +                asm (   "umull %Q0, %R0, %Q2, %Q3\n\t"
> +                        "cmn %Q0, %Q2\n\t"
> +                        "adcs %R0, %R0, %R2\n\t"
> +                        "adc %Q0, %1, #0"
> +                        : "=&r" (res), "+&r" (tmp)
> +                        : "r" (m), "r" (n)

Why is tmp using a +r constraint here?  The register is not written, so
using an input-only operand could/should result in better code.  That is
also what the old code did.

> +                        : "cc");
> +        }
> +
> +        if (!(m & ((1ULL << 63) | (1ULL << 31)))) {
> +                asm (   "umlal %R0, %Q0, %R1, %Q2\n\t"
> +                        "umlal %R0, %Q0, %Q1, %R2\n\t"
> +                        "mov %R0, #0\n\t"
> +                        "umlal %Q0, %R0, %R1, %R2"
> +                        : "+&r" (res)
> +                        : "r" (m), "r" (n)
> +                        : "cc");
> +        } else {
> +                asm (   "umlal %R0, %Q0, %R2, %Q3\n\t"
> +                        "umlal %R0, %1, %Q2, %R3\n\t"
> +                        "mov %R0, #0\n\t"
> +                        "adds %Q0, %1, %Q0\n\t"
> +                        "adc %R0, %R0, #0\n\t"
> +                        "umlal %Q0, %R0, %R2, %R3"
> +                        : "+&r" (res), "+&r" (tmp)
> +                        : "r" (m), "r" (n)
> +                        : "cc");
> +        }
> +
> +        return res;
> +}

-- 
Måns Rullgård
m...@mansr.com
Re: [PATCH 5/5] ARM: asm/div64.h: adjust to generic code
Hi Nicolas,

[auto build test WARNING on asm-generic/master]
[also WARNING on: v4.3 next-20151103]

url:    https://github.com/0day-ci/linux/commits/Nicolas-Pitre/div64-h-optimize-do_div-for-power-of-two-constant-divisors/20151103-065348
base:   https://github.com/0day-ci/linux Nicolas-Pitre/div64-h-optimize-do_div-for-power-of-two-constant-divisors/20151103-065348
config: arm-allyesconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm

All warnings (new ones prefixed by >>):

   In file included from arch/arm/include/asm/div64.h:126:0,
                    from include/linux/kernel.h:136,
                    from drivers/cpufreq/s5pv210-cpufreq.c:13:
   drivers/cpufreq/s5pv210-cpufreq.c: In function 's5pv210_set_refresh':
   include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
     (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
                               ^
>> drivers/cpufreq/s5pv210-cpufreq.c:215:2: note: in expansion of macro 'do_div'
     do_div(tmp, freq);
     ^
>> drivers/cpufreq/s5pv210-cpufreq.c:215:2: warning: right shift count >= width of type
   In file included from include/linux/kernel.h:136:0,
                    from drivers/cpufreq/s5pv210-cpufreq.c:13:
   arch/arm/include/asm/div64.h:49:20: warning: passing argument 1 of '__div64_32' from incompatible pointer type
    #define __div64_32 __div64_32
                       ^
   include/asm-generic/div64.h:235:11: note: in expansion of macro '__div64_32'
      __rem = __div64_32(&(n), __base); \
              ^
>> drivers/cpufreq/s5pv210-cpufreq.c:215:2: note: in expansion of macro 'do_div'
     do_div(tmp, freq);
     ^
   arch/arm/include/asm/div64.h:32:24: note: expected 'uint64_t *' but argument is of type 'long unsigned int *'
    static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
                           ^
   In file included from arch/arm/include/asm/div64.h:126:0,
                    from include/linux/kernel.h:136,
                    from drivers/cpufreq/s5pv210-cpufreq.c:13:
   include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
     (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
                               ^
   drivers/cpufreq/s5pv210-cpufreq.c:219:2: note: in expansion of macro 'do_div'
     do_div(tmp1, tmp);
     ^
   drivers/cpufreq/s5pv210-cpufreq.c:219:2: warning: right shift count >= width of type
   In file included from include/linux/kernel.h:136:0,
                    from drivers/cpufreq/s5pv210-cpufreq.c:13:
   arch/arm/include/asm/div64.h:49:20: warning: passing argument 1 of '__div64_32' from incompatible pointer type
    #define __div64_32 __div64_32
                       ^
   include/asm-generic/div64.h:235:11: note: in expansion of macro '__div64_32'
      __rem = __div64_32(&(n), __base); \
              ^
   drivers/cpufreq/s5pv210-cpufreq.c:219:2: note: in expansion of macro 'do_div'
     do_div(tmp1, tmp);
     ^
   arch/arm/include/asm/div64.h:32:24: note: expected 'uint64_t *' but argument is of type 'long unsigned int *'
    static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
                           ^

vim +/do_div +215 drivers/cpufreq/s5pv210-cpufreq.c

83efc743 arch/arm/mach-s5pv210/cpufreq.c   Jaecheol Lee  2010-10-12  199  {
83efc743 arch/arm/mach-s5pv210/cpufreq.c   Jaecheol Lee  2010-10-12  200          unsigned long tmp, tmp1;
83efc743 arch/arm/mach-s5pv210/cpufreq.c   Jaecheol Lee  2010-10-12  201          void __iomem *reg = NULL;
83efc743 arch/arm/mach-s5pv210/cpufreq.c   Jaecheol Lee  2010-10-12  202
d62fa311 arch/arm/mach-s5pv210/cpufreq.c   Jonghwan Choi 2011-05-12  203          if (ch == DMC0) {
6d4ed0f4 drivers/cpufreq/s5pv210-cpufreq.c Tomasz Figa   2014-07-03  204                  reg = (dmc_base[0] + 0x30);
d62fa311 arch/arm/mach-s5pv210/cpufreq.c   Jonghwan Choi 2011-05-12  205          } else if (ch == DMC1) {
6d4ed0f4 drivers/cpufreq/s5pv210-cpufreq.c Tomasz Figa   2014-07-03  206                  reg = (dmc_base[1] + 0x30);
d62fa311 arch/arm/mach-s5pv210/cpufreq.c   Jonghwan Choi 2011-05-12  207          } else {
83efc743 arch/arm/mach-s5pv210/cpufreq.c   Jaecheol Lee  2010-10-12  208                  printk(KERN_ERR "Cannot find DMC port\n");
d62fa311 arch/arm/mach-s5pv210/cpufreq.c   Jonghwan Choi 2011-05-12  209                  return;
d62fa311 arch/arm/mach-s5pv210/cpufreq.c   Jonghwan Choi 2011-05-12  210          }
83efc743 arch/arm/mach-s5pv210/cpufreq.c   Jaecheol Lee  2010-10-12  211
83efc743 arch/arm/mach-s5pv210/cpufreq.c   Jaecheol Lee  2010-10-12  212          /* Find current DRAM frequency */
83efc743 arch/arm/mach-s5pv210/cpufreq.c   Jaecheol Lee  2010-10-12  213          tmp = s5pv210_dram_conf[ch].freq;
83efc743 arch/arm/ma
Re: [PATCH 5/5] ARM: asm/div64.h: adjust to generic code
[added Mike/linux-clk and David/dri-devel]

A patch I produced is now highlighting existing bugs in the drivers
listed below.

On Tue, 3 Nov 2015, kbuild test robot wrote:

> Hi Nicolas,
>
> [auto build test WARNING on asm-generic/master -- if it's inappropriate base,
> please suggest rules for selecting the more suitable base]
>
> url:    https://github.com/0day-ci/linux/commits/Nicolas-Pitre/div64-h-optimize-do_div-for-power-of-two-constant-divisors/20151103-065348
> config: arm-multi_v7_defconfig (attached as .config)
> reproduce:
>         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=arm
>
> All warnings (new ones prefixed by >>):
>
>    In file included from arch/arm/include/asm/div64.h:126:0,
>                     from include/linux/kernel.h:136,
>                     from include/asm-generic/bug.h:13,
>                     from arch/arm/include/asm/bug.h:62,
>                     from include/linux/bug.h:4,
>                     from include/linux/io.h:23,
>                     from include/linux/clk-provider.h:14,
>                     from drivers/clk/imx/clk-pllv1.c:1:
>    drivers/clk/imx/clk-pllv1.c: In function 'clk_pllv1_recalc_rate':
>    include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
>      (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
>                                ^
> >> drivers/clk/imx/clk-pllv1.c:99:2: note: in expansion of macro 'do_div'
>      do_div(ll, mfd + 1);
>      ^

Here the problem is in clk-pllv1.c, where the ll variable is declared as
a long long.  It should be an unsigned long long, or better yet a
uint64_t or u64.
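To illustrate why the dividend's type matters, here is a minimal userspace sketch of the pointer-comparison check that the warning above points at. The names do_div_sketch and scale_rate are hypothetical stand-ins, not kernel code, and the macro assumes GNU C extensions (statement expressions and __typeof__). Only the type check and the divide/remainder semantics are modeled; none of the ARM-specific machinery is.

```c
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for do_div(), keeping only the
 * pointer-comparison type check quoted in the warning. Relies on
 * GNU C extensions (statement expressions, __typeof__).
 */
#define do_div_sketch(n, base) ({ \
    uint32_t __base = (base); \
    uint32_t __rem; \
    /* Compile-time warning unless n has exactly type uint64_t. */ \
    (void)(((__typeof__((n)) *)0) == ((uint64_t *)0)); \
    __rem = (uint32_t)((n) % __base); \
    (n) /= __base; \
    __rem; \
})

/* Divide a rate with an unsigned 64-bit dividend, as the check expects. */
static uint64_t scale_rate(uint64_t rate, uint32_t div, uint32_t *rem)
{
    uint64_t ll = rate; /* declaring this "long long" would trip the check */

    *rem = do_div_sketch(ll, div);
    return ll;
}
```

Changing the declaration of ll to long long in this sketch should reproduce the "comparison of distinct pointer types lacks a cast" warning with gcc, which is the class of driver bug being reported here.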
> --
>    In file included from arch/arm/include/asm/div64.h:126:0,
>                     from include/linux/kernel.h:136,
>                     from drivers/clk/imx/clk-pllv2.c:1:
>    drivers/clk/imx/clk-pllv2.c: In function '__clk_pllv2_recalc_rate':
>    include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
>      (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
>                                ^
> >> drivers/clk/imx/clk-pllv2.c:103:2: note: in expansion of macro 'do_div'
>      do_div(temp, mfd + 1);
>      ^

Same thing: temp is declared as an s64.  It should be u64.

>    drivers/clk/imx/clk-pllv2.c: In function '__clk_pllv2_set_rate':
>    include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
>      (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
>                                ^
>    drivers/clk/imx/clk-pllv2.c:145:2: note: in expansion of macro 'do_div'
>      do_div(temp64, quad_parent_rate / 100);
>      ^

Ditto here.

> --
>    In file included from arch/arm/include/asm/div64.h:126:0,
>                     from include/linux/kernel.h:136,
>                     from drivers/clk/tegra/clk-divider.c:17:
>    drivers/clk/tegra/clk-divider.c: In function 'get_div':
>    include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
>      (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
>                                ^
> >> drivers/clk/tegra/clk-divider.c:50:2: note: in expansion of macro 'do_div'
>      do_div(divider_ux1, rate);
>      ^

Ditto here.

> --
>    In file included from arch/arm/include/asm/div64.h:126:0,
>                     from include/linux/kernel.h:136,
>                     from drivers/clk/ti/clkt_dpll.c:17:
>    drivers/clk/ti/clkt_dpll.c: In function 'omap2_get_dpll_rate':
>    include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
>      (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
>                                ^
> >> drivers/clk/ti/clkt_dpll.c:266:2: note: in expansion of macro 'do_div'
>      do_div(dpll_clk, dpll_div + 1);
>      ^

Ditto here.
> --
>    In file included from arch/arm/include/asm/div64.h:126:0,
>                     from include/linux/kernel.h:136,
>                     from include/linux/clk.h:16,
>                     from drivers/clk/ti/fapll.c:12:
>    drivers/clk/ti/fapll.c: In function 'ti_fapll_recalc_rate':
>    include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
>      (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
>                                ^
> >> drivers/clk/ti/fapll.c:182:3: note: in expansion of macro 'do_div'
>       do_div(rate, fapll_p);
>       ^

Ditto here.

>    drivers/clk/ti/fapll.c: In function 'ti_fapll_synth_recalc_rate':
>    include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
>      (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
>                                ^
>    drivers/clk/ti/fapll.c:346:3: note: in expansion of macro 'do_div'
>       do_div(ra
Re: [PATCH 5/5] ARM: asm/div64.h: adjust to generic code
Hi Nicolas,

[auto build test WARNING on asm-generic/master -- if it's inappropriate base,
please suggest rules for selecting the more suitable base]

url:    https://github.com/0day-ci/linux/commits/Nicolas-Pitre/div64-h-optimize-do_div-for-power-of-two-constant-divisors/20151103-065348
config: arm-multi_v7_defconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm

All warnings (new ones prefixed by >>):

   In file included from arch/arm/include/asm/div64.h:126:0,
                    from include/linux/kernel.h:136,
                    from include/asm-generic/bug.h:13,
                    from arch/arm/include/asm/bug.h:62,
                    from include/linux/bug.h:4,
                    from include/linux/io.h:23,
                    from include/linux/clk-provider.h:14,
                    from drivers/clk/imx/clk-pllv1.c:1:
   drivers/clk/imx/clk-pllv1.c: In function 'clk_pllv1_recalc_rate':
   include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
     (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
                               ^
>> drivers/clk/imx/clk-pllv1.c:99:2: note: in expansion of macro 'do_div'
     do_div(ll, mfd + 1);
     ^
--
   In file included from arch/arm/include/asm/div64.h:126:0,
                    from include/linux/kernel.h:136,
                    from drivers/clk/imx/clk-pllv2.c:1:
   drivers/clk/imx/clk-pllv2.c: In function '__clk_pllv2_recalc_rate':
   include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
     (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
                               ^
>> drivers/clk/imx/clk-pllv2.c:103:2: note: in expansion of macro 'do_div'
     do_div(temp, mfd + 1);
     ^
   drivers/clk/imx/clk-pllv2.c: In function '__clk_pllv2_set_rate':
   include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
     (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
                               ^
   drivers/clk/imx/clk-pllv2.c:145:2: note: in expansion of macro 'do_div'
     do_div(temp64, quad_parent_rate / 100);
     ^
--
   In file included from arch/arm/include/asm/div64.h:126:0,
                    from include/linux/kernel.h:136,
                    from drivers/clk/tegra/clk-divider.c:17:
   drivers/clk/tegra/clk-divider.c: In function 'get_div':
   include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
     (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
                               ^
>> drivers/clk/tegra/clk-divider.c:50:2: note: in expansion of macro 'do_div'
     do_div(divider_ux1, rate);
     ^
--
   In file included from arch/arm/include/asm/div64.h:126:0,
                    from include/linux/kernel.h:136,
                    from drivers/clk/ti/clkt_dpll.c:17:
   drivers/clk/ti/clkt_dpll.c: In function 'omap2_get_dpll_rate':
   include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
     (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
                               ^
>> drivers/clk/ti/clkt_dpll.c:266:2: note: in expansion of macro 'do_div'
     do_div(dpll_clk, dpll_div + 1);
     ^
--
   In file included from arch/arm/include/asm/div64.h:126:0,
                    from include/linux/kernel.h:136,
                    from include/linux/clk.h:16,
                    from drivers/clk/ti/fapll.c:12:
   drivers/clk/ti/fapll.c: In function 'ti_fapll_recalc_rate':
   include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
     (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
                               ^
>> drivers/clk/ti/fapll.c:182:3: note: in expansion of macro 'do_div'
      do_div(rate, fapll_p);
      ^
   drivers/clk/ti/fapll.c: In function 'ti_fapll_synth_recalc_rate':
   include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
     (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
                               ^
   drivers/clk/ti/fapll.c:346:3: note: in expansion of macro 'do_div'
      do_div(rate, synth_div_freq);
      ^
--
   In file included from arch/arm/include/asm/div64.h:126:0,
                    from include/linux/kernel.h:136,
                    from include/linux/list.h:8,
                    from include/linux/preempt.h:10,
                    from include/linux/spinlock.h:50,
                    from include/linux/mmzone.h:7,
                    from include/linux/gfp.h:5,
                    from include/linux/slab.h:14,
                    from drivers/gpu/drm/nouveau/include/nvif/os.h:5,
                    from drivers/gpu/drm/nouveau/include/nvkm/core/os.h:3,
                    from drivers/gpu/drm/nouveau
[PATCH 5/5] ARM: asm/div64.h: adjust to generic code
Now that the constant divisor optimization is made generic, adapt the
ARM case to it.

Signed-off-by: Nicolas Pitre
---
 arch/arm/include/asm/div64.h | 283 ++-
 1 file changed, 93 insertions(+), 190 deletions(-)

diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
index 662c7bd061..626bbb3671 100644
--- a/arch/arm/include/asm/div64.h
+++ b/arch/arm/include/asm/div64.h
@@ -5,9 +5,9 @@
 #include

 /*
- * The semantics of do_div() are:
+ * The semantics of __div64_32() are:
  *
- *      uint32_t do_div(uint64_t *n, uint32_t base)
+ *      uint32_t __div64_32(uint64_t *n, uint32_t base)
  *      {
  *              uint32_t remainder = *n % base;
  *              *n = *n / base;
@@ -16,8 +16,9 @@
  *
  * In other words, a 64-bit dividend with a 32-bit divisor producing
  * a 64-bit result and a 32-bit remainder.  To accomplish this optimally
- * we call a special __do_div64 helper with completely non standard
- * calling convention for arguments and results (beware).
+ * we override the generic version in lib/div64.c to call our __do_div64
+ * assembly implementation with completely non standard calling convention
+ * for arguments and results (beware).
  */

 #ifdef __ARMEB__
@@ -28,199 +29,101 @@
 #define __xh "r1"
 #endif

-#define __do_div_asm(n, base)                                  \
-({                                                             \
-       register unsigned int __base      asm("r4") = base;     \
-       register unsigned long long __n   asm("r0") = n;        \
-       register unsigned long long __res asm("r2");            \
-       register unsigned int __rem       asm(__xh);            \
-       asm(    __asmeq("%0", __xh)                             \
-               __asmeq("%1", "r2")                             \
-               __asmeq("%2", "r0")                             \
-               __asmeq("%3", "r4")                             \
-               "bl __do_div64"                                 \
-               : "=r" (__rem), "=r" (__res)                    \
-               : "r" (__n), "r" (__base)                       \
-               : "ip", "lr", "cc");                            \
-       n = __res;                                              \
-       __rem;                                                  \
-})
-
-#if __GNUC__ < 4 || !defined(CONFIG_AEABI)
+static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
+{
+       register unsigned int __base      asm("r4") = base;
+       register unsigned long long __n   asm("r0") = *n;
+       register unsigned long long __res asm("r2");
+       register unsigned int __rem       asm(__xh);
+       asm(    __asmeq("%0", __xh)
+               __asmeq("%1", "r2")
+               __asmeq("%2", "r0")
+               __asmeq("%3", "r4")
+               "bl __do_div64"
+               : "=r" (__rem), "=r" (__res)
+               : "r" (__n), "r" (__base)
+               : "ip", "lr", "cc");
+       *n = __res;
+       return __rem;
+}
+#define __div64_32 __div64_32
+
+#if !defined(CONFIG_AEABI)

 /*
- * gcc versions earlier than 4.0 are simply too problematic for the
- * optimized implementation below. First there is gcc PR 15089 that
- * tend to trig on more complex constructs, spurious .global __udivsi3
- * are inserted even if none of those symbols are referenced in the
- * generated code, and those gcc versions are not able to do constant
- * propagation on long long values anyway.
+ * In OABI configurations, some uses of the do_div function
+ * cause gcc to run out of registers. To work around that,
+ * we can force the use of the out-of-line version for
+ * configurations that build a OABI kernel.
  */
-#define do_div(n, base) __do_div_asm(n, base)
-
-#elif __GNUC__ >= 4
+#define do_div(n, base) __div64_32(&(n), base)

-#include
+#else

 /*
- * If the divisor happens to be constant, we determine the appropriate
- * inverse at compile time to turn the division into a few inline
- * multiplications instead which is much faster. And yet only if compiling
- * for ARMv4 or higher (we need umull/umlal) and if the gcc version is
- * sufficiently recent to perform proper long long constant propagation.
- * (It is unfortunate that gcc doesn't perform all this internally.)
+ * gcc versions earlier than 4.0 are simply too problematic for the
+ * __div64_const32() code in asm-generic/div64.h. First there is
+ * gcc PR 15089 that tend to trig on more complex constructs, spurious
+ * .global __udivsi3 are inserted even if none of those symbols are
+ * referenced in the generated code, and those gcc versions are not able
+ * to do constant propagation on long long values anyway.
  */
-#define do_div(n, base)                                                \
-({                                                             \
-       unsigned int __r, __b = (base);                         \
-       if (!__builtin_constant_p(__b) || __b == 0 ||