Re: [PATCH 5/5] ARM: asm/div64.h: adjust to generic codde

2015-11-19 Thread Måns Rullgård
Nicolas Pitre  writes:

> On Thu, 19 Nov 2015, Måns Rullgård wrote:
>
>> Nicolas Pitre  writes:
>> 
>> > +static inline uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
>> > +{
>> > +	unsigned long long res;
>> > +	unsigned int tmp = 0;
>> > +
>> > +	if (!bias) {
>> > +		asm (	"umull  %Q0, %R0, %Q1, %Q2\n\t"
>> > +			"mov    %Q0, #0"
>> > +			: "=&r" (res)
>> > +			: "r" (m), "r" (n)
>> > +			: "cc");
>> > +	} else if (!(m & ((1ULL << 63) | (1ULL << 31)))) {
>> > +		res = m;
>> > +		asm (	"umlal  %Q0, %R0, %Q1, %Q2\n\t"
>> > +			"mov    %Q0, #0"
>> > +			: "+&r" (res)
>> > +			: "r" (m), "r" (n)
>> > +			: "cc");
>> > +	} else {
>> > +		asm (	"umull  %Q0, %R0, %Q2, %Q3\n\t"
>> > +			"cmn    %Q0, %Q2\n\t"
>> > +			"adcs   %R0, %R0, %R2\n\t"
>> > +			"adc    %Q0, %1, #0"
>> > +			: "=&r" (res), "+&r" (tmp)
>> > +			: "r" (m), "r" (n)
>> 
>> Why is tmp using a +r constraint here?  The register is not written, so
>> using an input-only operand could/should result in better code.  That is
>> also what the old code did.
>
> No, it is worse. gcc allocates two registers because, somehow, it 
> doesn't think that the first one still holds zero after the first usage.  
> This way usage of only one temporary register is forced throughout, 
> producing better code.

Makes sense.  Thanks for explaining.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 5/5] ARM: asm/div64.h: adjust to generic codde

2015-11-19 Thread Nicolas Pitre
On Thu, 19 Nov 2015, Måns Rullgård wrote:

> Nicolas Pitre  writes:
> 
> > +static inline uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
> > +{
> > +	unsigned long long res;
> > +	unsigned int tmp = 0;
> > +
> > +	if (!bias) {
> > +		asm (	"umull  %Q0, %R0, %Q1, %Q2\n\t"
> > +			"mov    %Q0, #0"
> > +			: "=&r" (res)
> > +			: "r" (m), "r" (n)
> > +			: "cc");
> > +	} else if (!(m & ((1ULL << 63) | (1ULL << 31)))) {
> > +		res = m;
> > +		asm (	"umlal  %Q0, %R0, %Q1, %Q2\n\t"
> > +			"mov    %Q0, #0"
> > +			: "+&r" (res)
> > +			: "r" (m), "r" (n)
> > +			: "cc");
> > +	} else {
> > +		asm (	"umull  %Q0, %R0, %Q2, %Q3\n\t"
> > +			"cmn    %Q0, %Q2\n\t"
> > +			"adcs   %R0, %R0, %R2\n\t"
> > +			"adc    %Q0, %1, #0"
> > +			: "=&r" (res), "+&r" (tmp)
> > +			: "r" (m), "r" (n)
> 
> Why is tmp using a +r constraint here?  The register is not written, so
> using an input-only operand could/should result in better code.  That is
> also what the old code did.

No, it is worse. gcc allocates two registers because, somehow, it 
doesn't think that the first one still holds zero after the first usage.  
This way usage of only one temporary register is forced throughout, 
producing better code.

I meant to have this split out in a separate patch but messed it up 
somehow.



> 
> > +			: "cc");
> > +	}
> > +
> > +	if (!(m & ((1ULL << 63) | (1ULL << 31)))) {
> > +		asm (	"umlal  %R0, %Q0, %R1, %Q2\n\t"
> > +			"umlal  %R0, %Q0, %Q1, %R2\n\t"
> > +			"mov    %R0, #0\n\t"
> > +			"umlal  %Q0, %R0, %R1, %R2"
> > +			: "+&r" (res)
> > +			: "r" (m), "r" (n)
> > +			: "cc");
> > +	} else {
> > +		asm (	"umlal  %R0, %Q0, %R2, %Q3\n\t"
> > +			"umlal  %R0, %1, %Q2, %R3\n\t"
> > +			"mov    %R0, #0\n\t"
> > +			"adds   %Q0, %1, %Q0\n\t"
> > +			"adc    %R0, %R0, #0\n\t"
> > +			"umlal  %Q0, %R0, %R2, %R3"
> > +			: "+&r" (res), "+&r" (tmp)
> > +			: "r" (m), "r" (n)
> > +			: "cc");
> > +	}
> > +
> > +	return res;
> > +}
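Both asm statements pick their cheaper sequence when `!(m & ((1ULL << 63) | (1ULL << 31)))` holds. With bits 31 and 63 of `m` clear, each 32-bit half of `m` is below 2^31, so every partial product is below 2^63 and the sums accumulated by the `umlal` chains stay below 2^64; otherwise the code needs the extra `tmp` register to catch a carry. A host-side C sketch that checks this bound by replaying the same accumulation order in 128-bit arithmetic, assuming GCC/Clang's `unsigned __int128` extension; `xprod_fast_path_overflows()` is a hypothetical helper, not kernel code:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): replays the accumulation order
 * of the "no overflow" sequences in __arch_xprod_64() using 128-bit
 * arithmetic and reports whether a 64-bit accumulator would carry out.
 * Requires the GCC/Clang unsigned __int128 extension.
 */
static bool xprod_fast_path_overflows(uint64_t m, uint64_t n, bool bias)
{
	uint64_t m_lo = (uint32_t)m, m_hi = m >> 32;
	uint64_t n_lo = (uint32_t)n, n_hi = n >> 32;
	unsigned __int128 acc;

	/* first asm statement: m_lo * n_lo, plus m itself when biasing */
	acc = (unsigned __int128)m_lo * n_lo + (bias ? m : 0);
	if (acc >> 64)
		return true;
	acc >>= 32;

	/* second asm statement: both cross products share one accumulator */
	acc += (unsigned __int128)m_lo * n_hi;
	acc += (unsigned __int128)m_hi * n_lo;
	if (acc >> 64)
		return true;

	/*
	 * The final m_hi * n_hi step cannot overflow: the complete result,
	 * (m * n + (bias ? m : 0)) >> 64, always fits in 64 bits.
	 */
	return false;
}
```

For example, `m = 0x7fffffff7fffffff` with `n = UINT64_MAX` stays within 64 bits with or without bias, while `m = n = UINT64_MAX` overflows in both cases, which is why the general path has to propagate a carry.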

Re: [PATCH 5/5] ARM: asm/div64.h: adjust to generic codde

2015-11-19 Thread Måns Rullgård
Nicolas Pitre  writes:

> +static inline uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
> +{
> +	unsigned long long res;
> +	unsigned int tmp = 0;
> +
> +	if (!bias) {
> +		asm (	"umull  %Q0, %R0, %Q1, %Q2\n\t"
> +			"mov    %Q0, #0"
> +			: "=&r" (res)
> +			: "r" (m), "r" (n)
> +			: "cc");
> +	} else if (!(m & ((1ULL << 63) | (1ULL << 31)))) {
> +		res = m;
> +		asm (	"umlal  %Q0, %R0, %Q1, %Q2\n\t"
> +			"mov    %Q0, #0"
> +			: "+&r" (res)
> +			: "r" (m), "r" (n)
> +			: "cc");
> +	} else {
> +		asm (	"umull  %Q0, %R0, %Q2, %Q3\n\t"
> +			"cmn    %Q0, %Q2\n\t"
> +			"adcs   %R0, %R0, %R2\n\t"
> +			"adc    %Q0, %1, #0"
> +			: "=&r" (res), "+&r" (tmp)
> +			: "r" (m), "r" (n)

Why is tmp using a +r constraint here?  The register is not written, so
using an input-only operand could/should result in better code.  That is
also what the old code did.

> +			: "cc");
> +	}
> +
> +	if (!(m & ((1ULL << 63) | (1ULL << 31)))) {
> +		asm (	"umlal  %R0, %Q0, %R1, %Q2\n\t"
> +			"umlal  %R0, %Q0, %Q1, %R2\n\t"
> +			"mov    %R0, #0\n\t"
> +			"umlal  %Q0, %R0, %R1, %R2"
> +			: "+&r" (res)
> +			: "r" (m), "r" (n)
> +			: "cc");
> +	} else {
> +		asm (	"umlal  %R0, %Q0, %R2, %Q3\n\t"
> +			"umlal  %R0, %1, %Q2, %R3\n\t"
> +			"mov    %R0, #0\n\t"
> +			"adds   %Q0, %1, %Q0\n\t"
> +			"adc    %R0, %R0, #0\n\t"
> +			"umlal  %Q0, %R0, %R2, %R3"
> +			: "+&r" (res), "+&r" (tmp)
> +			: "r" (m), "r" (n)
> +			: "cc");
> +	}
> +
> +	return res;
> +}
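Taken together, the first-stage cases and the second-stage cases compute the upper 64 bits of the 128-bit value `m * n`, plus `m` when `bias` is set. A portable model of that result, written for illustration and not taken from the kernel, can build it from 32-bit halves in the same order as the `umull`/`umlal` sequences, but with explicit carry propagation instead of the `tmp` register. `xprod_64_ref()` cross-checks it using GCC/Clang's `unsigned __int128`; both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reference: upper 64 bits of m * n + (bias ? m : 0), via unsigned __int128. */
static uint64_t xprod_64_ref(uint64_t m, uint64_t n, bool bias)
{
	unsigned __int128 v = (unsigned __int128)m * n + (bias ? m : 0);

	return (uint64_t)(v >> 64);
}

/*
 * Hypothetical portable model: the same result built from 32-bit halves,
 * in the order the umull/umlal sequences use, with carries tracked
 * explicitly so it is valid for every m (no bit-31/bit-63 shortcut).
 */
static uint64_t xprod_64_model(uint64_t m, uint64_t n, bool bias)
{
	uint64_t m_lo = (uint32_t)m, m_hi = m >> 32;
	uint64_t n_lo = (uint32_t)n, n_hi = n >> 32;
	uint64_t lo, acc, mid1, mid2, carry = 0;

	/* m_lo * n_lo (+ m); carry holds bit 64 of that sum */
	lo = m_lo * n_lo;
	if (bias) {
		lo += m;
		carry = lo < m;
	}

	/* shift down 32 bits, then add both cross products */
	acc = (lo >> 32) + (carry << 32);
	mid1 = m_lo * n_hi;
	mid2 = m_hi * n_lo;
	acc += mid1;
	carry = acc < mid1;
	acc += mid2;
	carry += acc < mid2;

	/* shift down another 32 bits and add the high product */
	return (acc >> 32) + (carry << 32) + m_hi * n_hi;
}
```

The model agrees with the reference for operands such as `m = n = UINT64_MAX`, where it returns `UINT64_MAX - 1` without bias and `UINT64_MAX` with it.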

-- 
Måns Rullgård
m...@mansr.com


Re: [PATCH 5/5] ARM: asm/div64.h: adjust to generic codde

2015-11-03 Thread kbuild test robot
Hi Nicolas,

[auto build test WARNING on asm-generic/master]
[also WARNING on: v4.3 next-20151103]

url:    https://github.com/0day-ci/linux/commits/Nicolas-Pitre/div64-h-optimize-do_div-for-power-of-two-constant-divisors/20151103-065348
base:   https://github.com/0day-ci/linux Nicolas-Pitre/div64-h-optimize-do_div-for-power-of-two-constant-divisors/20151103-065348
config: arm-allyesconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All warnings (new ones prefixed by >>):

   In file included from arch/arm/include/asm/div64.h:126:0,
from include/linux/kernel.h:136,
from drivers/cpufreq/s5pv210-cpufreq.c:13:
   drivers/cpufreq/s5pv210-cpufreq.c: In function 's5pv210_set_refresh':
   include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
 (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
   ^
>> drivers/cpufreq/s5pv210-cpufreq.c:215:2: note: in expansion of macro 'do_div'
 do_div(tmp, freq);
 ^
>> drivers/cpufreq/s5pv210-cpufreq.c:215:2: warning: right shift count >= width of type
   In file included from include/linux/kernel.h:136:0,
from drivers/cpufreq/s5pv210-cpufreq.c:13:
   arch/arm/include/asm/div64.h:49:20: warning: passing argument 1 of '__div64_32' from incompatible pointer type
#define __div64_32 __div64_32
   ^
   include/asm-generic/div64.h:235:11: note: in expansion of macro '__div64_32'
  __rem = __div64_32(&(n), __base); \
  ^
>> drivers/cpufreq/s5pv210-cpufreq.c:215:2: note: in expansion of macro 'do_div'
 do_div(tmp, freq);
 ^
   arch/arm/include/asm/div64.h:32:24: note: expected 'uint64_t *' but argument is of type 'long unsigned int *'
static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
   ^
   In file included from arch/arm/include/asm/div64.h:126:0,
from include/linux/kernel.h:136,
from drivers/cpufreq/s5pv210-cpufreq.c:13:
   include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
 (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
   ^
   drivers/cpufreq/s5pv210-cpufreq.c:219:2: note: in expansion of macro 'do_div'
 do_div(tmp1, tmp);
 ^
   drivers/cpufreq/s5pv210-cpufreq.c:219:2: warning: right shift count >= width of type
   In file included from include/linux/kernel.h:136:0,
from drivers/cpufreq/s5pv210-cpufreq.c:13:
   arch/arm/include/asm/div64.h:49:20: warning: passing argument 1 of '__div64_32' from incompatible pointer type
#define __div64_32 __div64_32
   ^
   include/asm-generic/div64.h:235:11: note: in expansion of macro '__div64_32'
  __rem = __div64_32(&(n), __base); \
  ^
   drivers/cpufreq/s5pv210-cpufreq.c:219:2: note: in expansion of macro 'do_div'
 do_div(tmp1, tmp);
 ^
   arch/arm/include/asm/div64.h:32:24: note: expected 'uint64_t *' but argument is of type 'long unsigned int *'
static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
   ^

vim +/do_div +215 drivers/cpufreq/s5pv210-cpufreq.c

83efc743 arch/arm/mach-s5pv210/cpufreq.c   Jaecheol Lee  2010-10-12  199  {
83efc743 arch/arm/mach-s5pv210/cpufreq.c   Jaecheol Lee  2010-10-12  200  	unsigned long tmp, tmp1;
83efc743 arch/arm/mach-s5pv210/cpufreq.c   Jaecheol Lee  2010-10-12  201  	void __iomem *reg = NULL;
83efc743 arch/arm/mach-s5pv210/cpufreq.c   Jaecheol Lee  2010-10-12  202  
d62fa311 arch/arm/mach-s5pv210/cpufreq.c   Jonghwan Choi 2011-05-12  203  	if (ch == DMC0) {
6d4ed0f4 drivers/cpufreq/s5pv210-cpufreq.c Tomasz Figa   2014-07-03  204  		reg = (dmc_base[0] + 0x30);
d62fa311 arch/arm/mach-s5pv210/cpufreq.c   Jonghwan Choi 2011-05-12  205  	} else if (ch == DMC1) {
6d4ed0f4 drivers/cpufreq/s5pv210-cpufreq.c Tomasz Figa   2014-07-03  206  		reg = (dmc_base[1] + 0x30);
d62fa311 arch/arm/mach-s5pv210/cpufreq.c   Jonghwan Choi 2011-05-12  207  	} else {
83efc743 arch/arm/mach-s5pv210/cpufreq.c   Jaecheol Lee  2010-10-12  208  		printk(KERN_ERR "Cannot find DMC port\n");
d62fa311 arch/arm/mach-s5pv210/cpufreq.c   Jonghwan Choi 2011-05-12  209  		return;
d62fa311 arch/arm/mach-s5pv210/cpufreq.c   Jonghwan Choi 2011-05-12  210  	}
83efc743 arch/arm/mach-s5pv210/cpufreq.c   Jaecheol Lee  2010-10-12  211  
83efc743 arch/arm/mach-s5pv210/cpufreq.c   Jaecheol Lee  2010-10-12  212  	/* Find current DRAM frequency */
83efc743 arch/arm/mach-s5pv210/cpufreq.c   Jaecheol Lee  2010-10-12  213  	tmp = s5pv210_dram_conf[ch].freq;
83efc743 arch/arm/ma

Re: [PATCH 5/5] ARM: asm/div64.h: adjust to generic codde

2015-11-02 Thread Nicolas Pitre
[added Mike/linux-clk and David/dri-devel]

A patch I produced is now highlighting existing bugs in the drivers 
listed below.

On Tue, 3 Nov 2015, kbuild test robot wrote:

> Hi Nicolas,
> 
> [auto build test WARNING on asm-generic/master -- if it's inappropriate base, please suggest rules for selecting the more suitable base]
> 
> url:    https://github.com/0day-ci/linux/commits/Nicolas-Pitre/div64-h-optimize-do_div-for-power-of-two-constant-divisors/20151103-065348
> config: arm-multi_v7_defconfig (attached as .config)
> reproduce:
> wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=arm 
> 
> All warnings (new ones prefixed by >>):
> 
>In file included from arch/arm/include/asm/div64.h:126:0,
> from include/linux/kernel.h:136,
> from include/asm-generic/bug.h:13,
> from arch/arm/include/asm/bug.h:62,
> from include/linux/bug.h:4,
> from include/linux/io.h:23,
> from include/linux/clk-provider.h:14,
> from drivers/clk/imx/clk-pllv1.c:1:
>drivers/clk/imx/clk-pllv1.c: In function 'clk_pllv1_recalc_rate':
>include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
>  (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
>^
> >> drivers/clk/imx/clk-pllv1.c:99:2: note: in expansion of macro 'do_div'
>  do_div(ll, mfd + 1);
>  ^

Here the problem is in clk-pllv1.c where the ll variable is declared as 
a long long. It should be an unsigned long long, or better yet a 
uint64_t or u64.
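The warning comes from the type check in the generic `do_div()`: `(void)(((typeof((n)) *)0) == ((uint64_t *)0))` generates no code, but comparing `typeof(n) *` with `uint64_t *` makes gcc warn whenever `n` is not exactly a `uint64_t`. A minimal host-side sketch of that idea, assuming GNU C extensions (`__typeof__`, statement expressions); `DO_DIV_SKETCH()` and `div64_32_sketch()` are hypothetical names, the latter following the `__div64_32()` semantics documented in arch/arm/include/asm/div64.h:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in with the documented __div64_32() semantics. */
static inline uint32_t div64_32_sketch(uint64_t *n, uint32_t base)
{
	uint32_t remainder;

	assert(base != 0);
	remainder = (uint32_t)(*n % base);
	*n = *n / base;
	return remainder;
}

/*
 * Hypothetical do_div()-style macro (GNU C statement expression).
 * The pointer comparison generates no code; it exists only so the
 * compiler warns when n is not a uint64_t.  n is modified in place
 * and the remainder is the value of the whole expression.
 */
#define DO_DIV_SKETCH(n, base) ({				\
	(void)(((__typeof__((n)) *)0) == ((uint64_t *)0));	\
	div64_32_sketch(&(n), (base));				\
})
```

With `uint64_t ll = 1000000007ULL;`, `DO_DIV_SKETCH(ll, 10)` returns 7 and leaves `ll` equal to 100000000; declaring `ll` as `long long` instead reproduces the "comparison of distinct pointer types" warning.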

> --
>In file included from arch/arm/include/asm/div64.h:126:0,
> from include/linux/kernel.h:136,
> from drivers/clk/imx/clk-pllv2.c:1:
>drivers/clk/imx/clk-pllv2.c: In function '__clk_pllv2_recalc_rate':
>include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
>  (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
>^
> >> drivers/clk/imx/clk-pllv2.c:103:2: note: in expansion of macro 'do_div'
>  do_div(temp, mfd + 1);
>  ^

Same thing: temp is declared as an s64. It should be a u64.
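Signedness matters beyond the warning: `do_div()` performs an unsigned 64-by-32 division, so an `s64` dividend is converted to `u64`. Clock rates are presumably never negative here, but if such a value ever were, the result would be a huge unsigned quotient rather than a truncated signed one. A small host-side illustration, using hypothetical `udiv64_32()` and `quotient_of_signed()` helpers with the same in-place semantics:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned 64-by-32 division in place, like do_div(). */
static uint32_t udiv64_32(uint64_t *n, uint32_t base)
{
	uint32_t rem;

	assert(base != 0);
	rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

/* What happens to a signed dividend once it is treated as unsigned. */
static uint64_t quotient_of_signed(int64_t s, uint32_t base)
{
	uint64_t u = (uint64_t)s;	/* converted modulo 2^64 */

	udiv64_32(&u, base);
	return u;
}
```

Dividing -10 this way by 3 yields 6148914691236517202 with remainder 0, not -3.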

>drivers/clk/imx/clk-pllv2.c: In function '__clk_pllv2_set_rate':
>include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
>  (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
>^
>drivers/clk/imx/clk-pllv2.c:145:2: note: in expansion of macro 'do_div'
>  do_div(temp64, quad_parent_rate / 100);
>  ^

Ditto here.

> --
>In file included from arch/arm/include/asm/div64.h:126:0,
> from include/linux/kernel.h:136,
> from drivers/clk/tegra/clk-divider.c:17:
>drivers/clk/tegra/clk-divider.c: In function 'get_div':
>include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
>  (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
>^
> >> drivers/clk/tegra/clk-divider.c:50:2: note: in expansion of macro 'do_div'
>  do_div(divider_ux1, rate);
>  ^

Ditto here.

> --
>In file included from arch/arm/include/asm/div64.h:126:0,
> from include/linux/kernel.h:136,
> from drivers/clk/ti/clkt_dpll.c:17:
>drivers/clk/ti/clkt_dpll.c: In function 'omap2_get_dpll_rate':
>include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
>  (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
>^
> >> drivers/clk/ti/clkt_dpll.c:266:2: note: in expansion of macro 'do_div'
>  do_div(dpll_clk, dpll_div + 1);
>  ^

Ditto here.

> --
>In file included from arch/arm/include/asm/div64.h:126:0,
> from include/linux/kernel.h:136,
> from include/linux/clk.h:16,
> from drivers/clk/ti/fapll.c:12:
>drivers/clk/ti/fapll.c: In function 'ti_fapll_recalc_rate':
>include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
>  (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
>^
> >> drivers/clk/ti/fapll.c:182:3: note: in expansion of macro 'do_div'
>   do_div(rate, fapll_p);
>   ^

Ditto here.

>drivers/clk/ti/fapll.c: In function 'ti_fapll_synth_recalc_rate':
>include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
>  (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
>^
>drivers/clk/ti/fapll.c:346:3: note: in expansion of macro 'do_div'
>   do_div(ra

Re: [PATCH 5/5] ARM: asm/div64.h: adjust to generic codde

2015-11-02 Thread kbuild test robot
Hi Nicolas,

[auto build test WARNING on asm-generic/master -- if it's inappropriate base, please suggest rules for selecting the more suitable base]

url:    https://github.com/0day-ci/linux/commits/Nicolas-Pitre/div64-h-optimize-do_div-for-power-of-two-constant-divisors/20151103-065348
config: arm-multi_v7_defconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All warnings (new ones prefixed by >>):

   In file included from arch/arm/include/asm/div64.h:126:0,
from include/linux/kernel.h:136,
from include/asm-generic/bug.h:13,
from arch/arm/include/asm/bug.h:62,
from include/linux/bug.h:4,
from include/linux/io.h:23,
from include/linux/clk-provider.h:14,
from drivers/clk/imx/clk-pllv1.c:1:
   drivers/clk/imx/clk-pllv1.c: In function 'clk_pllv1_recalc_rate':
   include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
 (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
   ^
>> drivers/clk/imx/clk-pllv1.c:99:2: note: in expansion of macro 'do_div'
 do_div(ll, mfd + 1);
 ^
--
   In file included from arch/arm/include/asm/div64.h:126:0,
from include/linux/kernel.h:136,
from drivers/clk/imx/clk-pllv2.c:1:
   drivers/clk/imx/clk-pllv2.c: In function '__clk_pllv2_recalc_rate':
   include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
 (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
   ^
>> drivers/clk/imx/clk-pllv2.c:103:2: note: in expansion of macro 'do_div'
 do_div(temp, mfd + 1);
 ^
   drivers/clk/imx/clk-pllv2.c: In function '__clk_pllv2_set_rate':
   include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
 (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
   ^
   drivers/clk/imx/clk-pllv2.c:145:2: note: in expansion of macro 'do_div'
 do_div(temp64, quad_parent_rate / 100);
 ^
--
   In file included from arch/arm/include/asm/div64.h:126:0,
from include/linux/kernel.h:136,
from drivers/clk/tegra/clk-divider.c:17:
   drivers/clk/tegra/clk-divider.c: In function 'get_div':
   include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
 (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
   ^
>> drivers/clk/tegra/clk-divider.c:50:2: note: in expansion of macro 'do_div'
 do_div(divider_ux1, rate);
 ^
--
   In file included from arch/arm/include/asm/div64.h:126:0,
from include/linux/kernel.h:136,
from drivers/clk/ti/clkt_dpll.c:17:
   drivers/clk/ti/clkt_dpll.c: In function 'omap2_get_dpll_rate':
   include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
 (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
   ^
>> drivers/clk/ti/clkt_dpll.c:266:2: note: in expansion of macro 'do_div'
 do_div(dpll_clk, dpll_div + 1);
 ^
--
   In file included from arch/arm/include/asm/div64.h:126:0,
from include/linux/kernel.h:136,
from include/linux/clk.h:16,
from drivers/clk/ti/fapll.c:12:
   drivers/clk/ti/fapll.c: In function 'ti_fapll_recalc_rate':
   include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
 (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
   ^
>> drivers/clk/ti/fapll.c:182:3: note: in expansion of macro 'do_div'
  do_div(rate, fapll_p);
  ^
   drivers/clk/ti/fapll.c: In function 'ti_fapll_synth_recalc_rate':
   include/asm-generic/div64.h:217:28: warning: comparison of distinct pointer types lacks a cast
 (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
   ^
   drivers/clk/ti/fapll.c:346:3: note: in expansion of macro 'do_div'
  do_div(rate, synth_div_freq);
  ^
--
   In file included from arch/arm/include/asm/div64.h:126:0,
from include/linux/kernel.h:136,
from include/linux/list.h:8,
from include/linux/preempt.h:10,
from include/linux/spinlock.h:50,
from include/linux/mmzone.h:7,
from include/linux/gfp.h:5,
from include/linux/slab.h:14,
from drivers/gpu/drm/nouveau/include/nvif/os.h:5,
from drivers/gpu/drm/nouveau/include/nvkm/core/os.h:3,
from drivers/gpu/drm/nouveau

[PATCH 5/5] ARM: asm/div64.h: adjust to generic codde

2015-11-02 Thread Nicolas Pitre
Now that the constant divisor optimization is made generic, adapt the
ARM case to it.
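For background, a constant divisor optimization of this kind replaces a division by a compile-time constant with a multiplication by a precomputed reciprocal, keeping only the high half of the 128-bit product. The sketch below shows the principle on a host using the classic Granlund-Montgomery method for unsigned division by an invariant divisor; it is explicitly not the kernel's `__div64_const32()` scheme, the struct and function names are hypothetical, and it relies on GCC/Clang's `unsigned __int128` and `__builtin_clzll()`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Unsigned division by an invariant divisor d (1 <= d < 2^32) using the
 * Granlund-Montgomery multiply-high method.  Hypothetical names; this is
 * not the kernel's __div64_const32().
 */
struct udiv_const {
	uint64_t mult;	/* floor(2^64 * (2^l - d) / d) + 1 */
	unsigned sh1;	/* min(l, 1) */
	unsigned sh2;	/* max(l - 1, 0) */
};

/* Per-divisor setup; with a constant d this could be folded at compile time. */
static struct udiv_const udiv_const_init(uint32_t d)
{
	struct udiv_const c;
	unsigned __int128 num;
	unsigned l;

	assert(d != 0);
	/* l = ceil(log2(d)) */
	l = (d == 1) ? 0 : 64 - __builtin_clzll((uint64_t)d - 1);
	/* (2^l - d) < d, so the quotient below fits in 64 bits */
	num = (unsigned __int128)(((uint64_t)1 << l) - d) << 64;
	c.mult = (uint64_t)(num / d) + 1;
	c.sh1 = l < 1 ? l : 1;
	c.sh2 = l > 1 ? l - 1 : 0;
	return c;
}

/* Per-division work: one 64x64->128 multiply, a subtract, an add, shifts. */
static uint64_t udiv_const_apply(const struct udiv_const *c, uint64_t n)
{
	uint64_t t1 = (uint64_t)(((unsigned __int128)c->mult * n) >> 64);

	/* t1 <= n and t1 + (n - t1) / 2 <= n, so nothing overflows */
	return (t1 + ((n - t1) >> c->sh1)) >> c->sh2;
}
```

With `udiv_const_init(7)`, `udiv_const_apply()` returns 14 for 100 and `UINT64_MAX / 7` for `UINT64_MAX`; the expensive reciprocal computation happens once per divisor, and each division costs only a multiply and a few cheap operations.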

Signed-off-by: Nicolas Pitre 
---
 arch/arm/include/asm/div64.h | 283 ++-
 1 file changed, 93 insertions(+), 190 deletions(-)

diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
index 662c7bd061..626bbb3671 100644
--- a/arch/arm/include/asm/div64.h
+++ b/arch/arm/include/asm/div64.h
@@ -5,9 +5,9 @@
 #include 
 
 /*
- * The semantics of do_div() are:
+ * The semantics of __div64_32() are:
  *
- * uint32_t do_div(uint64_t *n, uint32_t base)
+ * uint32_t __div64_32(uint64_t *n, uint32_t base)
  * {
  * uint32_t remainder = *n % base;
  * *n = *n / base;
@@ -16,8 +16,9 @@
  *
  * In other words, a 64-bit dividend with a 32-bit divisor producing
  * a 64-bit result and a 32-bit remainder.  To accomplish this optimally
- * we call a special __do_div64 helper with completely non standard
- * calling convention for arguments and results (beware).
+ * we override the generic version in lib/div64.c to call our __do_div64
+ * assembly implementation with completely non standard calling convention
+ * for arguments and results (beware).
  */
 
 #ifdef __ARMEB__
@@ -28,199 +29,101 @@
 #define __xh "r1"
 #endif
 
-#define __do_div_asm(n, base)  \
-({ \
-   register unsigned int __base  asm("r4") = base; \
-   register unsigned long long __n   asm("r0") = n;\
-   register unsigned long long __res asm("r2");\
-   register unsigned int __rem   asm(__xh);\
-   asm(__asmeq("%0", __xh) \
-   __asmeq("%1", "r2") \
-   __asmeq("%2", "r0") \
-   __asmeq("%3", "r4") \
-   "bl __do_div64" \
-   : "=r" (__rem), "=r" (__res)\
-   : "r" (__n), "r" (__base)   \
-   : "ip", "lr", "cc");\
-   n = __res;  \
-   __rem;  \
-})
-
-#if __GNUC__ < 4 || !defined(CONFIG_AEABI)
+static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
+{
+   register unsigned int __base  asm("r4") = base;
+   register unsigned long long __n   asm("r0") = *n;
+   register unsigned long long __res asm("r2");
+   register unsigned int __rem   asm(__xh);
+   asm(__asmeq("%0", __xh)
+   __asmeq("%1", "r2")
+   __asmeq("%2", "r0")
+   __asmeq("%3", "r4")
+   "bl __do_div64"
+   : "=r" (__rem), "=r" (__res)
+   : "r" (__n), "r" (__base)
+   : "ip", "lr", "cc");
+   *n = __res;
+   return __rem;
+}
+#define __div64_32 __div64_32
+
+#if !defined(CONFIG_AEABI)
 
 /*
- * gcc versions earlier than 4.0 are simply too problematic for the
- * optimized implementation below. First there is gcc PR 15089 that
- * tend to trig on more complex constructs, spurious .global __udivsi3
- * are inserted even if none of those symbols are referenced in the
- * generated code, and those gcc versions are not able to do constant
- * propagation on long long values anyway.
+ * In OABI configurations, some uses of the do_div function
+ * cause gcc to run out of registers. To work around that,
+ * we can force the use of the out-of-line version for
+ * configurations that build a OABI kernel.
  */
-#define do_div(n, base) __do_div_asm(n, base)
-
-#elif __GNUC__ >= 4
+#define do_div(n, base) __div64_32(&(n), base)
 
-#include 
+#else
 
 /*
- * If the divisor happens to be constant, we determine the appropriate
- * inverse at compile time to turn the division into a few inline
- * multiplications instead which is much faster. And yet only if compiling
- * for ARMv4 or higher (we need umull/umlal) and if the gcc version is
- * sufficiently recent to perform proper long long constant propagation.
- * (It is unfortunate that gcc doesn't perform all this internally.)
+ * gcc versions earlier than 4.0 are simply too problematic for the
+ * __div64_const32() code in asm-generic/div64.h. First there is
+ * gcc PR 15089 that tend to trig on more complex constructs, spurious
+ * .global __udivsi3 are inserted even if none of those symbols are
+ * referenced in the generated code, and those gcc versions are not able
+ * to do constant propagation on long long values anyway.
  */
-#define do_div(n, base) \
-({ \
-   unsigned int __r, __b = (base); \
-   if (!__builtin_constant_p(__b) || __b == 0 ||