On Wed, 27 Apr 2016 16:05:50 +0800 zengzhao...@163.com wrote:

> From: Zhaoxiu Zeng <zhaoxiu.z...@gmail.com>
> 
> Because some architectures (alpha, armv6, etc.) don't provide hardware 
> division,
> the mod operation is slow! Binary GCD algorithm uses simple arithmetic 
> operations,
> it replaces division with arithmetic shifts, comparisons, and subtraction.
> 
> I have compiled successfully with x86_64_defconfig and i386_defconfig.
> 
> I use the following code to test:
> 
> ...
>
> Compiled with "-O2", on "VirtualBox 4.2.0-35-generic #40-Ubuntu x86_64" got:
> 
> zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd
> gcd0: elapsed 92281
> gcd1: elapsed 55005
> gcd2: elapsed 91088
> zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd
> gcd0: elapsed 115546
> gcd1: elapsed 55928
> gcd2: elapsed 91938
> zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd
> gcd0: elapsed 91189
> gcd1: elapsed 55493
> gcd2: elapsed 90078
> zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd
> gcd0: elapsed 157364
> gcd1: elapsed 55204
> gcd2: elapsed 90058
> zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd
> gcd0: elapsed 91667
> gcd1: elapsed 54641
> gcd2: elapsed 91364

Can you please include a summary which describes the overall impact of
the patch?  eg, how faster does a%b become on <some architecture>

> --- a/lib/gcd.c
> +++ b/lib/gcd.c
> @@ -2,19 +2,82 @@
>  #include <linux/gcd.h>
>  #include <linux/export.h>
>  
> -/* Greatest common divisor */
> +/*
> + * use __ffs if the CPU has efficient __ffs
> + */
> +#if (defined(CONFIG_ALPHA) && defined(CONFIG_ALPHA_EV6) && 
> defined(CONFIG_ALPHA_EV67)) || \
> +     defined(CONFIG_ARC) || \
> +     (defined(CONFIG_ARM) && __LINUX_ARM_ARCH__ >= 5) || 
> defined(CONFIG_ARM64) || \
> +     defined(CONFIG_AVR32) || \
> +     defined(CONFIG_BLACKFIN) || \
> +     defined(CONFIG_C6X) || \
> +     defined(CONFIG_CRIS) || \
> +     defined(CONFIG_FRV) || \
> +     defined(CONFIG_HEXAGON) || \
> +     defined(CONFIG_IA64) || \
> +     (defined(CONFIG_M68K) && \
> +      (!defined(CONFIG_CPU_HAS_NO_BITFIELDS) || \
> +       ((defined(__mcfisaaplus__) || defined(__mcfisac__)) && \
> +        !defined(CONFIG_M68000) && !defined(CONFIG_MCPU32)))) || \
> +     defined(CONFIG_MN10300) || \
> +     defined(CONFIG_OPENRISC) || \
> +     defined(CONFIG_POWERPC) || \
> +     defined(CONFIG_S390) || \
> +     defined(CONFIG_TILE) || \
> +     defined(CONFIG_UNICORE32) || \
> +     defined(CONFIG_X86) || \
> +     defined(CONFIG_XTENSA)
> +# define USE_FFS 1
> +#elif defined(CONFIG_MIPS)
> +# define USE_FFS (__builtin_constant_p(cpu_has_clo_clz) && cpu_has_clo_clz)
> +#else
> +# define USE_FFS 0
> +#endif

Yikes.

Normally we'd create a new Kconfig variable and do this in the
individual arch/XXX/Kconfig files.

Can we just assume CONFIG_ARCH_HAS_SLOW_FFS is false and then set it to
true for MIPS, arm, etc?

> +/*
> + * This implements the binary GCD algorithm. (Often attributed to Stein,
> + * but as Knith has noted, appears a first-century Chinese math text.)
> + */

Knith might be Knuth?

>  unsigned long gcd(unsigned long a, unsigned long b)
>  {
> -     unsigned long r;
> +     unsigned long r = a | b;
> +
> +     if (!a || !b)
> +             return r;

Reply via email to