On Tue, Feb 3, 2015 at 7:42 AM, Anshul Garg <aksgarg1...@gmail.com> wrote: > > I have done profiling of int_sqrt function using perf tool for 10 times. > For this purpose i have created a userspace program which uses sqrt function > from 1 to a million.
Hmm. I did that too, and it doesn't improve things for me. In fact, it makes it slower. [torvalds@i7 ~]$ gcc -Wall -O2 -DREDUCE int_sqrt.c ; time ./a.out real 0m2.098s user 0m2.095s sys 0m0.000s [torvalds@i7 ~]$ gcc -Wall -O2 int_sqrt.c ; time ./a.out real 0m1.886s user 0m1.883s sys 0m0.000s and the profile shows that 35% of the time is spent on that branch back of the initial reduction loop. In contrast, my suggested "reduce just once" does seem to improve things: [torvalds@i7 ~]$ gcc -Wall -O2 -DONCE int_sqrt.c ; time ./a.out real 0m1.436s user 0m1.434s sys 0m0.000s but it's kind of hacky. NOTE! This probably depends a lot on microarchitecture details, including very much branch predictor etc. And I didn't actually check that it gives the right result, but I do think that this optimization needs to be looked at more if we want to do it. I was running this on an i7-4770S, fwiw. Attached is the stupid test-program I used to do the above. Maybe I did something wrong. Linus
/* * Copyright (C) 2013 Davidlohr Bueso <davidlohr.bu...@hp.com> * * Based on the shift-and-subtract algorithm for computing integer * square root from Guy L. Steele. */ #define BITS_PER_LONG (8*sizeof(long)) /** * int_sqrt - rough approximation to sqrt * @x: integer of which to calculate the sqrt * * A very rough approximation to the sqrt() function. */ unsigned long __attribute__((noinline)) int_sqrt(unsigned long x) { unsigned long b, m, y = 0; if (x <= 1) return x; m = 1UL << (BITS_PER_LONG - 2); #ifdef REDUCE while (m > x) m >>= 2; #elif defined(ONCE) { unsigned long n = m >> (BITS_PER_LONG/2); m = (n >= x) ? n : m; } #endif while (m != 0) { b = y + m; y >>= 1; if (x >= b) { x -= b; y += m; } m >>= 2; } return y; } int main(int argc, char **argv) { unsigned long i; for (i = 0; i < 100000000; i++) { unsigned long a = int_sqrt(i); asm volatile("": : "r" (a)); } return 0; }