On Tue, Feb 3, 2015 at 7:42 AM, Anshul Garg <aksgarg1...@gmail.com> wrote:
>
> I have done profiling of int_sqrt function using perf tool for 10 times.
> For this purpose i have created a userspace program which uses sqrt function
> from 1 to a million.

Hmm. I did that too, and it doesn't improve things for me. In fact, it
makes it slower.

    [torvalds@i7 ~]$ gcc -Wall -O2 -DREDUCE int_sqrt.c ; time ./a.out

    real 0m2.098s
    user 0m2.095s
    sys 0m0.000s

    [torvalds@i7 ~]$ gcc -Wall -O2 int_sqrt.c ; time ./a.out

    real 0m1.886s
    user 0m1.883s
    sys 0m0.000s

and the profile shows that 35% of the time is spent on that branch
back of the initial reduction loop.

In contrast, my suggested "reduce just once" does seem to improve things:

    [torvalds@i7 ~]$ gcc -Wall -O2 -DONCE int_sqrt.c ; time ./a.out

    real 0m1.436s
    user 0m1.434s
    sys 0m0.000s

but it's kind of hacky.

NOTE! This probably depends a lot on microarchitecture details,
including very much branch predictor etc. And I didn't actually check
that it gives the right result, but I do think that this optimization
needs to be looked at more if we want to do it.

I was running this on an i7-4770S, fwiw.

Attached is the stupid test-program I used to do the above. Maybe I
did something wrong.

                         Linus
/*
 * Copyright (C) 2013 Davidlohr Bueso <davidlohr.bu...@hp.com>
 *
 *  Based on the shift-and-subtract algorithm for computing integer
 *  square root from Guy L. Steele.
 */

#define BITS_PER_LONG (8*sizeof(long))

/**
 * int_sqrt - rough approximation to sqrt
 * @x: integer of which to calculate the sqrt
 *
 * A very rough approximation to the sqrt() function.
 */
unsigned long __attribute__((noinline)) int_sqrt(unsigned long x)
{
	unsigned long b, m, y = 0;

	if (x <= 1)
		return x;

	m = 1UL << (BITS_PER_LONG - 2);

#ifdef REDUCE
	while (m > x)
		m >>= 2;
#elif defined(ONCE)
	{
		unsigned long n = m >> (BITS_PER_LONG/2);
		m = (n >= x) ? n : m;
	}
#endif
	while (m != 0) {
		b = y + m;
		y >>= 1;

		if (x >= b) {
			x -= b;
			y += m;
		}
		m >>= 2;
	}

	return y;
}

int main(int argc, char **argv)
{
	unsigned long i;

	for (i = 0; i < 100000000; i++) {
		unsigned long a = int_sqrt(i);
		asm volatile("": : "r" (a));
	}
	return 0;
}

Reply via email to