Re: A simple way to do compile time loop unrolling

2013-05-31 Thread finalpatch

Wow! That's so very cool! We can make it even nicer with

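import std.algorithm : map;
import std.array : join, replace;
import std.format : format;
import std.range : iota;
template Unroll(alias CODE, alias N, alias SEP="")
{
    enum t = replace(CODE, "%", "%1$d"); // "%" -> positional "%1$d"
    enum Unroll = iota(N).map!(i => format(t, i)).join(SEP);
}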

And use % as the placeholder instead of the ugly %1$d:

auto dot = mixin(Unroll!("v1[%]*v2[%]", 3, "+"));

It actually gets quite readable now.
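Here is a self-contained sketch of the CTFE version above, assuming typed `string`/`size_t` template parameters instead of `alias` (a choice of this sketch, not of the post). The `dot` function is hypothetical; it shows that the expansion has to be mixed in as an expression, and that the generated string can be checked with `static assert`:

```d
import std.algorithm : map;
import std.array : join, replace;
import std.format : format;
import std.range : iota;

// CTFE version of Unroll: every "%" in CODE becomes the positional spec
// "%1$d", and the result is formatted once per index 0 .. N-1.
// Caveat: a literal '%' in CODE (e.g. the modulo operator) is replaced too.
template Unroll(string CODE, size_t N, string SEP = "")
{
    enum Unroll = iota(N).map!(i => format(replace(CODE, "%", "%1$d"), i)).join(SEP);
}

// The generated code is an ordinary compile-time string, so it can be checked.
static assert(Unroll!("v1[%]*v2[%]", 3, "+") == "v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2]");

// Hypothetical helper: the expansion is used as an expression, not a statement.
float dot(const float[3] v1, const float[3] v2)
{
    return mixin(Unroll!("v1[%]*v2[%]", 3, "+"));
}

void main()
{
    assert(dot([1.0f, 2.0f, 3.0f], [4.0f, 5.0f, 6.0f]) == 32);
}
```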

On Friday, 31 May 2013 at 17:30:13 UTC, Peter Alexander wrote:
> Remember that in D, most side-effect free functions can be run
> at compile time. No need for recursive template trickery:
>
> mixin(iota(3).map!(i => format("v[%1$d]+=rhs.v[%1$d];", i)).join());




Re: A simple way to do compile time loop unrolling

2013-05-31 Thread Nick Sabalausky
On Fri, 31 May 2013 19:30:10 +0200
"Peter Alexander" wrote:
> mixin(iota(3).map!(i => format("v[%1$d]+=rhs.v[%1$d];", i)).join());

Dayamn! I knew CTFE had improved considerably over the last year or
so, but even I didn't expect something like that to be working already.
That's crazy! :)



Re: A simple way to do compile time loop unrolling

2013-05-31 Thread Peter Alexander

Remember that in D, most side-effect-free functions can be run at
compile time. No need for recursive template trickery:

mixin(iota(3).map!(i => format("v[%1$d]+=rhs.v[%1$d];", i)).join());
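To show the one-liner in context, here is a sketch that places it inside an operator overload. The `Vec3` struct and its `opOpAssign` are hypothetical; only the `mixin(...)` line is taken from the post, with `+` generalized to the template parameter `op`:

```d
import std.algorithm : map;
import std.array : join;
import std.format : format;
import std.range : iota;

// Hypothetical 3-component vector type; only the mixin line comes from the post.
struct Vec3
{
    float[3] v;

    // Element-wise +=, -=, *=, /= whose body is generated at compile time (CTFE).
    ref Vec3 opOpAssign(string op)(const Vec3 rhs)
        if (op == "+" || op == "-" || op == "*" || op == "/")
    {
        // For op == "+" this expands to
        // v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];
        mixin(iota(3).map!(i => format("v[%1$d]" ~ op ~ "=rhs.v[%1$d];", i)).join());
        return this;
    }
}

void main()
{
    auto a = Vec3([1.0f, 2.0f, 3.0f]);
    a += Vec3([10.0f, 20.0f, 30.0f]);
    assert(a.v == [11.0f, 22.0f, 33.0f]);
}
```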


Re: A simple way to do compile time loop unrolling

2013-05-31 Thread Marco Leise
On Fri, 31 May 2013 16:33:19 +0200, Piotr Szturmaj wrote:

> It is also an opportunity to do loop vectorization. But I 
> doubt that either is available in DMD, not sure about GDC and LDC.

GDC once vectorized something for me where I used a struct of
4 ubyte fields. I don't remember whether it was a loop at all. I
think all I did was apply the same operations to 3 of the fields
in sequence, and the compiler loaded the whole struct into an SSE
register. It really paid off speed-wise!

But when you think about it, working with RGB or XYZW vectors
is a common task in programming, so I can see why the compiler
developers put so much work into vectorization.
The only caveat is that you have to remember to add a fourth
dummy field to XYZ or RGB structs.
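A small sketch of the layout described above, assuming an RGB pixel padded with a dummy fourth byte. `RGBX` and `brighten` are hypothetical names, and whether GDC, LDC or DMD actually vectorizes this code depends on the compiler and flags:

```d
import std.algorithm : min;

// Hypothetical pixel type: three color channels plus the dummy fourth
// field, so the struct is exactly 4 bytes wide.
struct RGBX
{
    ubyte r, g, b;
    ubyte pad; // dummy field, never read
}

static assert(RGBX.sizeof == 4);

// The same saturating add applied to each channel in sequence. Code of
// this shape is what an optimizer *may* pack into one SIMD register;
// nothing here forces it to.
RGBX brighten(RGBX p, ubyte amount)
{
    p.r = cast(ubyte) min(p.r + amount, 255);
    p.g = cast(ubyte) min(p.g + amount, 255);
    p.b = cast(ubyte) min(p.b + amount, 255);
    return p;
}

void main()
{
    assert(brighten(RGBX(250, 10, 0, 0), 10) == RGBX(255, 20, 10, 0));
}
```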

-- 
Marco



Re: A simple way to do compile time loop unrolling

2013-05-31 Thread bearophile

Andrei Alexandrescu:

> We should have something like that in Phobos.


Better would be (part of) a static foreach:
http://d.puremagic.com/issues/show_bug.cgi?id=4085
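For comparison, `static foreach` was later added to the language (DMD 2.076, years after this thread). A minimal sketch of the dot product written with it, assuming a current compiler; the `dot` function is hypothetical:

```d
// static foreach repeats its body once per element of 0 .. 3 at compile
// time, substituting the index each time. No string mixin is needed.
float dot(const float[3] v1, const float[3] v2)
{
    float sum = 0;
    static foreach (i; 0 .. 3)
        sum += v1[i] * v2[i];
    return sum;
}

void main()
{
    assert(dot([1.0f, 2.0f, 3.0f], [4.0f, 5.0f, 6.0f]) == 32);
}
```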

Bye,
bearophile


Re: A simple way to do compile time loop unrolling

2013-05-31 Thread Andrei Alexandrescu

Hehe, the first shot is always a trip, isn't it? Welcome aboard.

We should have something like that in Phobos.


Andrei


Re: A simple way to do compile time loop unrolling

2013-05-31 Thread Piotr Szturmaj

The advantage of foreach unrolling is that the compiler can choose the
optimal unrolling depth, since different depths may be faster or slower
on different CPU targets. It is also an opportunity for loop
vectorization. But I doubt that either is available in DMD; I'm not
sure about GDC and LDC.
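A sketch of the alternative described above: an ordinary runtime loop whose unrolling is left to the optimizer. `addAssign` is a hypothetical name, and whether a given backend unrolls or vectorizes it is not guaranteed:

```d
// Ordinary runtime loop: the source fixes no unrolling depth, leaving the
// optimizer free to unroll or vectorize it for the target CPU (or not).
void addAssign(ref float[3] v, const float[3] rhs)
{
    foreach (i; 0 .. v.length)
        v[i] += rhs[i];
}

void main()
{
    float[3] v = [1.0f, 2.0f, 3.0f];
    addAssign(v, [10.0f, 20.0f, 30.0f]);
    assert(v == [11.0f, 22.0f, 33.0f]);
}
```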


Re: A simple way to do compile time loop unrolling

2013-05-31 Thread finalpatch

Minor improvement:

import std.format : format;

template Unroll(alias CODE, alias N, alias SEP="")
{
    static if (N == 1)
        enum Unroll = format(CODE, 0);
    else // first N-1 copies, then SEP, then copy N-1
        enum Unroll = Unroll!(CODE, N-1, SEP)~SEP~format(CODE, N-1);
}

So a vector dot product can be unrolled like this:

auto dot = mixin(Unroll!("v1[%1$d]*v2[%1$d]", 3, "+"));

which becomes: v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2]
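A runnable sketch of the separator version, assuming typed template parameters in place of `alias` and adding a `static assert` guard that is not in the post (with N == 0 the recursion never reaches the N == 1 base case). The `dot` function is hypothetical:

```d
import std.format : format;

// Recursive Unroll with a separator between the copies.
template Unroll(string CODE, int N, string SEP = "")
{
    static assert(N >= 1, "Unroll needs N >= 1"); // guard added in this sketch
    static if (N == 1)
        enum Unroll = format(CODE, 0);
    else
        enum Unroll = Unroll!(CODE, N - 1, SEP) ~ SEP ~ format(CODE, N - 1);
}

static assert(Unroll!("v1[%1$d]*v2[%1$d]", 3, "+") == "v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2]");

// Hypothetical dot product: the expansion is mixed in as an expression.
float dot(const float[3] v1, const float[3] v2)
{
    return mixin(Unroll!("v1[%1$d]*v2[%1$d]", 3, "+"));
}

void main()
{
    assert(dot([1.0f, 2.0f, 3.0f], [4.0f, 5.0f, 6.0f]) == 32);
}
```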


A simple way to do compile time loop unrolling

2013-05-31 Thread finalpatch
I just want to share a new way to do loop unrolling that I just
discovered.


import std.format : format;

// Concatenates format(CODE, 0) .. format(CODE, N-1); needs N >= 1.
template Unroll(alias CODE, alias N)
{
    static if (N == 1)
        enum Unroll = format(CODE, 0);
    else
        enum Unroll = Unroll!(CODE, N-1)~format(CODE, N-1);
}

After that, you can write something like

mixin(Unroll!("v[%1$d]"~op~"=rhs.v[%1$d];", 3));

and it gets expanded to

v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];
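Because `Unroll!(...)` is just a compile-time string, the expansion can be inspected with `pragma(msg)` or checked with `static assert`. A sketch assuming typed template parameters; `addAssign` is a hypothetical stand-in for the post's operator context, with `op` fixed to `"+"` and `rhs` as a plain array:

```d
import std.format : format;

// The recursive Unroll from the post, with its import (typed parameters here).
template Unroll(string CODE, int N)
{
    static if (N == 1)
        enum Unroll = format(CODE, 0);
    else
        enum Unroll = Unroll!(CODE, N - 1) ~ format(CODE, N - 1);
}

// Hypothetical stand-in for the operator context: op is fixed to "+".
void addAssign(ref int[3] v, const int[3] rhs)
{
    enum op = "+";
    enum code = Unroll!("v[%1$d]" ~ op ~ "=rhs[%1$d];", 3);
    pragma(msg, code); // prints the generated statements during compilation
    mixin(code);
}

static assert(Unroll!("v[%1$d]+=rhs[%1$d];", 3) == "v[0]+=rhs[0];v[1]+=rhs[1];v[2]+=rhs[2];");

void main()
{
    int[3] v = [1, 2, 3];
    addAssign(v, [10, 20, 30]);
    assert(v == [11, 22, 33]);
}
```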

I find this method simpler than using foreach() over a tuple range,
and also faster, because the result is identical to hand unrolling.
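For comparison, here is a sketch of what the "foreach() over a tuple" approach might look like in current Phobos, where the compile-time sequence is `std.meta.AliasSeq` (older Phobos spelled it `TypeTuple`, in `std.typetuple`). `addAssign` is a hypothetical name:

```d
import std.meta : AliasSeq;

// foreach over a compile-time sequence is unrolled by the compiler: the
// body is emitted once per element, with i taking each value in turn.
void addAssign(ref float[3] v, const float[3] rhs)
{
    foreach (i; AliasSeq!(0, 1, 2))
        v[i] += rhs[i];
}

void main()
{
    float[3] v = [1.0f, 2.0f, 3.0f];
    addAssign(v, [10.0f, 20.0f, 30.0f]);
    assert(v == [11.0f, 22.0f, 33.0f]);
}
```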