Efficient idiom for fastest code

2018-05-22 Thread IntegratedDimensions via Digitalmars-d-learn


Many times in expensive loops one must make decisions. Decisions 
must be determined and the determination costs.


for(int i = 0; i < N; i++)
{
if(decision(i)) A; else B;
}

the if statement costs N times the cycle cost.

In some cases the decision holds for continuous ranges. For some 
0 <= n <= N the decision is constant, but n is 
arbitrary(determined by unknown factors at compile time).


One can speed up the routine by using something akin to a 
simplified strategy pattern where one uses 
functions/delegates/lambdas to code a faster version without the 
test:



for(int i = 0; i < N; i++)
{
d = (){ if(decision(i)) A; else d = () { B; };
d();
}

this code basically reduces to

for(int i = 0; i < N; i++)
{
B;
}

Once the decision fails, which we assume once it fails it always 
fails in this particular case.


Therefor, once the decision fails it kicks in the "faster" code. 
Suppose decision is very slow.


The cycle cost is then n times rather than N times.

In general, such techniques can be used to speed up code, 
sometimes significantly, but are difficult to implement using a 
standard pattern and for more complex decision functions.



Does anyone see any way to use some some of D's power to help 
simplify such things but not using something like a strategy 
pattern or complexifying the overall design(using advanced 
techniques such as templates, mixins is not making the code more 
complex).

















Re: Efficient idiom for fastest code

2018-05-22 Thread Nicholas Wilson via Digitalmars-d-learn
On Wednesday, 23 May 2018 at 02:24:08 UTC, IntegratedDimensions 
wrote:


Many times in expensive loops one must make decisions. 
Decisions must be determined and the determination costs.


for(int i = 0; i < N; i++)
{
if(decision(i)) A; else B;
}

the if statement costs N times the cycle cost.

In some cases the decision holds for continuous ranges. For 
some 0 <= n <= N the decision is constant, but n is 
arbitrary(determined by unknown factors at compile time).


One can speed up the routine by using something akin to a 
simplified strategy pattern where one uses 
functions/delegates/lambdas to code a faster version without 
the test:



for(int i = 0; i < N; i++)
{
d = (){ if(decision(i)) A; else d = () { B; };
d();
}



assuming you meant

auto d = (int i){ if(decision(i)) A; else d = (int i) { B; };};
for(int i = 0; i < N; i++)
{
d(i);
}




this code basically reduces to

for(int i = 0; i < N; i++)
{
B;
}

Once the decision fails, which we assume once it fails it 
always fails in this particular case.


Therefor, once the decision fails it kicks in the "faster" 
code. Suppose decision is very slow.


The cycle cost is then n times rather than N times.

In general, such techniques can be used to speed up code, 
sometimes significantly, but are difficult to implement using a 
standard pattern and for more complex decision functions.



Does anyone see any way to use some some of D's power to help 
simplify such things but not using something like a strategy 
pattern or complexifying the overall design(using advanced 
techniques such as templates, mixins is not making the code 
more complex).


If decision is pure, consider using static foreach 
(iota(N).map!decision) { ... }to unroll it at compile time. even 
if it isn't the compiler may still be able to factor out parts of 
repeated calls to decision, look at the assembly/IR to confirm 
this.


Otherwise PROFILE! (and use profile guided optimisation! this 
implies using a compiler other than DMD which if you care about 
performance you should be doing anyway)


Blindly trying to optimise is just as likely to hurt performance.
in particular don't underestimate the branch predictor. Even if 
decision is complex, if there is a pattern in evaluating 
iota(n).map!decision the branch predictor will pick up on it.


In terms of exploiting knowledge of decision a priori that the 
compiler somehow lacks you really don't have much option but to 
do it yourself.





Re: Efficient idiom for fastest code

2018-05-22 Thread IntegratedDimensions via Digitalmars-d-learn

On Wednesday, 23 May 2018 at 03:00:17 UTC, Nicholas Wilson wrote:
On Wednesday, 23 May 2018 at 02:24:08 UTC, IntegratedDimensions 
wrote:


Many times in expensive loops one must make decisions. 
Decisions must be determined and the determination costs.


for(int i = 0; i < N; i++)
{
if(decision(i)) A; else B;
}

the if statement costs N times the cycle cost.

In some cases the decision holds for continuous ranges. For 
some 0 <= n <= N the decision is constant, but n is 
arbitrary(determined by unknown factors at compile time).


One can speed up the routine by using something akin to a 
simplified strategy pattern where one uses 
functions/delegates/lambdas to code a faster version without 
the test:



for(int i = 0; i < N; i++)
{
d = (){ if(decision(i)) A; else d = () { B; };
d();
}



assuming you meant

auto d = (int i){ if(decision(i)) A; else d = (int i) { B; };};
for(int i = 0; i < N; i++)
{
d(i);
}




this code basically reduces to

for(int i = 0; i < N; i++)
{
B;
}

Once the decision fails, which we assume once it fails it 
always fails in this particular case.


Therefor, once the decision fails it kicks in the "faster" 
code. Suppose decision is very slow.


The cycle cost is then n times rather than N times.

In general, such techniques can be used to speed up code, 
sometimes significantly, but are difficult to implement using 
a standard pattern and for more complex decision functions.



Does anyone see any way to use some some of D's power to help 
simplify such things but not using something like a strategy 
pattern or complexifying the overall design(using advanced 
techniques such as templates, mixins is not making the code 
more complex).


If decision is pure, consider using static foreach 
(iota(N).map!decision) { ... }to unroll it at compile time. 
even if it isn't the compiler may still be able to factor out 
parts of repeated calls to decision, look at the assembly/IR to 
confirm this.


Otherwise PROFILE! (and use profile guided optimisation! 
this implies using a compiler other than DMD which if you care 
about performance you should be doing anyway)


Blindly trying to optimise is just as likely to hurt 
performance.
in particular don't underestimate the branch predictor. Even if 
decision is complex, if there is a pattern in evaluating 
iota(n).map!decision the branch predictor will pick up on it.


In terms of exploiting knowledge of decision a priori that the 
compiler somehow lacks you really don't have much option but to 
do it yourself.


I knew someone was going to say that and I forgot to say DON'T!

Saying to profile when I clearly said these ARE cases where they 
are slow is just moronic. Please don't use default answers to 
arguments.


This was a general question about cases on how to attack a 
problem WHEN profiling says I need to optimize. Your SO 101 
answer sucks! Sorry!


To prove to you that your answer is invalid:

I profile my code, it says that it is very slow and shows that it 
is do to the decision checking... I then I have to come here and 
write up a post trying to explain how to solve the problem. I 
then get a post telling me I should profile. I then respond I did 
profile and that this is my problem. A lot of wasted energy when 
it is better to know a general attack strategy. Yes, some of us 
can judge if code is needed to be optimized before profiling. It 
is not difficult. Giving a generic answer that always does not 
apply and is obvious to anyone trying to do optimization is not 
helpful. Everyone today pretty must does not even optimize code 
anymore... this isn't 1979. It's not ok to keep repeating the 
same mantra. I guess we should turn this in to a meme?


The reason I'm getting on to you is that the "profile before 
optimization" sounds a bit grade school, specially since I wasn't 
talking anything about profiling but a general programming 
pattern speed up code, which is always valid but not always 
useful(and hence that is when profiling comes in).




Re: Efficient idiom for fastest code

2018-05-23 Thread Malte via Digitalmars-d-learn
On Wednesday, 23 May 2018 at 02:24:08 UTC, IntegratedDimensions 
wrote:
In some cases the decision holds for continuous ranges. For 
some 0 <= n <= N the decision is constant, but n is 
arbitrary(determined by unknown factors at compile time).


One can speed up the routine by using something akin to a 
simplified strategy pattern where one uses 
functions/delegates/lambdas to code a faster version without 
the test:



for(int i = 0; i < N; i++)
{
d = (){ if(decision(i)) A; else d = () { B; };
d();
}

this code basically reduces to

for(int i = 0; i < N; i++)
{
B;
}

Once the decision fails, which we assume once it fails it 
always fails in this particular case.


Therefor, once the decision fails it kicks in the "faster" 
code. Suppose decision is very slow.


I would just do

int i=0;
for(;decision(i) && i < N; i++)
{
   A;
}
for(;i < N; i++)
{
   B;
}


This could be turned to a mixin template with something like this:

mixin template forSplit(alias condition, alias A, alias B)
{
   void execute()
   {
   int i = 0;
   for (; condition(i) && i < N; i++)
   {
   A();
   }
   for (; i < N; i++)
   {
   B();
   }
   }
}


and to use it in code (assuming N is defined in the scope)

mixin forSplit!((int i)=>(decision(i)), {A;}, {B;}) loop;
loop.execute();


I have't measured anything, but I would assume that delegates 
come with an overhead that you just don't need here. In fact when 
trying to use

auto d = (int i) {};
d = (int i){ if(decision(i)) A; else d = (int i) { B; };};
for(int i = 0; i < N; i++)
{
d(i);
}
All I got was "cannot access frame of function D main" which sums 
up my experiences with lambdas in D so far.


While PGO and branch predictor are good, they don't help much 
here. Not executing an expression is out of scope for them. All 
they do is prevent pipeline flushes.
Also I think you overestimate what the compiler can do. My 
decision function to do some testing was this:

bool decision(int a) pure
out (result) {
assert(result == (a < 10));
} do {
   import std.algorithm, std.range;

   // stolen from https://dlang.org/library/std/parallelism.html
   enum n = 1_000_000;
   enum delta = 1.0 / n;
   alias getTerm = (int i)
   {
   immutable x = ( i - 0.5 ) * delta;
   return delta / ( 1.0 + x * x ) ;
   };
   immutable pi = 4.0 * reduce!"a + b"(n.iota.map!getTerm);

   return a < 3*pi;
}
With N=100 I got a speedup of ~10 (ldc -O3 -release). Even though 
this function is pure and could be optimized a lot. It calculated 
pi for every single call. And optimizing the decision function 
isn't even the point of that question.


Re: Efficient idiom for fastest code

2018-05-23 Thread IntegratedDimensions via Digitalmars-d-learn

On Wednesday, 23 May 2018 at 10:55:02 UTC, Malte wrote:
On Wednesday, 23 May 2018 at 02:24:08 UTC, IntegratedDimensions 
wrote:

[...]


I would just do

[...]


[...]



Thanks, I didn't think about using a for loop like that. While it 
is not the most general it does solve the specific case for a 
simple step/toggled decision.





Re: Efficient idiom for fastest code

2018-05-23 Thread Nicholas Wilson via Digitalmars-d-learn
On Wednesday, 23 May 2018 at 03:12:52 UTC, IntegratedDimensions 
wrote:

I knew someone was going to say that and I forgot to say DON'T!

Saying to profile when I clearly said these ARE cases where 
they are slow is just moronic. Please don't use default answers 
to arguments.


This was a general question about cases on how to attack a 
problem WHEN profiling says I need to optimize. Your SO 101 
answer sucks! Sorry!


To prove to you that your answer is invalid:

I profile my code, it says that it is very slow and shows that 
it is do to the decision checking... I then I have to come here 
and write up a post trying to explain how to solve the problem. 
I then get a post telling me I should profile. I then respond I 
did profile and that this is my problem. A lot of wasted energy 
when it is better to know a general attack strategy. Yes, some 
of us can judge if code is needed to be optimized before 
profiling. It is not difficult. Giving a generic answer that 
always does not apply and is obvious to anyone trying to do 
optimization is not helpful. Everyone today pretty must does 
not even optimize code anymore... this isn't 1979. It's not ok 
to keep repeating the same mantra. I guess we should turn this 
in to a meme?


The reason I'm getting on to you is that the "profile before 
optimization" sounds a bit grade school, specially since I 
wasn't talking anything about profiling but a general 
programming pattern speed up code, which is always valid but 
not always useful(and hence that is when profiling comes in).


I'm going to ignore the tone of your response, but I am going to 
say that responding like that isn't going to get you very far. 
Don't expect others to do likewise.


Assuming that your decision function is indeed the bottleneck, 
you'll see I did actually provide some hints as to how to 
optimise the case where decision is pure. even if you can't 
convince the compiler to inline and expression combine, as in the 
case for the other answer, you can memoize it (look in 
std.functional).


One of the great things about D is that you can force lots of 
computation to happen at compile time, so in the case where 
decision is impure, factoring it into pure and impure parts and 
`enum x = pureFunc(args);`ing the part that can be can make a 
large difference if you can't convince the optimiser to do it for 
you.


If you still can't do that consider Jitting it  
(https://github.com/ldc-developers/druntime/blob/e3bfc5fb780967f1b6807039ff00b2ccaf4b03d9/src/ldc/attributes.d#L78 ) with `-enable-dynamic-compile` or running the loop in parallel with std.parallelism.


Re: Efficient idiom for fastest code

2018-05-24 Thread biocyberman via Digitalmars-d-learn
On Wednesday, 23 May 2018 at 03:12:52 UTC, IntegratedDimensions 
wrote:
On Wednesday, 23 May 2018 at 03:00:17 UTC, Nicholas Wilson 
wrote:

[...]


I knew someone was going to say that and I forgot to say DON'T!

Saying to profile when I clearly said these ARE cases where 
they are slow is just moronic. Please don't use default answers 
to arguments.


This was a general question about cases on how to attack a 
problem WHEN profiling says I need to optimize. Your SO 101 
answer sucks! Sorry!


To prove to you that your answer is invalid:

I profile my code, it says that it is very slow and shows that 
it is do to the decision checking... I then I have to come here 
and write up a post trying to explain how to solve the problem. 
I then get a post telling me I should profile. I then respond I 
did profile and that this is my problem. A lot of wasted energy 
when it is better to know a general attack strategy. Yes, some 
of us can judge if code is needed to be optimized before 
profiling. It is not difficult. Giving a generic answer that 
always does not apply and is obvious to anyone trying to do 
optimization is not helpful. Everyone today pretty must does 
not even optimize code anymore... this isn't 1979. It's not ok 
to keep repeating the same mantra. I guess we should turn this 
in to a meme?


The reason I'm getting on to you is that the "profile before 
optimization" sounds a bit grade school, specially since I 
wasn't talking anything about profiling but a general 
programming pattern speed up code, which is always valid but 
not always useful(and hence that is when profiling comes in).


Very challenging. Wish I could help you out with the tough work. 
People don't share the same context, especially via online, so it 
is necessary to clarify the  problem so other can understand and 
help. I've been beaten on stackoverflow many times for not 
providing sufficient information for my questions. It seems like 
one can do the reverse here at forum.dlang.org.


With that said, I think you know what you are doing, and you can 
do it. Just relax and give it more time and experimentation.