Re: Optimize away immediately-called delegate literals?

2012-03-14 Thread Steven Schveighoffer
On Mon, 12 Mar 2012 20:28:15 -0400, H. S. Teoh wrote:

> Hmph.
>
> I tried this code:
>
> import std.stdio;
> struct A {
>     int[] data;
>     int opApply(int delegate(ref int) dg) {
>         foreach (d; data) {
>             if (dg(d)) return 1;
>         }
>         return 0;
>     }
> }
> void main() {
>     A a;
>     int n = 0;
>
>     a.data = [1,2,3,4,5];
>     foreach (d; a) {
>         n++;
>     }
> }
>
> With both dmd and gdc, the delegate is never inlined. :-(  Compiling
> with gdc -O3 causes opApply to get inlined and loop-unrolled, but the
> call to the delegate is still there. With dmd -O, even opApply is not
> inlined, and the code is generally much longer per loop iteration than
> gdc -O3.


IIRC, ldc does inline opApply.  But this is somewhat hearsay since I don't  
use ldc.  I'm just remembering what others have posted here.


-Steve


Re: Optimize away immediately-called delegate literals?

2012-03-12 Thread Brad Roberts
On 3/12/2012 8:10 PM, Nick Sabalausky wrote:
> "Brad Roberts"  wrote:
>>
>> See also: bug 4440
>>
>> The patch in there, if it hasn't bit rotten too badly (I suspect it has),
>> will handle _this_ case, but almost no other case of inlining delegates.
>>
>> It'd be a good area for someone who wants an interesting and non-trivial
>> problem to dig into.  It wouldn't touch all that much of the codebase, as
>> the inliner is fairly self-contained.  At least, that's what I recall from
>> when I looked at this stuff a couple years ago.
>>
> 
> Do you think that patch would be a good starting place for further work, or 
> would a proper solution likely necessitate an entirely different approach?
> 

I suspect that, to fix it for a broader set of cases, it'll need much larger
changes.  During those changes the simple version in that patch would likely
disappear.  But take that all with a huge grain of salt.


Re: Optimize away immediately-called delegate literals?

2012-03-12 Thread Nick Sabalausky
"Brad Roberts"  wrote in message 
news:mailman.582.1331607753.4860.digitalmar...@puremagic.com...
> On 3/12/2012 4:15 PM, Peter Alexander wrote:
>> On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
>>> On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
>>>> Suppose you have a delegate literal and immediately call it:
>>>>
>>>> auto a = x + (){ doStuff(); return y; }() + z;
>>>>
>>>> Does DMD ever (or always?) optimize away a delegate if it's executed
>>>> immediately and never stored into a variable? If not, can it, and
>>>> would it be a simple change? Is something like this already on the
>>>> table?
>>> [...]
>>>
>>> I've always wondered about whether delegates passed to opApply ever get
>>> inlined.
>>
>> Don't wonder. Find out!
>>
>> import std.stdio;
>> void doStuff() { writeln("Howdy!"); }
>> void main() {
>> int x = 1, y = 2, z = 3;
>> auto a = x + (){ doStuff(); return y; }() + z;
>> writeln(a);
>> }
>
> See also: bug 4440
>
> The patch in there, if it hasn't bit rotten too badly (I suspect it has),
> will handle _this_ case, but almost no other case of inlining delegates.
>
> It'd be a good area for someone who wants an interesting and non-trivial
> problem to dig into.  It wouldn't touch all that much of the codebase, as
> the inliner is fairly self-contained.  At least, that's what I recall from
> when I looked at this stuff a couple years ago.
>

Do you think that patch would be a good starting place for further work, or 
would a proper solution likely necessitate an entirely different approach?




Re: Optimize away immediately-called delegate literals?

2012-03-12 Thread Nick Sabalausky
"Peter Alexander"  wrote in message 
news:thetmhnnbeepmxgus...@forum.dlang.org...
> On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
>> On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
>>> Suppose you have a delegate literal and immediately call it:
>>>
>>> auto a = x + (){ doStuff(); return y; }() + z;
>>>
>>> Does DMD ever (or always?) optimize away a delegate if it's executed
>>> immediately and never stored into a variable? If not, can it, and
>>> would it be a simple change? Is something like this already on the
>>> table?
>> [...]
>>
>> I've always wondered about whether delegates passed to opApply ever get
>> inlined.
>
> Don't wonder. Find out!
>
> import std.stdio;
> void doStuff() { writeln("Howdy!"); }
> void main() {
> int x = 1, y = 2, z = 3;
> auto a = x + (){ doStuff(); return y; }() + z;
> writeln(a);
> }
>
> $ dmd test.d -O -release -inline
>
> __Dmain:
> 0001106c pushq %rbp
[...]

I keep forgetting I can do that. ;)

>
> In short. No! It doesn't currently inline in this case.
>
> Even if the lambda just returns a constant, it doesn't get inlined.

Darn.

The reason this came up is I've gotten back into some more work on HaxeD 
(Haxe -> D converter). Haxe allows statements to be used as expressions. For 
example:

x = 5 + if(cond) { foo(); 1; } else 2;

At the moment, I'm converting those by just tossing them into an 
immediately-called anonymous delegate:

x = 5 + (){ ...blah... }();

But as I was afraid of, there's a performance hit with that. So eventually, 
I'll have to either do some fancier reworking:

// Something like
typeof(1) __tmp1;
if(cond) { foo(); __tmp1 = 1; } else __tmp1 = 2;
x = 5 + __tmp1;

...which might even run into some problems with order-of-execution. Or 
delegate inlining would have to get added to DMD.
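
A minimal sketch of that reworking, assuming the converter hoists every operand into its own temporary in source order so that side effects stay in sequence. The names lhs, foo, callOrder, viaDelegate and viaTemps are hypothetical and exist only for illustration; they are not part of HaxeD.

```d
import std.stdio;

// Hypothetical helpers, named for illustration only.
int[] callOrder;                          // records which helper ran, and when

int lhs()  { callOrder ~= 1; return 5; }  // stands in for a side-effecting left operand
void foo() { callOrder ~= 2; }            // stands in for foo() in the Haxe snippet

// Current approach: wrap the statement in an immediately-called delegate literal.
int viaDelegate(bool cond)
{
    return lhs() + (){ if (cond) { foo(); return 1; } else return 2; }();
}

// Reworked approach: hoist each operand into a temporary, in source order,
// so that lhs() is guaranteed to run before foo().
int viaTemps(bool cond)
{
    auto tmp0 = lhs();
    int tmp1;
    if (cond) { foo(); tmp1 = 1; } else tmp1 = 2;
    return tmp0 + tmp1;
}

void main()
{
    // Both lowerings compute the same value.
    assert(viaDelegate(true) == 6);
    assert(viaDelegate(false) == 7);

    // The hoisted version runs the side effects in source order.
    callOrder = null;
    assert(viaTemps(true) == 6);
    assert(callOrder == [1, 2]);
    assert(viaTemps(false) == 7);

    writeln("both lowerings agree");
}
```

Hoisting only the statement-expression, and not the operands to its left, is where the order-of-execution problem would creep in: foo() would then run before lhs().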

Now that I actually look (heh :) ), it looks like there's an old Bugzilla 
issue for it with an apparently overly-limited patch:

http://d.puremagic.com/issues/show_bug.cgi?id=4440

Unfortunately, according to Brad Roberts in the last comment: "Getting 
delegate inlining in dmd is going to take serious work." (Ugh, I wish 
Windows had an easy way to copy/paste text *without* the styling.)




Re: Optimize away immediately-called delegate literals?

2012-03-12 Thread Brad Roberts
On 3/12/2012 4:15 PM, Peter Alexander wrote:
> On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
>> On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
>>> Suppose you have a delegate literal and immediately call it:
>>>
>>> auto a = x + (){ doStuff(); return y; }() + z;
>>>
>>> Does DMD ever (or always?) optimize away a delegate if it's executed
>>> immediately and never stored into a variable? If not, can it, and
>>> would it be a simple change? Is something like this already on the
>>> table?
>> [...]
>>
>> I've always wondered about whether delegates passed to opApply ever get
>> inlined.
> 
> Don't wonder. Find out!
> 
> import std.stdio;
> void doStuff() { writeln("Howdy!"); }
> void main() {
> int x = 1, y = 2, z = 3;
> auto a = x + (){ doStuff(); return y; }() + z;
> writeln(a);
> }

See also: bug 4440

The patch in there, if it hasn't bit rotten too badly (I suspect it has),
will handle _this_ case, but almost no other case of inlining delegates.

It'd be a good area for someone who wants an interesting and non-trivial
problem to dig into.  It wouldn't touch all that much of the codebase, as
the inliner is fairly self-contained.  At least, that's what I recall from
when I looked at this stuff a couple years ago.

Later,
Brad


Re: Optimize away immediately-called delegate literals?

2012-03-12 Thread H. S. Teoh
On Tue, Mar 13, 2012 at 12:15:02AM +0100, Peter Alexander wrote:
> On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
> >On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
> >>Suppose you have a delegate literal and immediately call it:
> >>
> >>auto a = x + (){ doStuff(); return y; }() + z;
> >>
> >>Does DMD ever (or always?) optimize away a delegate if it's executed
> >>immediately and never stored into a variable? If not, can it, and
> >>would it be a simple change? Is something like this already on the
> >>table?
> >[...]
> >
> >I've always wondered about whether delegates passed to opApply ever get
> >inlined.
> 
> Don't wonder. Find out!
> 
> import std.stdio;
> void doStuff() { writeln("Howdy!"); }
> void main() {
> int x = 1, y = 2, z = 3;
> auto a = x + (){ doStuff(); return y; }() + z;
> writeln(a);
> }
> 
> $ dmd test.d -O -release -inline
> 
> __Dmain:
> 0001106c  pushq   %rbp
> 0001106d  movq    %rsp,%rbp
> 00011070  pushq   %rax
> 00011071  pushq   %rbx
> 00011072  movq    $0x000c,%rdi
> 0001107c  callq   0x1000237f0 ; symbol stub for: __d_allocmemory
> 00011081  movq    %rax,%rbx
> 00011084  movq    $0x,(%rbx)
> 0001108b  movl    $0x0002,0x08(%rbx)
> 00011092  movq    %rbx,%rdi
> 00011095  call    *0x0002318c(%rip)
> 0001109c  leal    0x04(%rax),%edx
> 0001109f  movl    $0x000a,%esi
> 000110a4  leaq    0x00033eed(%rip),%rdi
> 000110ab  callq   0x10002319c ; symbol stub for: _D3std5stdio4File14__T5writeTiTaZ5writeMFiaZv
> 000110b0  xorl    %eax,%eax
> 000110b2  popq    %rbx
> 000110b3  movq    %rbp,%rsp
> 000110b6  popq    %rbp
> 000110b7  ret
> 
> In short. No! It doesn't currently inline in this case.
> 
> Even if the lambda just returns a constant, it doesn't get inlined.

Hmph.

I tried this code:

import std.stdio;
struct A {
    int[] data;
    int opApply(int delegate(ref int) dg) {
        foreach (d; data) {
            if (dg(d)) return 1;
        }
        return 0;
    }
}
void main() {
    A a;
    int n = 0;

    a.data = [1,2,3,4,5];
    foreach (d; a) {
        n++;
    }
}

With both dmd and gdc, the delegate is never inlined. :-(  Compiling
with gdc -O3 causes opApply to get inlined and loop-unrolled, but the
call to the delegate is still there. With dmd -O, even opApply is not
inlined, and the code is generally much longer per loop iteration than
gdc -O3.

Here's the code generated by gdc for the foreach() loop in main():

  404839:   48 89 c3          mov    %rax,%rbx
  40483c:   48 8b 04 24       mov    (%rsp),%rax
  404840:   48 8d 74 24 3c    lea    0x3c(%rsp),%rsi
  404845:   48 8d 7c 24 20    lea    0x20(%rsp),%rdi
  40484a:   48 89 03          mov    %rax,(%rbx)
  40484d:   48 8b 44 24 08    mov    0x8(%rsp),%rax
  404852:   48 89 43 08       mov    %rax,0x8(%rbx)
  404856:   8b 44 24 10       mov    0x10(%rsp),%eax
  40485a:   89 43 10          mov    %eax,0x10(%rbx)
  40485d:   8b 03             mov    (%rbx),%eax
  40485f:   89 44 24 3c       mov    %eax,0x3c(%rsp)
  404863:   ff d5             callq  *%rbp
  404865:   85 c0             test   %eax,%eax
  404867:   75 58             jne    4048c1 <_Dmain+0xd1>
  404869:   8b 43 04          mov    0x4(%rbx),%eax
  40486c:   48 8d 74 24 3c    lea    0x3c(%rsp),%rsi
  404871:   48 8d 7c 24 20    lea    0x20(%rsp),%rdi
  404876:   89 44 24 3c       mov    %eax,0x3c(%rsp)
  40487a:   ff d5             callq  *%rbp
  40487c:   85 c0             test   %eax,%eax
  40487e:   75 41             jne    4048c1 <_Dmain+0xd1>
  404880:   8b 43 08          mov    0x8(%rbx),%eax
  404883:   48 8d 74 24 3c    lea    0x3c(%rsp),%rsi
  404888:   48 8d 7c 24 20    lea    0x20(%rsp),%rdi
  40488d:   89 44 24 3c       mov    %eax,0x3c(%rsp)
  404891:   ff d5             callq  *%rbp
  404893:   85 c0             test   %eax,%eax
  404895:   75 2a             jne    4048c1 <_Dmain+0xd1>
  404897:   8b 43 0c          mov    0xc(%rbx),%eax
  40489a:   48 8d 74 24 3c    lea    0x3c(%rsp),%rsi
  40489f:   48 8d 7c 24 20    lea    0x20(%rsp),%rdi
  4048a4:   89 44 24 3c       mov    %eax,0x3c(%rsp)
  4048a8:   ff d5             callq  *%rbp
  4048aa:   85 c0             test   %ea

Re: Optimize away immediately-called delegate literals?

2012-03-12 Thread Peter Alexander

On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
> On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
>> Suppose you have a delegate literal and immediately call it:
>>
>> auto a = x + (){ doStuff(); return y; }() + z;
>>
>> Does DMD ever (or always?) optimize away a delegate if it's executed
>> immediately and never stored into a variable? If not, can it, and
>> would it be a simple change? Is something like this already on the
>> table?
> [...]
>
> I've always wondered about whether delegates passed to opApply ever get
> inlined.


Don't wonder. Find out!

import std.stdio;
void doStuff() { writeln("Howdy!"); }
void main() {
    int x = 1, y = 2, z = 3;
    auto a = x + (){ doStuff(); return y; }() + z;
    writeln(a);
}

$ dmd test.d -O -release -inline

__Dmain:
0001106c  pushq   %rbp
0001106d  movq    %rsp,%rbp
00011070  pushq   %rax
00011071  pushq   %rbx
00011072  movq    $0x000c,%rdi
0001107c  callq   0x1000237f0 ; symbol stub for: __d_allocmemory
00011081  movq    %rax,%rbx
00011084  movq    $0x,(%rbx)
0001108b  movl    $0x0002,0x08(%rbx)
00011092  movq    %rbx,%rdi
00011095  call    *0x0002318c(%rip)
0001109c  leal    0x04(%rax),%edx
0001109f  movl    $0x000a,%esi
000110a4  leaq    0x00033eed(%rip),%rdi
000110ab  callq   0x10002319c ; symbol stub for: _D3std5stdio4File14__T5writeTiTaZ5writeMFiaZv
000110b0  xorl    %eax,%eax
000110b2  popq    %rbx
000110b3  movq    %rbp,%rsp
000110b6  popq    %rbp
000110b7  ret

In short. No! It doesn't currently inline in this case.

Even if the lambda just returns a constant, it doesn't get inlined.


Re: [OT] Hanlon's Razor (Was: Optimize away immediately-called delegate literals?)

2012-03-12 Thread H. S. Teoh
On Sun, Mar 11, 2012 at 03:03:39AM -0400, Nick Sabalausky wrote:
> "H. S. Teoh"  wrote in message 
> news:mailman.455.1331448575.4860.digitalmar...@puremagic.com...
> >
> > Never ascribe to malice that which is adequately explained by
> > incompetence. -- Napoleon Bonaparte
> 
> Pardon me veering offtopic at one of your taglines yet again, but I
> have to say, this is one that I've believed very strongly in for
> years. It's practically a mantra of mine, a big part of my own
> personal philosophy. I've never heard it attributed to Napoleon,
> though. I always knew it as "Hanlon's Razor", somewhat of a corollary
> to Occam's Razor.
[...]

I guess my sources aren't that reliable. I just copy-n-pasted that quote
from somebody else's sig. :-)


T

-- 
Some ideas are so stupid that only intellectuals could believe them.
-- George Orwell


[OT] Hanlon's Razor (Was: Optimize away immediately-called delegate literals?)

2012-03-10 Thread Nick Sabalausky
"H. S. Teoh"  wrote in message 
news:mailman.455.1331448575.4860.digitalmar...@puremagic.com...
>
> Never ascribe to malice that which is adequately explained by
> incompetence. -- Napoleon Bonaparte

Pardon me veering offtopic at one of your taglines yet again, but I have to 
say, this is one that I've believed very strongly in for years. It's 
practically a mantra of mine, a big part of my own personal philosophy. I've 
never heard it attributed to Napoleon, though. I always knew it as "Hanlon's 
Razor", somewhat of a corollary to Occam's Razor.




Re: Optimize away immediately-called delegate literals?

2012-03-10 Thread H. S. Teoh
On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
> Suppose you have a delegate literal and immediately call it:
> 
> auto a = x + (){ doStuff(); return y; }() + z;
> 
> Does DMD ever (or always?) optimize away a delegate if it's executed
> immediately and never stored into a variable? If not, can it, and
> would it be a simple change? Is something like this already on the
> table?
[...]

I've always wondered about whether delegates passed to opApply ever get
inlined. Seems to be pretty silly to do the entire function pointer +
context pointer and full function call thing, if both opApply and the
delegate are very simple. Especially if opApply is not much more than a
for loop of some sort, perhaps with a line or two of private member
access code inserted.
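
For reference, a small sketch of where that function pointer + context pointer comes from: a foreach over a type with opApply is rewritten so that the loop body becomes a delegate passed to opApply. The two counting functions below are hypothetical names for illustration; the second writes out by hand roughly what the compiler generates for the first.

```d
import std.stdio;

struct A
{
    int[] data;

    // Calls dg once per element; a nonzero result means "stop iterating".
    int opApply(int delegate(ref int) dg)
    {
        foreach (ref d; data)
        {
            if (auto r = dg(d)) return r;
        }
        return 0;
    }
}

// Ordinary foreach: the body { n++; } is turned into a delegate behind the scenes.
int countWithForeach(A a)
{
    int n = 0;
    foreach (d; a) { n++; }
    return n;
}

// Hand-written equivalent: the loop body as an explicit delegate literal.
// It captures n, so it carries a context pointer as well as a function pointer.
int countWithDelegate(A a)
{
    int n = 0;
    a.opApply((ref int d) { n++; return 0; });
    return n;
}

void main()
{
    auto a = A([1, 2, 3, 4, 5]);
    assert(countWithForeach(a) == 5);
    assert(countWithDelegate(a) == 5);
    writeln("ok");
}
```

Unless the optimizer can see through both opApply and the delegate, every iteration pays for an indirect call through dg.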


T

-- 
Never ascribe to malice that which is adequately explained by
incompetence. -- Napoleon Bonaparte


Optimize away immediately-called delegate literals?

2012-03-10 Thread Nick Sabalausky
Suppose you have a delegate literal and immediately call it:

auto a = x + (){ doStuff(); return y; }() + z;

Does DMD ever (or always?) optimize away a delegate if it's executed 
immediately and never stored into a variable? If not, can it, and would it 
be a simple change? Is something like this already on the table?