Re: pi benchmark on ldc and dmd

2011-08-02 Thread Jason House
The post says they did dmd -O. They did not mention -inline -noboundscheck 
-release. There may be extra flags that are required.

Walter Bright Wrote:

 http://www.reddit.com/r/programming/comments/j48tf/how_is_c_better_than_d/c29do98
 
 Anyone care to examine the assembler output and figure out why?



Re: pi benchmark on ldc and dmd

2011-08-02 Thread Trass3r
On 02.08.2011, 05:38, Adam D. Ruppe destructiona...@gmail.com wrote:

I was waiting over an hour just for gcc+gdc to compile! In the
time it takes for gcc's configure script to run, you can make
clean, build dmd, druntime and phobos.


Make sure you disable bootstrapping. Compiling gdc works pleasantly fast  
for me. Try compiling it on Windows, that's what I call slow.


Re: pi benchmark on ldc and dmd

2011-08-02 Thread Adam D. Ruppe
 LDC builds in under a half hour, even on my underpowered ARM SoC,
 so I don't see how you could be having trouble there.

Building dmd from the zip took 37 *seconds* for me just now, after
running a make clean (this is on Linux).

gdc and ldc have their advantages, but they have disadvantages too.
I think the people saying abandon dmd don't know the other side
of the story.


Basically, I think the more compilers we have for D the better.
gdc is good. ldc is good. And so is dmd. We shouldn't abandon
any of them.


Re: pi benchmark on ldc and dmd

2011-08-02 Thread Robert Clipsham

On 02/08/2011 00:40, Walter Bright wrote:

http://www.reddit.com/r/programming/comments/j48tf/how_is_c_better_than_d/c29do98


Anyone care to examine the assembler output and figure out why?


I was talking to David Nadlinger the other day, and there was some sort 
of codegen bug causing LDC to massively outperform dmd and clang on 
equivalent code - it's possible this is the cause, I don't know without 
looking though. He may be able to shed some light on it.


--
Robert
http://octarineparrot.com/


Re: pi benchmark on ldc and dmd

2011-08-02 Thread David Nadlinger

On 8/2/11 7:34 PM, Robert Clipsham wrote:

On 02/08/2011 00:40, Walter Bright wrote:

Anyone care to examine the assembler output and figure out why?


I was talking to David Nadlinger the other day, and there was some sort
of codegen bug causing LDC to massively outperform dmd and clang on
equivalent code - it's possible this is the cause, I don't know without
looking though. He may be able to shed some light on it.


Nope, this turned out to be a bug in my program, where some memory chunk 
used as test input data was prematurely garbage collected (that only 
surfaced with aggressive compiler optimizations, which is why I 
suspected a compiler bug).


David


Re: pi benchmark on ldc and dmd

2011-08-02 Thread Walter Bright

On 8/2/2011 5:00 AM, Jason House wrote:

The post says they did dmd -O. They did not mention -inline -noboundscheck
-release. There may be extra flags that are required.


Often when I see benchmark results like this, I wait to see what the actual 
problem is before jumping to conclusions. I have a lot of experience with this :-)


The results could be any of:

1. wrong flags used (especially by inexperienced users)

2. the benchmark isn't measuring what it purports to measure (for example, it 
may actually be measuring printf or malloc speed, not the generated code)


3. the benchmark is optimized for one particular compiler/language by someone 
very familiar with that compiler/language and it exploits a particular quirk of it


4. the compiler is hand optimized for a specific benchmark, and the great 
results disappear if anything in the source code changes (yes, this is dirty, 
and I've seen it done by big name compilers)


5. the different benchmarks are run on different computers

6. the memory layout could wind up arbitrarily different for the different 
compilers/languages, resulting in different performance due to memory caching


etc.
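Point 2 above is worth illustrating. Here is a minimal sketch (a hypothetical example, not from the thread; Python used for brevity) of a "benchmark" whose timing is dominated by output formatting rather than by the computation it claims to measure:

```python
import io
import time

def sum_of_squares(n):
    # The computation the benchmark claims to measure.
    total = 0
    for i in range(n):
        total += i * i
    return total

def sum_of_squares_logged(n, out):
    # Same computation, but printing every intermediate value;
    # the formatting/I-O cost dwarfs the arithmetic itself.
    total = 0
    for i in range(n):
        total += i * i
        print(total, file=out)
    return total

n = 100_000
t0 = time.perf_counter()
pure = sum_of_squares(n)
t1 = time.perf_counter()
logged = sum_of_squares_logged(n, io.StringIO())
t2 = time.perf_counter()

assert pure == logged  # identical results...
print(f"pure: {t1 - t0:.4f}s  logged: {t2 - t1:.4f}s")  # ...very different timings
```

Comparing two compilers on the second function mostly compares their formatted-output paths, not their generated code.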


Re: pi benchmark on ldc and dmd

2011-08-02 Thread Adam D. Ruppe
I think I have it: 64 bit registers. I got ldc to work
in 32 bit (didn't have that yesterday, so I was doing 64 bit only)
and compiled.

No difference in timing between ldc 32 bit and dmd 32 bit.
The disassembly isn't identical but the time is. (The disassembly
seems to mainly order things differently, but ldc has fewer jump
instructions too.)

Anyway.

In 64 bit, ldc gets a speedup over dmd. Looking at the asm
output, it looks like dmd doesn't use any of the new registers,
whereas ldc does. (dmd's 64 bit looks mostly like 32 bit code with
r instead of e.)


Here's the program. It's based on one of the Python ones.


import std.bigint;
import std.stdio;

alias BigInt number;

void main() {
    auto N = 10_000;

    number i, k, ns;
    number k1 = 1;
    number n, a, d, t, u;
    n = 1;
    d = 1;
    while(1) {
        k += 1;
        t = n << 1;
        n *= k;
        a += t;
        k1 += 2;
        a *= k1;
        d *= k1;
        if(a >= n) {
            t = (n*3 + a) / d;
            u = (n*3 + a) % d;
            u += n;
            if(d > u) {
                ns = ns*10 + t;
                i += 1;
                if(i % 10 == 0) {
                    debug writefln("%010d\t:%d", ns, i);
                    ns = 0;
                }
                if(i >= N) {
                    break;
                }
                a -= d*t;
                a *= 10;
                n *= 10;
            }
        }
    }
}
===
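For reference, here is a minimal Python sketch of the same unbounded spigot (my own transcription of the algorithm above, not the exact benchmark source); Python's native big integers play the role of BigInt:

```python
def pi_digits(count):
    # Unbounded spigot for the decimal digits of pi,
    # mirroring the structure of the D program above.
    digits = []
    i = k = a = 0
    k1 = 1
    n = d = 1
    while True:
        k += 1
        t = n << 1
        n *= k
        a += t
        k1 += 2
        a *= k1
        d *= k1
        if a >= n:
            t, u = divmod(n * 3 + a, d)
            u += n
            if d > u:
                digits.append(t)
                i += 1
                if i >= count:
                    return digits
                a -= d * t
                a *= 10
                n *= 10

print(''.join(map(str, pi_digits(10))))  # 3141592653
```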

BigInt's calls aren't inlined, but that's a frontend issue. Let's
eliminate that by switching to long in that alias.

The result will be wrong, but that's beside the point for now. I
just want to see integer math. (this is why the writefln is debug
too)

With optimizations turned on, ldc again wins by the same ratio -
it runs in about 2/3 the time - and the code is much easier to look
at.


Let's see what's going on.


The relevant loop from DMD (64 bit):

===
L47:    inc qword ptr -040h[RBP]
        mov RAX,-028h[RBP]
        add RAX,RAX
        mov -010h[RBP],RAX
        mov RAX,-040h[RBP]
        imul RAX,-028h[RBP]
        mov -028h[RBP],RAX
        mov RAX,-010h[RBP]
        add -020h[RBP],RAX
        add qword ptr -030h[RBP],2
        mov RAX,-030h[RBP]
        imul RAX,-020h[RBP]
        mov -020h[RBP],RAX
        mov RAX,-030h[RBP]
        imul RAX,-018h[RBP]
        mov -018h[RBP],RAX
        mov RAX,-020h[RBP]
        cmp RAX,-028h[RBP]
        jl  L47
        mov RAX,-028h[RBP]
        lea RAX,[RAX*2][RAX]
        add RAX,-020h[RBP]
        mov -058h[RBP],RAX
        cqo
        idiv qword ptr -018h[RBP]
        mov -010h[RBP],RAX
        mov RAX,-058h[RBP]
        cqo
        idiv qword ptr -018h[RBP]
        mov -8[RBP],RDX
        mov RAX,-028h[RBP]
        add -8[RBP],RAX
        mov RAX,-018h[RBP]
        cmp RAX,-8[RBP]
        jle L47
        mov RAX,-038h[RBP]
        lea RAX,[RAX*4][RAX]
        add RAX,RAX
        add RAX,-010h[RBP]
        mov -038h[RBP],RAX
        inc qword ptr -048h[RBP]
        mov RAX,-048h[RBP]
        mov RCX,0Ah
        cqo
        idiv RCX
        test RDX,RDX
        jne L109
        mov qword ptr -038h[RBP],0
L109:   cmp qword ptr -048h[RBP],02710h
        jge L137
        mov RAX,-018h[RBP]
        imul RAX,-010h[RBP]
        sub -020h[RBP],RAX
        imul EAX,-020h[RBP],0Ah
        mov -020h[RBP],RAX
        imul EAX,-028h[RBP],0Ah
        mov -028h[RBP],RAX
        jmp L47
===


and from ldc 64 bit:


L20:    add RDI,2
        inc RCX
        lea R9,[R10*2][R9]
        imul R9,RDI
        imul R8,RDI
        imul R10,RCX
        cmp R9,R10
        jl  L20
        lea RAX,[R10*2][R10]
        add RAX,R9
        cqo
        idiv R8
        add RDX,R10
        cmp R8,RDX
        jle L20
        cmp RSI,0270Fh
        jg  L73

Re: pi benchmark on ldc and dmd

2011-08-02 Thread Andrew Wiley
On Tue, Aug 2, 2011 at 7:08 AM, Adam D. Ruppe destructiona...@gmail.com wrote:

  LDC builds in under a half hour, even on my underpowered ARM SoC,
  so I don't see how you could be having trouble there.

 Building dmd from the zip took 37 *seconds* for me just now, after
 running a make clean (this is on Linux).

 gdc and ldc have their advantages, but they have disadvantages too.
 I think the people saying abandon dmd don't know the other side
 of the story.


 Basically, I think the more compilers we have for D the better.
 gdc is good. ldc is good. And so is dmd. We shouldn't abandon
 any of them.


For the record, I'm fine with the current arrangement and just playing
devil's advocate here:

So far, the only disadvantages you've outlined are that GDC takes a while to
build and that the build processes for GDC and LDC are inconvenient.
LDC took about 3 minutes on a Linux VM on my laptop, and since it has proper
incremental build support through CMake, I don't really see that qualifying
as a disadvantage. The only people that really need to regularly build
compilers are the folks that work on them, and that's why we have
incremental builds.
Now, DMD does have speed on its side. But it doesn't have debugging support (you
have to jump through hoops on Windows, and Linux is just a joke), binary and
object file compatibility (even GDC has more going for it on Windows than
DMD does), platform support (outside x86 and x86_64), name recognition
(I'm a college student, and people look at me funny when I mention Digital
Mars), shared library support, or acceptance in the Linux world.
The reason I use GDC for pretty much all my development is that it has all
those things, and the reason I think it's worth playing devil's advocate and
really considering the current situation is that GDC and LDC get all this
for free by wiring up the DMD frontend to a different backend. The current
state of affairs is certainly maintainable, but I think it's worth some
thought as to whether it would be better in the long run if we started
officially supporting a more accepted backend.
My example would be Go, which got all sorts of notice when gccgo became
important enough to get into the GCC codebase.

I'm not saying DMD is terrible because it isn't. I'm just saying that there
are a lot of benefits to be had by developing a more mature compiler on top
of GCC or LLVM, and that we should consider whether that's a goal we should
be working more towards.


Re: pi benchmark on ldc and dmd

2011-08-02 Thread KennyTM~

On Aug 2, 11 20:00, Jason House wrote:

The post says they did dmd -O. They did not mention -inline -noboundscheck 
-release. There may be extra flags that are required.

Walter Bright Wrote:


http://www.reddit.com/r/programming/comments/j48tf/how_is_c_better_than_d/c29do98

Anyone care to examine the assembler output and figure out why?




Let dmd have a flag that acts as a synonym of '-O -inline 
-noboundscheck -release' so people won't miss the extra flags in 
benchmarks. [/joke]


Re: pi benchmark on ldc and dmd

2011-08-02 Thread Adam D. Ruppe
On the flags: I did use them, but didn't write it all out and
tried to make them irrelevant (by avoiding functions and arrays).

But, if the same ones are passed to each compiler, it shouldn't
matter anyway... the idea is to get an apples to apples comparison
between the two D implementations, not to chase after a number itself.


Re: pi benchmark on ldc and dmd

2011-08-02 Thread Walter Bright

On 8/2/2011 12:49 PM, Adam D. Ruppe wrote:

So I'm pretty sure the difference is caused by dmd not using the
new registers in x64. The other differences look trivial to my
eyes.


dmd does use all the registers on the x64, but it seems to not be enregistering 
here. I'll have a look see.


Re: pi benchmark on ldc and dmd

2011-08-02 Thread Iain Buclaw
== Quote from KennyTM~ (kenn...@gmail.com)'s article
 On Aug 2, 11 20:00, Jason House wrote:
  The post says they did dmd -O. They did not mention -inline 
  -noboundscheck
-release. There may be extra flags that are required.
 
  Walter Bright Wrote:
 
  http://www.reddit.com/r/programming/comments/j48tf/how_is_c_better_than_d/c29do98
 
  Anyone care to examine the assembler output and figure out why?
 
 Let dmd have a flag that acts as a synonym of '-O -inline
 -noboundscheck -release' so people won't miss the extra flags in
 benchmarks. [/joke]

-Ofast sounds better. ;)


Re: pi benchmark on ldc and dmd

2011-08-02 Thread Andrew Wiley
On Tue, Aug 2, 2011 at 1:31 PM, Iain Buclaw ibuc...@ubuntu.com wrote:

 == Quote from KennyTM~ (kenn...@gmail.com)'s article
  On Aug 2, 11 20:00, Jason House wrote:
   The post says they did dmd -O. They did not mention -inline
 -noboundscheck
 -release. There may be extra flags that are required.
  
   Walter Bright Wrote:
  
  
 http://www.reddit.com/r/programming/comments/j48tf/how_is_c_better_than_d/c29do98
  
   Anyone care to examine the assembler output and figure out why?
  
  Let dmd have a flag that acts as a synonym of '-O -inline
  -noboundscheck -release' so people won't miss the extra flags in
  benchmarks. [/joke]


-O9001 will make the Redditors happy.


 -Ofast sounds better. ;)



Re: pi benchmark on ldc and dmd

2011-08-02 Thread bearophile
Adam D. Ruppe:

 Here's the program. It's based on one of the Python ones.

The D code is about 2.8 times slower than the Haskell version, and it has a 
bug, shown here:

import std.stdio, std.bigint;
void main() {
    int x = 100;
    writefln("%010d", x);
    BigInt bx = x;
    writefln("%010d", bx);
}

Output:
0000000100
100
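For reference (my illustration, not from the post): the width specifier that the BigInt path drops behaves like this in Python, which follows the same printf-style convention:

```python
# "%010d" requests zero-padding to a minimum field width of 10 characters,
# which is what the int path honors and the BigInt path ignores.
assert "%010d" % 100 == "0000000100"
print("%010d" % 100)  # 0000000100
```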



The Haskell code I've used:

-- Compile with:  ghc --make -O3 -XBangPatterns -rtsopts pidigits_hs.hs
import System

pidgits n = 0 % (0 # (1,0,1)) where
 i%ds
  | i >= n = []
  | True = (concat h ++ "\t:" ++ show j ++ "\n") ++ j%t
  where k = i+10; j = min n k
        (h,t) | k > n = (take (n`mod`10) ds ++ replicate (k-n) " ",[])
              | True = splitAt 10 ds
 j # s | n>a || r+n>=d = k # t
       | True = show q : k # (n*10,(a-(q*d))*10,d)
  where k = j+1; t@(n,a,d)=k&s; (q,r)=(n*3+a)`divMod`d
 j&(n,a,d) = (n*j,(a+n*2)*y,d*y) where y=(j*2+1)

main = putStr.pidgits.read.head =<< getArgs

Bye,
bearophile


Re: pi benchmark on ldc and dmd

2011-08-02 Thread Walter Bright

On 8/2/2011 12:49 PM, Adam D. Ruppe wrote:

So I'm pretty sure the difference is caused by dmd not using the
new registers in x64. The other differences look trivial to my
eyes.


When I compile it, it uses the registers:

L2E:    inc R11
        lea R9D,[00h][RSI*2]
        mov R9,R9
        mov RCX,R11
        imul RCX,RSI
        mov RSI,RCX
        add RDI,R9
        add R8,2
        mov RDX,R8
        imul RDX,RDI
        mov RDI,RDX
        mov R10,R8
        imul R10,RBX
        mov RBX,R10
        cmp RDI,RSI
        jl  L2E
        lea RAX,[RCX*2][RCX]
        add RAX,RDX
        mov -8[RBP],RAX
        cqo
        idiv R10
        mov R9,RAX
        mov R9,R9
        mov RAX,-8[RBP]
        cqo
        idiv R10
        mov R12,RDX
        mov R12,R12
        add R12,RSI
        cmp RBX,R12
        jle L2E
        lea R14,[R14*4][R14]
        add R14,R14
        add R14,R9
        mov R14,R14
        inc R13
        mov RAX,R13
        mov RCX,0Ah
        cqo
        idiv RCX
        test RDX,RDX
        jne LBD
        xor R14,R14
LBD:    cmp R13,02710h
        jge LE3
        mov RDX,RBX
        imul RDX,R9
        sub RDI,RDX
        imul R10D,RDI,0Ah
        mov RDI,R10
        imul R12D,RSI,0Ah
        mov RSI,R12
        jmp L2E

All I did with your example was replace BigInt with long.


Re: pi benchmark on ldc and dmd

2011-08-02 Thread Adam D. Ruppe
Walter Bright wrote:
 All I did with your example was replace BigInt with long.

hmm this is my error, but might be a bug too.

Take that same program and add some inline asm to it.

void main() {
   asm { nop; }
[... the rest is identical ...]
}


Now compile it and check the output. With the asm, I get the
output I posted. If I cut it out, I get what you posted.


My error here is when I did the obj2asm the first time, I added
an instruction inline so I could confirm quickly that I was in
the right place in the file. (I cut that out later but forgot to
rerun obj2asm.)


Re: pi benchmark on ldc and dmd

2011-08-02 Thread simendsjo

On 02.08.2011 22:36, Andrew Wiley wrote:

On Tue, Aug 2, 2011 at 1:31 PM, Iain Buclaw ibuc...@ubuntu.com
mailto:ibuc...@ubuntu.com wrote:

== Quote from KennyTM~ (kenn...@gmail.com
mailto:kenn...@gmail.com)'s article
  On Aug 2, 11 20:00, Jason House wrote:
   The post says they did dmd -O. They did not mention -inline
-noboundscheck
-release. There may be extra flags that are required.
  
   Walter Bright Wrote:
  
  

http://www.reddit.com/r/programming/comments/j48tf/how_is_c_better_than_d/c29do98
  
   Anyone care to examine the assembler output and figure out why?
  
  Let dmd have a flag that acts as a synonym of '-O -inline
  -noboundscheck -release' so people won't miss the extra flags in
  benchmarks. [/joke]


-O9001 will make the Redditors happy.


-Ofast sounds better. ;)




How about replacing -w with -9001? 
http://en.wikipedia.org/wiki/ISO_9001#Contents_of_ISO_9001


Re: pi benchmark on ldc and dmd

2011-08-02 Thread Brad Roberts
Ok.. I'm pretty sure that's a bug I discovered the other day in the 
initialization code of asm blocks.  I've already got a fix for it and will 
be sending a pull request shortly.

The asm semantic code calls the 32-bit initialization code of the backend 
unconditionally, which is just wrong.

On Tue, 2 Aug 2011, Adam D. Ruppe wrote:

 Walter Bright wrote:
  All I did with your example was replace BigInt with long.
 
 hmm this is my error, but might be a bug too.
 
 Take that same program and add some inline asm to it.
 
 void main() {
asm { nop; }
 [... the rest is identical ...]
 }
 
 
 Now compile it and check the output. With the asm, I get the
 output I posted. If I cut it out, I get what you posted.
 
 
 My error here is when I did the obj2asm the first time, I added
 an instruction inline so I could confirm quickly that I was in
 the right place in the file. (I cut that out later but forgot to
 rerun obj2asm.)
 


Re: pi benchmark on ldc and dmd

2011-08-02 Thread Brad Roberts
https://github.com/D-Programming-Language/dmd/pull/287

Before pulling this, though, the current win32 compilation failure should 
be fixed to avoid compounding problems:

  https://github.com/D-Programming-Language/dmd/pull/288

Later,
Brad

On Tue, 2 Aug 2011, Brad Roberts wrote:

 Ok.. I'm pretty sure that's a bug I discovered the other day in the 
 initialization code of asm blocks.  I've already got a fix for it and will 
 be sending a pull request shortly.
 
 The asm semantic code calls the 32-bit initialization code of the backend 
 unconditionally, which is just wrong.
 
 On Tue, 2 Aug 2011, Adam D. Ruppe wrote:
 
  Walter Bright wrote:
   All I did with your example was replace BigInt with long.
  
  hmm this is my error, but might be a bug too.
  
  Take that same program and add some inline asm to it.
  
  void main() {
 asm { nop; }
  [... the rest is identical ...]
  }
  
  
  Now compile it and check the output. With the asm, I get the
  output I posted. If I cut it out, I get what you posted.
  
  
  My error here is when I did the obj2asm the first time, I added
  an instruction inline so I could confirm quickly that I was in
  the right place in the file. (I cut that out later but forgot to
  rerun obj2asm.)
  
 


Re: pi benchmark on ldc and dmd

2011-08-02 Thread Marco Leise

On 02.08.2011, 22:35, bearophile bearophileh...@lycos.com wrote:


pidgits n = 0 % (0 # (1,0,1)) where
 i%ds
  | i >= n = []
  | True = (concat h ++ "\t:" ++ show j ++ "\n") ++ j%t
  where k = i+10; j = min n k
        (h,t) | k > n = (take (n`mod`10) ds ++ replicate (k-n) " ",[])
              | True = splitAt 10 ds
 j # s | n>a || r+n>=d = k # t
       | True = show q : k # (n*10,(a-(q*d))*10,d)
  where k = j+1; t@(n,a,d)=k&s; (q,r)=(n*3+a)`divMod`d
 j&(n,a,d) = (n*j,(a+n*2)*y,d*y) where y=(j*2+1)

main = putStr.pidgits.read.head =<< getArgs


Is this Indonesian cast to ASCII? :p


Re: pi benchmark on ldc and dmd

2011-08-02 Thread Trass3r
On 02.08.2011, 22:38, Walter Bright newshou...@digitalmars.com wrote:




L2E:inc R11
 lea R9D,[00h][RSI*2]
 mov R9,R9

...

 mov R9,RAX
 mov R9,R9

...

 mov R12,RDX
 mov R12,R12

...

 lea R14,[R14*4][R14]
 add R14,R14
 add R14,R9
 mov R14,R14

...

Any reason for all those mov x,x 's?


Re: pi benchmark on ldc and dmd

2011-08-02 Thread bearophile
Marco Leise:

 On 02.08.2011, 22:35, bearophile bearophileh...@lycos.com wrote:
 
  pidgits n = 0 % (0 # (1,0,1)) where
   i%ds
    | i >= n = []
    | True = (concat h ++ "\t:" ++ show j ++ "\n") ++ j%t
    where k = i+10; j = min n k
          (h,t) | k > n = (take (n`mod`10) ds ++ replicate (k-n) " ",[])
                | True = splitAt 10 ds
   j # s | n>a || r+n>=d = k # t
         | True = show q : k # (n*10,(a-(q*d))*10,d)
    where k = j+1; t@(n,a,d)=k&s; (q,r)=(n*3+a)`divMod`d
   j&(n,a,d) = (n*j,(a+n*2)*y,d*y) where y=(j*2+1)
 
  main = putStr.pidgits.read.head =<< getArgs
 
 Is this Indonesian cast to ASCII? :p

I agree it's very bad looking; it isn't idiomatic Haskell code. But it contains 
nothing too strange (and the algorithm is the same one used in the D code). 
Here it is formatted a bit better, though I don't fully understand it yet:


import System (getArgs)

pidgits n = 0 % (0 # (1, 0, 1)) where
    i % ds
      | i >= n = []
      | True = (concat h ++ "\t:" ++ show j ++ "\n") ++ j % t
      where
        k = i + 10
        j = min n k
        (h, t) | k > n = (take (n `mod` 10) ds ++ replicate (k - n) " ", [])
               | True = splitAt 10 ds
    j # s | n > a || r + n >= d = k # t
          | True = show q : k # (n * 10, (a - (q * d)) * 10, d)
      where
        k = j + 1
        t@(n, a, d) = k & s
        (q, r) = (n * 3 + a) `divMod` d
    j & (n, a, d) = (n * j, (a + n * 2) * y, d * y)
      where
        y = (j * 2 + 1)

main = putStr . pidgits . read . head =<< getArgs


The Shootout site (where I copied that code from) ranks programs by both 
performance and compactness (using a low-performance compressor...), so there 
you see Haskell (and other language) programs that are sometimes too compact 
and often use clever tricks to increase their performance. In normal Haskell 
code you don't find those tricks (this specific program doesn't seem to use 
strange tricks, but on the Haskell Wiki page about this problem 
(http://www.haskell.org/haskellwiki/Shootout/Pidigits ) you can see several 
programs that are both longer and slower than this one).

The first working implementation of a C program is usually long but fast 
enough, while the first working implementation of a Haskell program is often 
short but not so fast. Usually there are ways to speed up the Haskell code. My 
experience with Haskell is limited, so when I write some Haskell my head 
usually hurts a bit :-)

The higher-level nature of Python lets me implement working algorithms that 
are more complex, so sometimes the code ends up being faster than C code, 
where at a first implementation you often avoid overly complex algorithms 
for fear of hard-to-find bugs, or simply because they take too long to 
write. Haskell in theory allows you to implement complex algorithms in 
a short space, and safely. In practice I think you need a lot of brain to do 
this. Haskell sometimes looks like a puzzle language to me (maybe I just need 
more self-training in functional programming).

Bye,
bearophile


Re: pi benchmark on ldc and dmd

2011-08-02 Thread Walter Bright

On 8/2/2011 3:23 PM, Trass3r wrote:

Any reason for all those mov x,x 's?


No. They'll get removed shortly.

I see three problems with dmd's codegen here:

1. those redundant moves
2. failing to merge a couple divides
3. replacing a mul with an add/lea

I'll see about taking care of them. (2) is the most likely culprit for the speed difference.
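On point 2 (my illustration, not Walter's): the source computes (n*3 + a)/d and (n*3 + a)%d as two separate operations, but x86 idiv produces quotient and remainder together, so a backend can merge the pair into a single divide - the same pairing Python exposes as divmod:

```python
# Values from the first step of the spigot loop: n*3 + a = 9, d = 3.
x, d = 9, 3
q = x // d  # in the D source: t = (n*3 + a) / d
r = x % d   # in the D source: u = (n*3 + a) % d
# One hardware divide yields both results at once, exactly like divmod:
assert (q, r) == divmod(x, d)
print(q, r)  # 3 0
```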


pi benchmark on ldc and dmd

2011-08-01 Thread Walter Bright

http://www.reddit.com/r/programming/comments/j48tf/how_is_c_better_than_d/c29do98

Anyone care to examine the assembler output and figure out why?


Re: pi benchmark on ldc and dmd

2011-08-01 Thread bearophile
Walter:

 http://www.reddit.com/r/programming/comments/j48tf/how_is_c_better_than_d/c29do98
 
 Anyone care to examine the assembler output and figure out why?

Do you mean code similar to this one, or code that uses std.bigint?
http://shootout.alioth.debian.org/debian/program.php?test=pidigits&lang=gdc&id=3

Bye,
bearophile


Re: pi benchmark on ldc and dmd

2011-08-01 Thread Adam D. Ruppe
bearophile wrote:
 Do you mean code similar to this one, or code that uses std.bigint?

It was something that used bigint. I whipped it up myself earlier
this morning, but left the code on my laptop. I'll post it when
I have a chance.

I ran obj2asm on it myself, but was a little short on time, so
I haven't really analyzed it yet.


Re: pi benchmark on ldc and dmd

2011-08-01 Thread bearophile
Adam D. Ruppe:
 
 It was something that used bigint. I whipped it up myself earlier
 this morning, but left the code on my laptop. I'll post it when
 I have a chance.

OK.
In such situations it's never enough to compare the D code compiled with DMD to 
the D code compiled with LDC. You also need a reference point, like a C version 
compiled with GCC (here using GMP bignums). Such reference points are necessary 
to anchor performance discussions to something.

Bye,
bearophile


Re: pi benchmark on ldc and dmd

2011-08-01 Thread Adam D. Ruppe
bearophile wrote:
 In such situations it's never enough to compare the D code compiled
 with DMD to the D code compiled with LDC. You also need a reference
 point, like a C version compiled with GCC (here using GMP bignums).
 Such reference points are necessary to anchor performance
 discussions to something.

Actually, I don't think that would be relevant here.

The thread started with someone saying the DMD backend is garbage
and should be abandoned.

I'm sick and tired of hearing people say that. The Digital Mars
code has many, many advantages over the others*.

But, it was challenged specifically on the optimizer, so to
check that out, I wanted all other things to be equal.

Same code, same front end, same computer, as close to same runtime
and library is possible with different compilers. The only
difference should be the backend so we can draw conclusions about
it without other factors skewing the results.


So for this, I just wanted to compare dmd backend to ldc and
gdc backend so I didn't worry too much about absolute numbers
or other languages. (Actually, one of the reasons I picked the
pi one was after the embarrassing defeat in floating point, I was
hoping dmd could score a second victory and I could follow up
on that "prove it" post with satisfaction. Alas, the facts didn't
work out that way. Though, I still do find dmd to beat g++
on a lot of real world code - things like slices actually make
a sizable difference.)

But regardless, it was just about comparing backends, not
doing language comparisons.

===

* To name a huge one. Today was the first time I ever got ldc
or gdc to actually work on my computer, and it took a long, long
time to do it. I've tried in the past, and failed, so this was
a triumph. Big success.

I was waiting over an hour just for gcc+gdc to compile! In the
time it takes for gcc's configure script to run, you can make
clean, build dmd, druntime and phobos.

It's a huge hassle to get the code together too. I had to go
to *four* different sites to get gdc's stuff together (like 80
MB of crap, compressed!), and two different ones to get even the
ldc binary to work. Pain in my ASS.


And this is on Linux too. I pity the fool who tries to do this
on Windows, knowing how so much linux software treats their
Windows ports.


I'd like to contrast to dmd: unzip and play with wild abandon.


Re: pi benchmark on ldc and dmd

2011-08-01 Thread Andrew Wiley
On Mon, Aug 1, 2011 at 8:38 PM, Adam D. Ruppe destructiona...@gmail.com wrote:

 bearophile wrote:
  In such situations it's never enough to compare the D code compiled
  with DMD to the D code compiled with LDC. You also need a reference
  point, like a C version compiled with GCC (here using GMP bignums).
  Such reference points are necessary to anchor performance
  discussions to something.

 Actually, I don't think that would be relevant here.

 The thread started with someone saying the DMD backend is garbage
 and should be abandoned.

 I'm sick and tired of hearing people say that. The Digital Mars
 code has many, many advantages over the others*.

 But, it was challenged specifically on the optimizer, so to
 check that out, I wanted all other things to be equal.

 Same code, same front end, same computer, as close to same runtime
 and library is possible with different compilers. The only
 difference should be the backend so we can draw conclusions about
 it without other factors skewing the results.


 So for this, I just wanted to compare dmd backend to ldc and
 gdc backend so I didn't worry too much about absolute numbers
 or other languages. (Actually, one of the reasons I picked the
 pi one was after the embarrassing defeat in floating point, I was
 hoping dmd could score a second victory and I could follow up
 on that "prove it" post with satisfaction. Alas, the facts didn't
 work out that way. Though, I still do find dmd to beat g++
 on a lot of real world code - things like slices actually make
 a sizable difference.)

 But regardless, it was just about comparing backends, not
 doing language comparisons.

 ===

 * To name a huge one. Today was the first time I ever got ldc
 or gdc to actually work on my computer, and it took a long, long
 time to do it. I've tried in the past, and failed, so this was
 a triumph. Big success.

 I was waiting over an hour just for gcc+gdc to compile! In the
 time it takes for gcc's configure script to run, you can make
 clean, build dmd, druntime and phobos.

 It's a huge hassle to get the code together too. I had to go
 to *four* different sites to get gdc's stuff together (like 80
 MB of crap, compressed!), and two different ones to get even the
 ldc binary to work. Pain in my ASS.


 And this is on Linux too. I pity the fool who tries to do this
 on Windows, knowing how so much linux software treats their
 Windows ports.


 I'd like to contrast to dmd: unzip and play with wild abandon.


Yes, GDC takes forever and a half to build. That's true of anything in GCC,
and it's just because they don't trust the native C compiler at all. LDC
builds in under a half hour, even on my underpowered ARM SoC, so I don't see
how you could be having trouble there.
As for Windows, Daniel Green (hopefully I'm remembering right) has been
posting GDC binaries.

I do respect that DMD generates reasonably fast executables recklessly fast,
but it also doesn't exist outside x86 and x86_64 and the debug symbols (at
least on Linux) are just hilariously bad.

Now if I could just get GDC to pad structs correctly on ARM...