Re: Why is D slower than LuaJIT?

2013-06-02 Thread Marco Leise
Am Wed, 22 Dec 2010 17:04:21 -0500
schrieb Andreas Mayer s...@bacon.eggs:

 To see what performance advantage D would give me over using a scripting 
 language, I made a small benchmark. It consists of this code:
 
 auto L = iota(0.0, 1000.0);
 auto L2 = map!"a / 2"(L);
 auto L3 = map!"a + 2"(L2);
 auto V = reduce!"a + b"(L3);
 
 It runs in 281 ms on my computer.
 
 The same code in Lua (using LuaJIT) runs in 23 ms.
 
 That's about 10 times faster. I would have expected D to be faster. Did I do 
 something wrong?


Actually D is 1.5 times faster on my computer*:

LDC**             18 ms
GDC***            25 ms
LuaJIT 2.0.0 b7   27 ms
DMD               93 ms

All compilers based on DMD 2.062 front-end.
* 64-bit Linux, 2.0 GHz Mobile Core 2 Duo.
** based on LLVM 3.2
*** based on GCC 4.7.2

I modified the iota template to more closely reflect the one
used in the original Lua code:

import std.algorithm;
import std.stdio;
import std.traits;

auto iota(B, E)(B begin, E end) if (isFloatingPoint!(CommonType!(B, E))) {
    alias CommonType!(B, E) Value;
    // Minimal forward range that advances by plain increment.
    static struct Result
    {
        private Value start, end;
        @property bool empty() const { return start >= end; }
        @property Value front() const { return start; }
        void popFront() { start++; }
    }
    return Result(begin, end);
}

void main() {
    auto L  = iota(0.0, 1000.0),
         L2 = map!(a => a / 2)(L),
         L3 = map!(a => a + 2)(L2),
         V  = reduce!((a, b) => a + b)(L3);

    writefln("%f", V);
}

-- 
Marco



Re: Why is D slower than LuaJIT?

2010-12-24 Thread bearophile
spir:

 Note: Iota is right-side exclusive like i..j . (I've just been caught by this 
 trap ;-)

This is for the better, to increase language consistency (as in Python).
In APL (where the iota name comes from) the semantics was different:
⍳ 5 == 1, 2, 3, 4, 5

Integer ranges are very common, so some generalization seems useful here. The 
x..y:z syntax may be the range with z stride, and be syntax sugar for iota(x, 
y, z).

Bye,
bearophile


Re: Why is D slower than LuaJIT?

2010-12-23 Thread Lutger Blijdestijn
I meant to link this, it includes all benchmarks and ranks gdc at 5th place 
and dmd at 8 (from 2008):

http://shootout.alioth.debian.org/debian/benchmark.php?test=all&lang=all


Re: Why is D slower than LuaJIT?

2010-12-23 Thread Lutger Blijdestijn
Andreas Mayer wrote:

 Walter Bright Wrote:
 
 I notice you are using doubles in D. dmd currently uses the x87 to
 evaluate doubles, and on some processors the x87 is slow relative to
 using the XMM instructions. Also, dmd's back end doesn't align the
 doubles on 16 byte boundaries, which can also slow down the floating
 point on some processors.
 
 Using long instead of double, it is still slower than LuaJIT (223 ms on my
 machine). Even with int it still takes 101 ms and is at least 3x slower
 than LuaJIT.
 
 Both of these code gen issues with dmd are well known, and I'd like to
 solve them after we address higher priority issues.
 
 If it's not clear, I'd like to emphasize that these are compiler issues,
 not D language issues.
 
 I shouldn't use D now? How long until it is ready?

You may want to explore the great language shootout before drawing that 
conclusion:

http://shootout.alioth.debian.org/

LuaJit ranks high there, but still a bit below the fastest compiled 
languages (and the fastest java). D is not included anymore, but it once was 
and these benchmarks can still be found:

http://shootout.alioth.debian.org/debian/performance.php

LuaJit performance is impressive, far above any 'scripting' language. Just 
look at some numbers in the shootout comparing it to ruby or python.





Re: Why is D slower than LuaJIT?

2010-12-23 Thread bearophile
Andrei:

 I'm thinking what to do about iota, which has good features but exacts 
 too much cost on tight loop performance. One solution would be to define 
 iota to be the simple, forward range that I defined as Iota2 in my 
 previous post. Then, we need a different name for the full-fledged iota 
 (random-access, has known length, iterates through the same numbers 
 forward and backward etc). Ideas?

Is improving the compiler instead an option?

Bye,
bearophile


Re: Why is D slower than LuaJIT?

2010-12-23 Thread Pelle Månsson

On 12/22/2010 11:04 PM, Andreas Mayer wrote:

To see what performance advantage D would give me over using a scripting 
language, I made a small benchmark. It consists of this code:


auto L = iota(0.0, 1000.0);
auto L2 = map!"a / 2"(L);
auto L3 = map!"a + 2"(L2);
auto V = reduce!"a + b"(L3);


It runs in 281 ms on my computer.

The same code in Lua (using LuaJIT) runs in 23 ms.

That's about 10 times faster. I would have expected D to be faster. Did I do 
something wrong?

The first Lua version uses a simplified design. I thought maybe that is unfair 
to ranges, which are more complicated. You could argue ranges have more 
features and do more work. To make it fair, I made a second Lua version of the 
above benchmark that emulates ranges. It is still 29 ms fast.

The full D version is here: http://pastebin.com/R5AGHyPx
The Lua version: http://pastebin.com/Sa7rp6uz
Lua version that emulates ranges: http://pastebin.com/eAKMSWyr

Could someone help me solving this mystery?

Or is D, unlike I thought, not suitable for high performance computing? What 
should I do?



I changed the code to this:

auto L = iota(0, 1000);
auto L2 = map!"a / 2.0"(L);
auto L3 = map!"a + 2"(L2);
auto V = reduce!"a + b"(L3);

and ripped the caching out of std.algorithm.map. :-)

This made it go from about 1.4 seconds to about 0.4 seconds on my 
machine. Note that I did no rigorous or scientific testing.


Also, if you really really need the performance you can change it all to 
lower level code, should you want to.


Re: Why is D slower than LuaJIT?

2010-12-23 Thread spir
On Wed, 22 Dec 2010 20:16:45 -0600
Andrei Alexandrescu seewebsiteforem...@erdani.org wrote:

 Thanks for posting the numbers. That's a long time, particularly 
 considering that the two map instances don't do anything. So the bulk of 
 the computation is:
 
 auto L = iota(0.0, 1000.0);
 auto V = reduce!"a + b"(L3);
 
 There is one inherent problem that affects the speed of iota: in iota, 
 the value at position i is computed as 0.0 + i * step, where step is 
 computed from the limits. That's one addition and a multiplication for 
 each pass through iota. Given that the actual workload of the loop is 
 only one addition, we are doing a lot more work. I suspect that that's 
 the main issue there.
 
 The reason for which iota does that instead of the simpler increment is 
 that iota must iterate the same values forward and backward. Using ++ 
 may interact with floating-point vagaries, so the code is currently 
 conservative.

There is a point I don't understand here: Iota is a range-struct template, with
void popFront()
{
current += step;
}
So, how does the computation of an arbitrary element at a given index affect 
looping speed? For mappings (and any kind of traversal, indeed), there should 
be an addition per element. Else, why define a range interface at all? What do 
I miss?

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com



Re: Why is D slower than LuaJIT?

2010-12-23 Thread Simen kjaeraas

spir denis.s...@gmail.com wrote:

There is a point I don't understand here: Iota is a range-struct  
template, with

void popFront()
{
current += step;
}
So, how does the computation of an arbitrary element at a given index  
affect looping speed? For mappings (and any kind of traversal, indeed),  
there should be an addition per element. Else, why define a range  
interface at all? What do I miss?


With floating-point numbers, the above solution does not always work. If
step == 1, increasing current by step amount will stop working at some
point, at which the range will then grind to a halt. If instead one
multiplies step by the current number of steps taken, and adds to the
origin, this problem disappears.

As an example of when this problem shows up, try this code:

float f = 16_777_216;
auto f2 = f + 1;
assert( f == f2 );

The assert passes.

--
Simen
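
For readers following along, the difference between the two strategies can be
sketched as below. This is only an illustration, not the Phobos code: the
function names byIncrement and byIndex are made up for this example. On a
typical build that does float arithmetic in single precision, the accumulating
form should stay stuck at 2^24, while the index form still reaches the
intended value.

```d
import std.stdio : writefln;

// Hypothetical helper: walk `steps` times by repeatedly adding `step`.
float byIncrement(float origin, float step, int steps)
{
    float current = origin;
    foreach (i; 0 .. steps)
        current += step;   // each add is rounded back to float
    return current;
}

// Hypothetical helper: compute the same position from the step count.
float byIndex(float origin, float step, int steps)
{
    return origin + steps * step;   // one multiply, one add, one rounding
}

void main()
{
    float origin = 16_777_216;   // 2^24, where float stops resolving +1
    writefln("%.1f %.1f", byIncrement(origin, 1, 4), byIndex(origin, 1, 4));
}
```

The index form costs a multiplication per element, which is the overhead
discussed elsewhere in this thread.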


Re: Why is D slower than LuaJIT?

2010-12-23 Thread bearophile
Simen kjaeraas:

 With floating-point numbers, the above solution does not always work.

The type is known at compile time, so you can split the algorithm in two with a 
static if, and do something else if it's an integral type.

Bye,
bearophile
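
A rough sketch of what such a split could look like, assuming a made-up range
called Stepper and a helper sumOf (neither is in Phobos, and this is not how
std.range.iota is actually written). Integral types keep a running value;
floating-point types count steps and recompute the element from the origin.

```d
import std.stdio : writeln;
import std.traits : isFloatingPoint, isIntegral;

// Hypothetical range for illustration only.
struct Stepper(T) if (isIntegral!T || isFloatingPoint!T)
{
    private T origin, limit, step;

    static if (isIntegral!T)
        private T current;      // integer adds are exact, so accumulate
    else
        private size_t index;   // count steps to avoid accumulated rounding

    this(T origin, T limit, T step)
    {
        this.origin = origin;
        this.limit = limit;
        this.step = step;
        static if (isIntegral!T)
            current = origin;
    }

    @property T front() const
    {
        static if (isIntegral!T)
            return current;
        else
            return origin + index * step;
    }

    @property bool empty() const { return front >= limit; }

    void popFront()
    {
        static if (isIntegral!T)
            current += step;
        else
            ++index;
    }
}

// Hypothetical helper: sum every element of the half-open range.
T sumOf(T)(T origin, T limit, T step)
{
    T total = 0;
    foreach (x; Stepper!T(origin, limit, step))
        total += x;
    return total;
}

void main()
{
    writeln(sumOf(0, 5, 1));         // integral path
    writeln(sumOf(0.0, 5.0, 1.0));   // floating-point path
}
```

The branch is chosen at compile time, so the integral instantiation carries no
trace of the floating-point bookkeeping.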


Re: Why is D slower than LuaJIT?

2010-12-23 Thread spir
On Wed, 22 Dec 2010 22:14:34 -0600
Andrei Alexandrescu seewebsiteforem...@erdani.org wrote:

 I then replaced iota's implementation with a simpler one that's a 
 forward range. Then the performance became exactly the same as for the 
 simple loop.


After having watched Iota's very general implementation, I tried the same 
change, precisely. Actually, with an even simpler range requiring a single 
element type for (first,last,step). For any reason, this alternative is 
slightly slower by me than using Iota (don't cry watching absolute times, my 
computer is old and slow ;-). Sample code below, typical results are:

1.1 3.3 5.5 7.7 
Interval time: 1149
Iota time: 1066

Note: adding an assert to ensure front or popfront is not wrongly called past 
the end adds ~ 20% time.
Note: I think this demonstrates that using Iota does not perform undue 
computations (multiplication to get Nth element with multiplication + 
addition), or do I misinterpret?

Anyway, what is wrong in my code? Why doesn't it perform better?


import std.algorithm: map, filter, reduce;
import std.range: iota;
import std.stdio: writef, writefln, writeln;
// Note: time() is assumed to be the poster's own millisecond timer;
// its definition was not included in the post.
struct Interval (T) {
    alias T Element;
    Element first, last, step;
    private Element element;
    this (Element first, Element last, Element step=1) {
        this.first = first;
        this.last = last;
        this.step = step;
        this.element = first;
    }
    @property void popFront () {
        this.element += this.step;
    }
    @property bool empty () {
        return (this.element > this.last);
    }
    @property Element front () {
        return this.element;
    }
}
void main () {
    auto nums = Interval!float(1.1,8.8, 2.2);
    foreach(n ; nums) writef("%s ", n);
    writeln();

    auto t1 = time();
    auto nums1 = Interval!int(0, 10_000_000);
    auto halves1 = map!"a/2"(nums1);
    auto incs1 = map!"a+2"(halves1);
    auto result1 = reduce!"a+b"(incs1);
    writefln("Interval time: %s", time() - t1);

    auto t2 = time();
    auto nums2 = iota(0, 10_000_000);
    auto halves2 = map!"a/2"(nums2);
    auto incs2 = map!"a+2"(halves2);
    auto result2 = reduce!"a+b"(incs2);
    writefln("Iota time: %s", time() - t2);
}


Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com



Re: Why is D slower than LuaJIT?

2010-12-23 Thread spir
On Wed, 22 Dec 2010 23:22:56 -0600
Andrei Alexandrescu seewebsiteforem...@erdani.org wrote:

 I'm thinking what to do about iota, which has good features but exacts 
 too much cost on tight loop performance. One solution would be to define 
 iota to be the simple, forward range that I defined as Iota2 in my 
 previous post. Then, we need a different name for the full-fledged iota 
 (random-access, has known length, iterates through the same numbers 
 forward and backward etc). Ideas?

I would keep length and add an opIn: if (e in interval) {...}. (I'm unsure 
whether it's worth allowing different types for bounds and/or for step; I'd 
rather make things simple.) Then, you could call it Interval, what do you think?

Note: The result would be very similar to python (x)ranges. D has a notation 
for a slightly narrower notion: '..'. Thus, what about:
Interval!int interval = 1..9;
or else:
auto interval = Interval!int(1..9);
?

What kind of thingie does i..j actually construct as of now?

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com



Re: Why is D slower than LuaJIT?

2010-12-23 Thread Jonathan M Davis
On Thursday 23 December 2010 05:22:55 spir wrote:
 On Wed, 22 Dec 2010 23:22:56 -0600
 
 Andrei Alexandrescu seewebsiteforem...@erdani.org wrote:
  I'm thinking what to do about iota, which has good features but exacts
  too much cost on tight loop performance. One solution would be to define
  iota to be the simple, forward range that I defined as Iota2 in my
  previous post. Then, we need a different name for the full-fledged iota
  (random-access, has known length, iterates through the same numbers
  forward and backward etc). Ideas?
 
 I would keep length and add an opIn: if (e in interval) {...}. (I'm unsure
 whether it's worth allowing different types for bounds and/or for step;
 I'd rather make things simple.) Then, you could call it Interval, what do
 you think?
 
 Note: The result would be very similar to python (x)ranges. D has a
 notation for a slightly narrower notion: '..'. Thus, what about:
 Interval!int interval = 1..9;
 or else:
   auto interval = Interval!int(1..9);
 ?
 
 What kind of thingie does i..j actually construct as of now?

I believe that the only place that .. works is within []. If an object 
overrides 
an opSlice() which takes parameters, then that syntax can be used. I don't 
believe that it works on its own at all.

- Jonathan M Davis


Re: Why is D slower than LuaJIT?

2010-12-23 Thread Simen kjaeraas

bearophile bearophileh...@lycos.com wrote:


Simen kjaeraas:


With floating-point numbers, the above solution does not always work.


The type is known at compile time, so you can split the algorithm in two  
with a static if, and do something else if it's an integral type.


Absolutely. However, in this example doubles were used.

Also, though it may be doable for integers, other types may also want
that optimization (BigInt comes to mind). A behavesAsIntegral!T might
fix that.

--
Simen


Re: Why is D slower than LuaJIT?

2010-12-23 Thread spir
On Thu, 23 Dec 2010 05:29:32 -0800
Jonathan M Davis jmdavisp...@gmx.com wrote:

 On Thursday 23 December 2010 05:22:55 spir wrote:
  On Wed, 22 Dec 2010 23:22:56 -0600
  
  Andrei Alexandrescu seewebsiteforem...@erdani.org wrote:
   I'm thinking what to do about iota, which has good features but exacts
   too much cost on tight loop performance. One solution would be to define
   iota to be the simple, forward range that I defined as Iota2 in my
   previous post. Then, we need a different name for the full-fledged iota
   (random-access, has known length, iterates through the same numbers
   forward and backward etc). Ideas?
  
  I would keep length and add an opIn: if (e in interval) {...}. (I'm unsure
  whether it's worth allowing different types for bounds and/or for step;
  I'd rather make things simple.) Then, you could call it Interval, what do
  you think?
  
  Note: The result would be very similar to python (x)ranges. D has a
  notation for a slightly narrower notion: '..'. Thus, what about:
  Interval!int interval = 1..9;
  or else:
  auto interval = Interval!int(1..9);
  ?
  
  What kind of thingie does i..j actually construct as of now?
 
 I believe that the only place that .. works is within []. If an object 
 overrides 
 an opSlice() which takes parameters, then that syntax can be used.

;-) There's also
foreach(n ; i..j) {...}
Precisely, that's what I was thinking of when stating that D has a notation for 
a very close (but narrower) notion. Slicing is related, but much farther since 
it does not necessarily require iteration (but its result does allow it).
Note: Iota is right-side exclusive like i..j . (I've just been caught by this 
trap ;-)

  I don't believe that it works on its own at all.

Certainly not. This would be a syntactic addition. That is why I asked what 
i..j currently yields, if it yields anything (it could just be a rewrite).


denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com



Re: Why is D slower than LuaJIT?

2010-12-23 Thread Simen kjaeraas

spir denis.s...@gmail.com wrote:


On Wed, 22 Dec 2010 23:22:56 -0600
Andrei Alexandrescu seewebsiteforem...@erdani.org wrote:


I'm thinking what to do about iota, which has good features but exacts
too much cost on tight loop performance. One solution would be to define
iota to be the simple, forward range that I defined as Iota2 in my
previous post. Then, we need a different name for the full-fledged iota
(random-access, has known length, iterates through the same numbers
forward and backward etc). Ideas?


I would keep length and add an opIn: if (e in interval) {...}. (I'm  
unsure whether it's worth allowing different types for bounds and/or for  
step; I'd rather make things simple.) Then, you could call it Interval,  
what do you think?


Note: The result would be very similar to python (x)ranges. D has a  
notation for a slightly narrower notion: '..'. Thus, what about:

Interval!int interval = 1..9;
or else:
auto interval = Interval!int(1..9);
?

What kind of thingie does i..j actually construct as of now?


Nothing. The syntax only works in foreach and opSlice.

However, this works:

final abstract class Intervals {
    struct Interval( T ) {
        T start, end;
    }
    static Interval!T opSlice( T )( T start, T end ) {
        return Interval!T( start, end );
    }
}

auto intInterval = Intervals[1..2];
auto stringInterval = Intervals["foo".."bar"];



--
Simen


Re: Why is D slower than LuaJIT?

2010-12-23 Thread spir
On Thu, 23 Dec 2010 14:40:13 +0100
Simen kjaeraas simen.kja...@gmail.com wrote:

  What kind of thingie does i..j actually construct as of now?  
 
 Nothing. The syntax only works in foreach and opSlice.
 
 However, this works:
 
 final abstract class Intervals {
  struct Interval( T ) {
  T start, end;
  }
  static Interval!T opSlice( T )( T start, T end ) {
  return Interval!T( start, end );
  }
 }
 
 auto intInterval = Intervals[1..2];
 auto stringInterval = Intervals["foo".."bar"];

Nice! (even impressive :-)

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com



Re: Why is D slower than LuaJIT?

2010-12-23 Thread Andrei Alexandrescu

On 12/22/10 11:40 PM, Brad Roberts wrote:

Since the timing code isn't here, I'm assuming you guys are doing the
testing around the whole app.  While that might be interesting, it's
hiding an awfully large and important difference, application startup
time.

C has very little, D quite a bit more, and I don't know what Lua looks
like there.  If the goal is to test this math code, you'll need to
separate the two.

At this point, I highly suspect you're really measuring the runtime costs.


One thing I didn't mention is that I also measured with 10x the counter 
limit. That brings the run time to seconds, and the relative difference 
persists. So application startup time is negligible in this case.


Andrei
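
For anyone who wants to keep startup cost out of the measurement, one possible
approach is to time only the computation inside the program and keep the best
of several runs. The sketch below is an assumption about how that could be
done, not the harness anyone in this thread used; the function name workload
and the run count of 5 are made up for the example, and it relies on
std.datetime.stopwatch from a current Phobos.

```d
import std.algorithm : map, min, reduce;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.range : iota;
import std.stdio : writefln;

// Hypothetical stand-in for the benchmark body discussed in this thread.
double workload(int n)
{
    return iota(0, n)
        .map!(a => a / 2.0)
        .map!(a => a + 2)
        .reduce!((a, b) => a + b);
}

void main()
{
    enum n = 10_000_000;
    long best = long.max;
    double result;

    foreach (run; 0 .. 5)
    {
        // The stopwatch covers only the computation, not process startup.
        auto sw = StopWatch(AutoStart.yes);
        result = workload(n);
        sw.stop();
        best = min(best, sw.peek.total!"msecs");
    }

    // Printing the result keeps the optimizer from discarding the work.
    writefln("result %f, best of 5 runs: %s ms", result, best);
}
```

Build with optimizations on (for dmd: -O -release -inline), otherwise the
numbers say little.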


Re: Why is D slower than LuaJIT?

2010-12-23 Thread Andrei Alexandrescu

On 12/23/10 2:52 AM, bearophile wrote:

Andrei:


I'm thinking what to do about iota, which has good features but exacts
too much cost on tight loop performance. One solution would be to define
iota to be the simple, forward range that I defined as Iota2 in my
previous post. Then, we need a different name for the full-fledged iota
(random-access, has known length, iterates through the same numbers
forward and backward etc). Ideas?


Is improving the compiler instead an option?


It's more of a separate matter than an option. Iota currently does a 
fair amount of work for floating-point types, and a contemporary 
optimizer cannot be reasonably expected to simplify that code.


Andrei



Re: Why is D slower than LuaJIT?

2010-12-23 Thread Andrei Alexandrescu

On 12/22/10 8:16 PM, Andrei Alexandrescu wrote:

On 12/22/10 4:04 PM, Andreas Mayer wrote:

To see what performance advantage D would give me over using a
scripting language, I made a small benchmark. It consists of this code:


auto L = iota(0.0, 1000.0);
auto L2 = map!"a / 2"(L);
auto L3 = map!"a + 2"(L2);
auto V = reduce!"a + b"(L3);


It runs in 281 ms on my computer.


Thanks for posting the numbers. That's a long time, particularly
considering that the two map instances don't do anything. So the bulk of
the computation is:

auto L = iota(0.0, 1000.0);
auto V = reduce!"a + b"(L3);


Oops, I was wrong. The two instances of map do something, I thought 
they're all applied to L when in fact they are chained. So my estimates 
are incorrect. At any rate, clearly iota incurs a 2x cost, which 
probably composes with other similar costs incurred by map.


Andrei


Re: Why is D slower than LuaJIT?

2010-12-23 Thread Andrei Alexandrescu

On 12/23/10 6:57 AM, bearophile wrote:

Simen kjaeraas:


With floating-point numbers, the above solution does not always work.


The type is known at compile time, so you can split the algorithm in two with a 
static if, and do something else if it's an integral type.


That's what the code currently does.

Andrei


Re: Why is D slower than LuaJIT?

2010-12-23 Thread Andrei Alexandrescu

On 12/23/10 7:04 AM, spir wrote:

On Wed, 22 Dec 2010 22:14:34 -0600
Andrei Alexandrescu seewebsiteforem...@erdani.org wrote:


I then replaced iota's implementation with a simpler one that's a
forward range. Then the performance became exactly the same as for the
simple loop.



After having watched Iota's very general implementation, I tried the same 
change, precisely. Actually, with an even simpler range requiring a single 
element type for (first,last,step). For any reason, this alternative is 
slightly slower by me than using Iota (don't cry watching absolute times, my 
computer is old and slow ;-). Sample code below, typical results are:

1.1 3.3 5.5 7.7
Interval time: 1149
Iota time: 1066

Note: adding an assert to ensure front or popfront is not wrongly called past 
the end adds ~ 20% time.


I cut my losses reading here :o). No performance test is meaningful 
without all optimizations turned on.


Andrei


Re: Why is D slower than LuaJIT?

2010-12-23 Thread Andrei Alexandrescu

On 12/23/10 6:22 AM, spir wrote:

On Wed, 22 Dec 2010 20:16:45 -0600
Andrei Alexandrescu seewebsiteforem...@erdani.org wrote:


Thanks for posting the numbers. That's a long time, particularly
considering that the two map instances don't do anything. So the bulk of
the computation is:

auto L = iota(0.0, 1000.0);
auto V = reduce!"a + b"(L3);

There is one inherent problem that affects the speed of iota: in iota,
the value at position i is computed as 0.0 + i * step, where step is
computed from the limits. That's one addition and a multiplication for
each pass through iota. Given that the actual workload of the loop is
only one addition, we are doing a lot more work. I suspect that that's
the main issue there.

The reason for which iota does that instead of the simpler increment is
that iota must iterate the same values forward and backward. Using ++
may interact with floating-point vagaries, so the code is currently
conservative.


There is a point I don't understand here: Iota is a range-struct template, with
 void popFront()
 {
 current += step;
 }


You need to look at this specialization:

http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/range.d#L3800

and keep in mind Simen's explanation.


Andrei



Re: Why is D slower than LuaJIT?

2010-12-23 Thread Andrei Alexandrescu

On 12/22/10 4:04 PM, Andreas Mayer wrote:

To see what performance advantage D would give me over using a scripting 
language, I made a small benchmark. It consists of this code:


auto L = iota(0.0, 1000.0);
auto L2 = map!"a / 2"(L);
auto L3 = map!"a + 2"(L2);
auto V = reduce!"a + b"(L3);


It runs in 281 ms on my computer.

The same code in Lua (using LuaJIT) runs in 23 ms.

That's about 10 times faster. I would have expected D to be faster. Did I do 
something wrong?

The first Lua version uses a simplified design. I thought maybe that is unfair 
to ranges, which are more complicated. You could argue ranges have more 
features and do more work. To make it fair, I made a second Lua version of the 
above benchmark that emulates ranges. It is still 29 ms fast.

The full D version is here: http://pastebin.com/R5AGHyPx
The Lua version: http://pastebin.com/Sa7rp6uz
Lua version that emulates ranges: http://pastebin.com/eAKMSWyr

Could someone help me solving this mystery?

Or is D, unlike I thought, not suitable for high performance computing? What 
should I do?


I wrote a new test bench and got 41 ms for the baseline and 220 ms for 
the code based on map and iota. (Surprisingly, the extra work didn't 
affect the run time, which suggests the loop is dominated by the counter 
increment and test.) Then I took out the cache in map and got 136 ms. 
Finally, I replaced the use of iota with iota2 and got performance equal 
to that of handwritten code. Code below.


I decided to check in the map cache removal. We discussed it a fair 
amount among Phobos devs. I have no doubts caching might help in certain 
cases, but it does lead to surprising performance loss for simple cases 
like the one tested here. See 
http://www.dsource.org/projects/phobos/changeset/2231


If the other Phobos folks approve, I'll also specialize iota for 
floating point numbers to be a forward range and defer the decision on 
defining a randomAccessIota for floating point numbers to later. That 
would complete the library improvements pointed to by this test, leaving 
further optimization to compiler improvements. Thanks Andreas for 
starting this.



Andrei

import std.algorithm;
import std.stdio;
import std.range;
import std.traits;

struct Iota2(N, S) if (isFloatingPoint!N && isNumeric!S) {
    private N start, end, current;
    private S step;
    this(N start, N end, S step)
    {
        this.start = start;
        this.end = end;
        this.step = step;
        current = start;
    }
    /// Range primitives
    @property bool empty() const { return current >= end; }
    /// Ditto
    @property N front() { return current; }
    /// Ditto
    alias front moveFront;
    /// Ditto
    void popFront()
    {
        assert(!empty);
        current += step;
    }
    @property Iota2 save() { return this; }
}

auto iota2(B, E, S)(B begin, E end, S step)
if (is(typeof((E.init - B.init) + 1 * S.init)))
{
    return Iota2!(CommonType!(Unqual!B, Unqual!E), S)(begin, end, step);
}

void main(string[] args) {
    double result;
    auto limit = 10_000_000.0;
    if (args.length > 1) {
        // Any command-line argument selects the range-based version.
        writeln("iota");
        auto L = iota2(0.0, limit, 1.0);
        auto L2 = map!"a / 2"(L);
        auto L3 = map!"a + 2"(L2);
        result = reduce!"a + b"(L3);
    } else {
        // Handwritten loop used as the baseline.
        writeln("baseline");
        result = 0.0;
        for (double i = 0; i != limit; ++i) {
            result += (i / 2) + 2;
        }
    }
    writefln("%f", result);
}


Re: Why is D slower than LuaJIT?

2010-12-23 Thread Simen kjaeraas

Andrei Alexandrescu seewebsiteforem...@erdani.org wrote:


http://www.dsource.org/projects/phobos/changeset/2231


BTW, shouldn't range constructors call .save for forward ranges? This
one certainly doesn't.

--
Simen


Re: Why is D slower than LuaJIT?

2010-12-23 Thread Andrei Alexandrescu

On 12/23/10 10:09 AM, Simen kjaeraas wrote:

Andrei Alexandrescu seewebsiteforem...@erdani.org wrote:


I decided to check in the map cache removal. We discussed it a fair
amount among Phobos devs. I have no doubts caching might help in
certain cases, but it does lead to surprising performance loss for
simple cases like the one tested here. See
http://www.dsource.org/projects/phobos/changeset/2231


It seems to me that having a Cached range might be a better, more general
solution in any case.


Agreed.

Andrei


Re: Why is D slower than LuaJIT?

2010-12-23 Thread Andrei Alexandrescu

On 12/23/10 10:14 AM, Simen kjaeraas wrote:

Andrei Alexandrescu seewebsiteforem...@erdani.org wrote:


http://www.dsource.org/projects/phobos/changeset/2231


BTW, shouldn't range constructors call .save for forward ranges? This
one certainly doesn't.


Currently higher-order ranges assume that the range passed-in is good to 
take ownership of. A range or algorithm should call save only in case 
extra copies need to be created.


Andrei


Re: Why is D slower than LuaJIT?

2010-12-23 Thread Jimmy Cao
I hope that in the future more implementations in D can be compared for
performance against their equivalent Lua translations.
It seems that LuaJIT is a super speedy dynamic language, and it is
specifically designed to break into the performance ranges of optimized
static languages, which makes it a formidable competitor.


Why is D slower than LuaJIT?

2010-12-22 Thread Andreas Mayer
To see what performance advantage D would give me over using a scripting 
language, I made a small benchmark. It consists of this code:

auto L = iota(0.0, 1000.0);
auto L2 = map!"a / 2"(L);
auto L3 = map!"a + 2"(L2);
auto V = reduce!"a + b"(L3);

It runs in 281 ms on my computer.

The same code in Lua (using LuaJIT) runs in 23 ms.

That's about 10 times faster. I would have expected D to be faster. Did I do 
something wrong?

The first Lua version uses a simplified design. I thought maybe that is unfair 
to ranges, which are more complicated. You could argue ranges have more 
features and do more work. To make it fair, I made a second Lua version of the 
above benchmark that emulates ranges. It is still 29 ms fast.

The full D version is here: http://pastebin.com/R5AGHyPx
The Lua version: http://pastebin.com/Sa7rp6uz
Lua version that emulates ranges: http://pastebin.com/eAKMSWyr

Could someone help me solving this mystery?

Or is D, unlike I thought, not suitable for high performance computing? What 
should I do?



Re: Why is D slower than LuaJIT?

2010-12-22 Thread Trass3r
Or is D, unlike I thought, not suitable for high performance computing?  
What should I do?


LuaJIT seems to have a really good backend.
You better compare with ldc or gdc.


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Steven Schveighoffer

On Wed, 22 Dec 2010 17:04:21 -0500, Andreas Mayer s...@bacon.eggs wrote:

To see what performance advantage D would give me over using a scripting  
language, I made a small benchmark. It consists of this code:



   auto L = iota(0.0, 1000.0);
   auto L2 = map!"a / 2"(L);
   auto L3 = map!"a + 2"(L2);
   auto V = reduce!"a + b"(L3);


It runs in 281 ms on my computer.

The same code in Lua (using LuaJIT) runs in 23 ms.

That's about 10 times faster. I would have expected D to be faster. Did  
I do something wrong?


The first Lua version uses a simplified design. I thought maybe that is  
unfair to ranges, which are more complicated. You could argue ranges  
have more features and do more work. To make it fair, I made a second  
Lua version of the above benchmark that emulates ranges. It is still 29  
ms fast.


The full D version is here: http://pastebin.com/R5AGHyPx
The Lua version: http://pastebin.com/Sa7rp6uz
Lua version that emulates ranges: http://pastebin.com/eAKMSWyr

Could someone help me solving this mystery?

Or is D, unlike I thought, not suitable for high performance computing?  
What should I do?


Without any empirical testing, I would guess this has something to do with  
the lack of inlining for algorithmic functions.  This is due primarily to  
uses of enforce, which use lazy parameters, which are currently not  
inlinable (also, ensure you use -O -release -inline for the most optimized  
code).


I hope that someday this is solved, because it doesn't look very good for  
D...


-Steve


Re: Why is D slower than LuaJIT?

2010-12-22 Thread BLS

On 22/12/2010 23:06, Steven Schveighoffer wrote:

(also, ensure you use -O -release -inline for the most optimized code).


quote...
// D version, with std.algorithm
// ~ 281 ms, using dmd 2.051 (dmd -O -release -inline)

Sometimes having a look at the source helps :)



Re: Why is D slower than LuaJIT?

2010-12-22 Thread Andreas Mayer
Trass3r Wrote:

 LuaJIT seems to have a really good backend.
 You better compare with ldc or gdc.

Maybe someone could do that for me? I don't have ldc or gdc here. There are 
some Debian packages, but they are D version 1 only? 


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Gary Whatmore
Andreas Mayer Wrote:

 To see what performance advantage D would give me over using a scripting 
 language, I made a small benchmark. It consists of this code:
 
 auto L = iota(0.0, 1000.0);
 auto L2 = map!"a / 2"(L);
 auto L3 = map!"a + 2"(L2);
 auto V = reduce!"a + b"(L3);

First note: this is a synthetic toy benchmark. Take it with a grain of salt. It 
in no way represents the true state of D.

 
 It runs in 281 ms on my computer.
 
 The same code in Lua (using LuaJIT) runs in 23 ms.

Your mp3 player or file system was doing stuff while executing the benchmark. 
You probably don't know how to run the test many times and use the 
average/minimum result for both languages. For example D does not have JIT 
startup cost so take the minimum result for D, JIT has varying startup speed so 
take the average or slowest result for Luajit. Compare these. More fair for 
native code D.

 That's about 10 times faster. I would have expected D to be faster. Did I do 
 something wrong?
 
 The first Lua version uses a simplified design. I thought maybe that is 
 unfair to ranges, which are more complicated. You could argue ranges have 
 more features and do more work. To make it fair, I made a second Lua version 
 of the above benchmark that emulates ranges. It is still 29 ms fast.
 
 The full D version is here: http://pastebin.com/R5AGHyPx
 The Lua version: http://pastebin.com/Sa7rp6uz
 Lua version that emulates ranges: http://pastebin.com/eAKMSWyr
 
 Could someone help me solving this mystery?

My guesses are:

1) you didn't even test this and didn't use optimizations. - User error
2) whenever doing benchmarks you must compare the competing stuff against all D 
compilers, cut and paste the object code of different compilers and manually 
build the fastest executable.
3) you didn't use inline assembler or profiler for D
4) you were using unstable Phobos functions. There is no doubt the final Phobos 
2.0 will beat Luajit. D *is* a compiler statical language, Luajit just a joke.
5) you were using old d runtime garbage collector. One fellow here made a 
precise state of the art GC which beats even Java's 20 year old GC and C#. 
Patch your dmd to use this instead.

Not intending to start a religious war but if your native code runs slower than 
*JIT* code, you're doing something wrong. D will always beat JIT. Lua is also a 
joke language, D is for high performance servers and operating systems. In the 
worst case, disassemble the luajit program, steal its codes and write it using 
inline assembler in D. D must win these performance battles. It's technically 
superior.

 
 Or is D, unlike I thought, not suitable for high performance computing? What 
 should I do?

It is. D a) takes time to mature and b) you didn't fully utilize the compiler.


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Iain Buclaw
== Quote from Andreas Mayer (s...@bacon.eggs)'s article
 To see what performance advantage D would give me over using a scripting
language, I made a small benchmark. It consists of this code:

One corner case doesn't amount to an end-all be-all proof to anything.

 auto L = iota(0.0, 1000.0);
 auto L2 = map!"a / 2"(L);
 auto L3 = map!"a + 2"(L2);
 auto V = reduce!"a + b"(L3);
 It runs in 281 ms on my computer.
 The same code in Lua (using LuaJIT) runs in 23 ms.
 That's about 10 times faster. I would have expected D to be faster. Did I do
something wrong?
 The first Lua version uses a simplified design. I thought maybe that is unfair
to ranges, which are more complicated. You could argue ranges have more features
and do more work. To make it fair, I made a second Lua version of the above
benchmark that emulates ranges. It is still 29 ms fast.

As has been already echoed, the lack of inlining algorithmic functions may be one
reason for the added cost to runtime. Another may be simply that there is a lot
more going on behind the scenes than what you give credit for in D.

Regards :~)


Re: Why is D slower than LuaJIT?

2010-12-22 Thread BLS

On 22/12/2010 23:31, Gary Whatmore wrote:

Not intending to start a religious war but if your native code runs slower 
than *JIT* code, you're doing something wrong. D will always beat JIT.


You talk like a prayer, don't you ? No need to measure.  I believe ...

Anyway, I don't care about Lua. Dynamic ASM generation, on the other 
hand, is something that should seriously be considered for D 
templates. Why? What comes immediately to mind: no more bloated 
executables.

my 2 euro cents


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Ary Borenszweig
Lua is a proven, robust language

Lua has been used in many industrial applications (e.g., Adobe's
Photoshop Lightroom), with an emphasis on embedded systems (e.g., the
Ginga middleware for digital TV in Brazil) and games (e.g., World of
Warcraft). Lua is currently the leading scripting language in games.
Lua has a solid reference manual and there are several books about it.
Several versions of Lua have been released and used in real
applications since its creation in 1993. Lua featured in HOPL III, the
Third ACM SIGPLAN History of Programming Languages Conference, in June
2007.


Re: Why is D slower than LuaJIT?

2010-12-22 Thread spir
On Wed, 22 Dec 2010 17:04:21 -0500
Andreas Mayer s...@bacon.eggs wrote:

 To see what performance advantage D would give me over using a scripting 
 language, I made a small benchmark. It consists of this code:
 
 auto L = iota(0.0, 1000.0);
 auto L2 = map!"a / 2"(L);
 auto L3 = map!"a + 2"(L2);
 auto V = reduce!"a + b"(L3);
 
 It runs in 281 ms on my computer.
 
 The same code in Lua (using LuaJIT) runs in 23 ms.
 
 That's about 10 times faster. I would have expected D to be faster. Did I do 
 something wrong?
 
 The first Lua version uses a simplified design. I thought maybe that is 
 unfair to ranges, which are more complicated. You could argue ranges have 
 more features and do more work. To make it fair, I made a second Lua version 
 of the above benchmark that emulates ranges. It is still 29 ms fast.
 
 The full D version is here: http://pastebin.com/R5AGHyPx
 The Lua version: http://pastebin.com/Sa7rp6uz
 Lua version that emulates ranges: http://pastebin.com/eAKMSWyr
 
 Could someone help me solving this mystery?
 
 Or is D, unlike I thought, not suitable for high performance computing? What 
 should I do?

Dunno why D seems slow. But Lua is a very fast dynamic language, very simple &
rather low level (both on the language & implementation sides). Benchmark trials in
Lua often run much faster than Python or Ruby equivalents (often 10x).
Depending on the domain, LuaJIT often adds a speed factor of an order of
magnitude. Altogether this brings performance comparable to some compiled
languages using high-level features such as virtual funcs, GC,... higher-order
funcs, ranges ;-)

denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com



Re: Why is D slower than LuaJIT?

2010-12-22 Thread Gary Whatmore
Ary Borenszweig Wrote:

 Lua is a proven, robust language
 
 Lua has been used in many industrial applications (e.g., Adobe's
 Photoshop Lightroom), with an emphasis on embedded systems (e.g., the
 Ginga middleware for digital TV in Brazil) and games (e.g., World of
 Warcraft). Lua is currently the leading scripting language in games.
 Lua has a solid reference manual and there are several books about it.
 Several versions of Lua have been released and used in real
 applications since its creation in 1993. Lua featured in HOPL III, the
 Third ACM SIGPLAN History of Programming Languages Conference, in June
 2007.

Are you suggesting D's conferences have smaller publicity value than ACM 
SIGPLAN HOPL? D has had several international conferences (videos available 
online) and the ACCU fellowship likes D. The NWCPP group likes D. I find it 
hard to believe D wouldn't replace Lua everywhere, when the time comes.


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Adam D. Ruppe
Steven Schveighoffer wrote:
  I would guess this has something to do with
 the lack of inlining for algorithmic functions.

Yeah, this is almost certainly the problem. I rewrote the
code using a traditional C style loop, no external functions,
and I'm getting roughly equal performance.
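
Adam's rewritten code is not included in the thread. As a rough sketch of what such a comparison might look like (the function names `sumWithRanges` and `sumWithLoop` and the limit are made up for illustration, and lambda syntax from a newer compiler is assumed), both functions below compute the same sum, once through the std.algorithm pipeline and once with a plain loop:

```d
import std.algorithm : map, reduce;
import std.range : iota;
import std.stdio : writefln;

// Hypothetical helper: the range pipeline from the original benchmark.
double sumWithRanges(double limit)
{
    auto L  = iota(0.0, limit);
    auto L2 = map!(a => a / 2)(L);
    auto L3 = map!(a => a + 2)(L2);
    return reduce!((a, b) => a + b)(L3);
}

// Hypothetical helper: the same arithmetic as a hand-written loop,
// with no library calls in the loop body.
double sumWithLoop(double limit)
{
    double sum = 0.0;
    for (double i = 0.0; i < limit; ++i)
        sum += i / 2 + 2;
    return sum;
}

void main()
{
    enum limit = 1000.0;
    // Printing both results confirms the two versions agree.
    writefln("%f %f", sumWithRanges(limit), sumWithLoop(limit));
}
```

Timing the two functions separately would show how much of the cost comes from the range abstractions rather than from the arithmetic itself.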


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Ary Borenszweig
You said Lua is a joke language. It doesn't seem to be one...


Re: Why is D slower than LuaJIT?

2010-12-22 Thread loser
Adam D. Ruppe Wrote:

 Steven Schveighoffer wrote:
   I would guess this has something to do with
  the lack of inlining for algorithmic functions.
 
 Yeah, this is almost certainly the problem. I rewrote the
 code using a traditional C style loop, no external functions,
 and I'm getting roughly equal performance.

So is it justified enough to throw my W's incompetence card on the table at 
this point? How else it is possible that a simple scripting language with 
simple JIT optimization heuristics can outperform a performance oriented 
systems programming language. It seems most D design decisions are based on the 
perceived performance value (not as aggressively as in C++ groups). I'd like to 
see how this theory doesn't hold water now?


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Gary Whatmore
Ary Borenszweig Wrote:

 You said Lua is a joke language. It doesn't seem to be one...

Okay then, maybe it's not completely true. I meant it doesn't work in large 
scale applications unlike a static systems programming language. Need to study 
how extensively it's used in that game. I just think being popular doesn't mean 
it's not a joke. Think about PHP !


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Andreas Mayer
Gary Whatmore Wrote:

 Andreas Mayer Wrote:
 
  To see what performance advantage D would give me over using a scripting 
  language, I made a small benchmark. It consists of this code:
  
  auto L = iota(0.0, 1000.0);
  auto L2 = map!"a / 2"(L);
  auto L3 = map!"a + 2"(L2);
  auto V = reduce!"a + b"(L3);
 
 First note: this is a synthetic toy benchmark. Take it with a grain of salt. 
 It in no way represents the true state of D.

True enough. Yet it doesn't make me very optimistic what the final performance 
tradeoff would be as long as you use high level abstractions. Sure, with D you 
can always go on the C or even assembly level.

 Your mp3 player or file system was doing stuff while executing the benchmark. 
 You probably don't know how to run the test many times and use the 
 average/minimum result for both languages. For example D does not have JIT 
 startup cost so take the minimum result for D, JIT has varying startup speed 
 so take the average or slowest result for Luajit. Compare these. More fair 
 for native code D.

Both benchmarks were run under the same conditions. Once the executables were 
inside the disk cache, the run times didn't vary much. Plus this benchmark 
already is unfair against LuaJIT: the startup time and the time needed for 
optimization and code generation are included in the times I gave. The D 
example on the other hand doesn't include the time needed for compilation. The 
D compiler needs 360 ms to compile this example. If the comparison were fair 
and included compilation time in the D timings, D would lose even more.
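
The measurement procedure discussed here (repeat the run, then compare the fastest result) can be sketched inside the D program itself. This is only an illustration under assumptions: it uses `StopWatch` from `std.datetime.stopwatch`, which belongs to a newer Phobos than the dmd 2.051 used in this thread, and the names `pipeline` and `minTime` are made up.

```d
import core.time : Duration;
import std.algorithm : map, reduce;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.range : iota;
import std.stdio : writefln;

// Hypothetical workload: the pipeline from the benchmark.
double pipeline()
{
    return reduce!((a, b) => a + b)(
        map!(a => a + 2)(map!(a => a / 2)(iota(0.0, 1000.0))));
}

// Hypothetical helper: run `work` `runs` times, store its last result in
// `sink`, and return the shortest wall-clock time observed.
Duration minTime(double function() work, size_t runs, ref double sink)
{
    Duration best = Duration.max;
    foreach (_; 0 .. runs)
    {
        auto sw = StopWatch(AutoStart.yes);
        sink = work();
        immutable elapsed = sw.peek();
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}

void main()
{
    double result;
    auto best = minTime(&pipeline, 20, result);
    // Print the result too, so the work cannot be optimized away.
    writefln("%f computed, fastest of 20 runs: %s", result, best);
}
```

Timing inside the process excludes startup cost, so it measures something different from timing the whole executable as was done for the numbers above.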

 My guesses are:
 
 1) you didn't even test this and didn't use optimizations. - User error

I enabled all dmd optimizations I was aware of. Maybe I forgot some?

 2) whenever doing benchmarks you must compare the competing stuff against all 
 D compilers, cut and paste the object code of different compilers and 
 manually build the fastest executable.

That seems like an unreasonable task. Writing the code in assembler would be 
simpler. But I'm using a high level language because I want to use high level 
abstractions. Like map and reduce, instead of writing assembler.

 3) you didn't use inline assembler or profiler for D

See 2).

 4) you were using unstable Phobos functions. There is no doubt the final 
 Phobos 2.0 will beat Luajit. D *is* a compiler statical language, Luajit just 
 a joke.

I used the latest dmd release (and that is very new).

As you can see, LuaJIT beats D by far. I wouldn't call it a joke.

If a joke beats D, then what is D? This way of argumentation doesn't sound very 
advantageous for you.

 5) you were using old d runtime garbage collector. One fellow here made a 
 precise state of the art GC which beats even Java's 20 year old GC and C#. 
 Patch your dmd to use this instead.

There shouldn't be any GC activity. Ranges work lazily. They don't allocate 
arrays for the data they are working on.

You can post a package with bleeding edge dmd and Phobos sources with updated 
GC and so on. Then I could try that.

 
 Not intending to start a religious war but if your native code runs slower 
 than *JIT* code, you're doing something wrong. D will always beat JIT. Lua is 
 also a joke language, D is for high performance servers and operating 
 systems. In the worst case, disassemble the luajit program, steal its codes 
 and write it using inline assembler in D. D must win these performance 
 battles. It's technically superior.

But D didn't win. Not here. And what was I doing wrong? Please point out. I 
posted this because I was surprised myself and I thought that can't be.



Re: Why is D slower than LuaJIT?

2010-12-22 Thread Walter Bright

Andreas Mayer wrote:

Or is D, unlike I thought, not suitable for high performance computing? What
should I do?



I notice you are using doubles in D. dmd currently uses the x87 to evaluate 
doubles, and on some processors the x87 is slow relative to using the XMM 
instructions. Also, dmd's back end doesn't align the doubles on 16 byte 
boundaries, which can also slow down the floating point on some processors.


Both of these code gen issues with dmd are well known, and I'd like to solve 
them after we address higher priority issues.


If it's not clear, I'd like to emphasize that these are compiler issues, not D 
language issues.


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Andreas Mayer
Walter Bright Wrote:

 I notice you are using doubles in D. dmd currently uses the x87 to evaluate 
 doubles, and on some processors the x87 is slow relative to using the XMM 
 instructions. Also, dmd's back end doesn't align the doubles on 16 byte 
 boundaries, which can also slow down the floating point on some processors.

Using long instead of double, it is still slower than LuaJIT (223 ms on my 
machine).
Even with int it still takes 101 ms and is at least 3x slower than LuaJIT.

 Both of these code gen issues with dmd are well known, and I'd like to solve 
 them after we address higher priority issues.
 
 If it's not clear, I'd like to emphasize that these are compiler issues, not D
 language issues.

I shouldn't use D now? How long until it is ready?


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Walter Bright

Walter Bright wrote:

Andreas Mayer wrote:
Or is D, unlike I thought, not suitable for high performance computing? What
should I do?


I forgot to mention. In the D version, use integers as a loop counter, not 
doubles.
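
A minimal sketch of that suggestion, assuming a simple summation loop like the one in the benchmark (the function names are made up for illustration): the first version drives the loop with a double, the second uses an int counter and keeps only the accumulator as a double.

```d
import std.stdio : writefln;

// Hypothetical helper: the loop counter is a double, as in the benchmark.
double sumDoubleCounter(double limit)
{
    double result = 0.0;
    for (double i = 0.0; i < limit; ++i)
        result += i;
    return result;
}

// Hypothetical helper: the loop counter is an int; it is converted to
// double only when added to the accumulator.
double sumIntCounter(int limit)
{
    double result = 0.0;
    foreach (i; 0 .. limit)
        result += i;
    return result;
}

void main()
{
    writefln("%f %f", sumDoubleCounter(1000.0), sumIntCounter(1000));
}
```

Both versions return the same sum; whether the integer counter is actually faster depends on the compiler and processor, as the timings later in the thread suggest.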


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Andreas Mayer
Iain Buclaw Wrote:

 Another may be simply that there is a lot
 more going on behind the scenes than what you give credit for in D.

What else does it do? I want to add it to the Lua version.


Re: Why is D slower than LuaJIT?

2010-12-22 Thread bearophile
Andreas Mayer:

 To see what performance advantage D would give me over using a scripting 
 language, I made a small benchmark. It consists of this code:

I have done (and I am doing) many benchmarks with D, and I too have seen 
similar results. I have discussed this topic twice in the past; this was one 
of those times:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=110419
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=110420

For Floating-point-heavy code Lua-JIT is often faster than D compiled with DMD. 
I have found that on the SciMark2 benchmark too Lua-JIT is faster than D code 
compiled with DMD. On the other hand if I use LDC I am often able to beat 
LuaJIT 2.0.0-beta3 (we are now at beta5) (if the D code doesn't ask for too 
much inlining).

LuaJIT is written by a very smart person, maybe a kind of genius, who has 
recently given ideas to the designers of V8 and the Firefox JS engine. LuaJIT uses 
SSE registers very well and, being a JIT, it has more runtime information about 
the code, so it is able to optimize it better. It unrolls dynamically, inlines 
dynamic things, etc. DMD doesn't perform enough optimizations. Keep in mind 
that the main purpose of DMD is now to finish implementing D (and sometimes to 
find what to implement! Because there are some unfinished corners in D design). 
Performance tuning is mostly for later.

-

Walter Bright:

If it's not clear, I'd like to emphasize that these are compiler issues, not D 
language issues.

Surely Lua looks like a far worse language regarding optimization 
opportunities. But people around here (like you) must start to realize that JIT 
compilation is not what it used to be. Today the JIT compilation done by the 
JavaVM is able to perform de-virtualization, dynamic loop unrolling, inlining 
across compilation units, and some other optimizations that, despite not being 
language issues, are not done or not done enough by static compilers like LDC, 
GCC, DMD. The result is that the SciMark2 benchmark is about as fast in Java as 
in C, and for some sub-benchmarks it is faster :-)

Bye,
bearophile


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Walter Bright

Andreas Mayer wrote:

I shouldn't use D now? How long until it is ready?


It depends on what you want to do. A lot of people are successfully using D.


Re: Why is D slower than LuaJIT?

2010-12-22 Thread bearophile
loser:

 So is it justified enough to throw my W's incompetence card on the table at 
 this point? How else it is possible that a simple scripting language with 
 simple JIT optimization heuristics can outperform a performance oriented 
 systems programming language.

It's not wise to prematurely improve the inlining a lot right now when there is 
no 64 bit version yet, and there are holes or missing parts in several corners 
of the language. Performance tuning has a lower priority.

Designing a good language and performance-tuning its implementation ask for 
different skills. The very good author of Lua-JIT is probably not good at 
designing a C++-class language :-) What's needed now is to smooth the rough 
corners of the D language, not to squeeze out every bit of performance.

Bye,
bearophile


Re: Why is D slower than LuaJIT?

2010-12-22 Thread spir
On Wed, 22 Dec 2010 18:26:33 -0500
Gary Whatmore n...@spam.sp wrote:

 Ary Borenszweig Wrote:
 
  You said Lua is a joke language. It doesn't seem to be one...
 
 Okay then, maybe it's not completely true. I meant it doesn't work in large 
 scale applications unlike a static systems programming language. Need to 
 study how extensively it's used in that game. I just think being popular 
 doesn't mean it's not a joke. Think about PHP !

I find your assertions rather pointless, if not meaningless. Lua is certainly a 
joke language in D's preferred applications domains, just like D is a joke 
language as an embedded, data-description, or user-scripting language.
And precisely: the fact that Lua (well helped by LuaJIT) can outperform D on 
its own terrain --even if on a single example-- should give you *even more 
_respect_* for this language and its designers. (And raise serious worries 
about D ;-)

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com



Re: Why is D slower than LuaJIT?

2010-12-22 Thread Andrej Mitrovic
On 12/22/10, Steven Schveighoffer schvei...@yahoo.com wrote:
 Without any empirical testing, I would guess this has something to do with
 the lack of inlining for algorithmic functions.  This is due primarily to
 uses of enforce, which use lazy parameters, which are currently not
 inlinable (also, ensure you use -O -release -inline for the most optimized
 code).


I have just tried removing enforce usage from Phobos and recompiling
the library, and compiling again with -O -release -inline. It doesn't
appear to make a difference in the timing speed over multiple runs.


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Eric Poggel

On 12/22/2010 5:31 PM, Gary Whatmore wrote:

5) you were using old d runtime garbage collector. One fellow here made a 
precise state of the art GC which beats even Java's 20 year old GC and C#. 
Patch your dmd to use this instead.


Could you point me to more information?  This sounds interesting.


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Walter Bright

bearophile wrote:

Surely Lua looks like a far worse language regarding optimization
opportunities. But people around here (like you) must start to realize that
JIT compilation is not what it used to be. Today the JIT compilation done by
the JavaVM is able to perform de-virtualization, dynamic loop unrolling,


The Java JIT did that 15 years ago. I think you forget that I wrote on a Java 
compiler way back in the day (the companion JIT was done by Steve Russell, yep, 
the Optlink guy).



inlining across compilation units,


dmd does cross-module inlining.


and some other optimizations that,
despite not being language issues, are not done or not done enough by static
compilers like LDC, GCC, DMD. The result is that the SciMark2 benchmark is about
as fast in Java as in C, and for some sub-benchmarks it is faster :-)


Inherent Java slowdowns are not in numerical code. The Java language isn't 
inherently worse at numerics than C, C++, D, etc. Where Java is inherently worse 
is in its excessive reliance on dynamic allocation (and that is rare in numeric 
code - you don't new a double).




Re: Why is D slower than LuaJIT?

2010-12-22 Thread Walter Bright

Andrej Mitrovic wrote:

I have just tried removing enforce usage from Phobos and recompiling
the library, and compiling again with -O -release -inline. It doesn't
appear to make a difference in the timing speed over multiple runs.


Try looking at the obj2asm dump of the inner loop.


Re: Why is D slower than LuaJIT?

2010-12-22 Thread bearophile
 I think you forget that I wrote on a Java compiler way back in the day

I remember it :-)


 dmd does cross-module inlining.

I didn't know this, much...

Bye,
bearophile


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Andrei Alexandrescu

On 12/22/10 4:04 PM, Andreas Mayer wrote:

To see what performance advantage D would give me over using a scripting 
language, I made a small benchmark. It consists of this code:


auto L = iota(0.0, 1000.0);
auto L2 = map!"a / 2"(L);
auto L3 = map!"a + 2"(L2);
auto V = reduce!"a + b"(L3);


It runs in 281 ms on my computer.


Thanks for posting the numbers. That's a long time, particularly 
considering that the two map instances don't do anything. So the bulk of 
the computation is:


auto L = iota(0.0, 1000.0);
auto V = reduce!"a + b"(L3);

There is one inherent problem that affects the speed of iota: in iota, 
the value at position i is computed as 0.0 + i * step, where step is 
computed from the limits. That's one addition and a multiplication for 
each pass through iota. Given that the actual workload of the loop is 
only one addition, we are doing a lot more work. I suspect that that's 
the main issue there.


The reason for which iota does that instead of the simpler increment is 
that iota must iterate the same values forward and backward. Using ++ 
may interact with floating-point vagaries, so the code is currently 
conservative.
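
To illustrate the trade-off described above, here is a small sketch, not the actual Phobos code; `byIndex` and `byIncrement` are made-up names. The first computes the i-th value from the index (one multiplication and one addition per element), the second reaches it by repeated increment.

```d
import std.stdio : writefln;

// Hypothetical helper: i-th value computed from the index, the scheme
// described above for the current iota.
double byIndex(double start, double step, size_t i)
{
    return start + i * step;
}

// Hypothetical helper: i-th value reached by repeated increment, the
// cheaper scheme a forward-only range could use.
double byIncrement(double start, double step, size_t i)
{
    double current = start;
    foreach (_; 0 .. i)
        current += step;
    return current;
}

void main()
{
    // With a step of 1.0 both schemes agree exactly at these magnitudes.
    writefln("%.17g %.17g", byIndex(0.0, 1.0, 1000), byIncrement(0.0, 1.0, 1000));
    // With a step such as 0.1, repeated addition can accumulate rounding
    // error, so the two schemes may print slightly different values.
    writefln("%.17g %.17g", byIndex(0.0, 0.1, 1000), byIncrement(0.0, 0.1, 1000));
}
```

The index-based form gives the same value for position i regardless of iteration direction, which is why a bidirectional or random-access iota would prefer it despite the extra multiplication.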


Another issue is the implementation of reduce. Reduce is fairly general 
which may mean that it generates mediocre code for that particular case. 
We can always optimize the general case and perhaps specialize for 
select cases.


Once we figure where the problem is, there are numerous possibilities to 
improve the code:


1. Have iota check in the constructor whether the limits allow ++ to be 
precise. If so, use that. Of course, that means an extra runtime test...


3. Give up on iota being a random access or bidirectional range. If it's 
a forward range, we don't need to worry about going backwards.


4. Improve reduce as described above.


The same code in Lua (using LuaJIT) runs in 23 ms.

That's about 10 times faster. I would have expected D to be faster. Did I do 
something wrong?

The first Lua version uses a simplified design. I thought maybe that is unfair 
to ranges, which are more complicated. You could argue ranges have more 
features and do more work. To make it fair, I made a second Lua version of the 
above benchmark that emulates ranges. It is still 29 ms fast.

The full D version is here: http://pastebin.com/R5AGHyPx
The Lua version: http://pastebin.com/Sa7rp6uz
Lua version that emulates ranges: http://pastebin.com/eAKMSWyr

Could someone help me solving this mystery?

Or is D, unlike I thought, not suitable for high performance computing? What 
should I do?


Thanks very much for taking the time to measure and post results, this 
is very helpful. As this test essentially measures the performance of 
iota and reduce, it would be hasty to generalize the assessment. 
Nevertheless, we need to look into improving this particular microbenchmark.


Please don't forget to print the result of the computation in both 
languages, as there's always the possibility of some oversight.



Andrei


Re: Why is D slower than LuaJIT?

2010-12-22 Thread bearophile
Andrei:

 As this test essentially measures the performance of 
 iota and reduce, it would be hasty to generalize the assessment. 

From other tests I have seen that often FP-heavy code is faster with Lua-JIT 
than with D-DMD. But on average the speed difference is much less than 10 
times, generally no more than 2 times.

One benchmark, Lua and D code (both OOP and C-style included, plus several 
manually optimized D versions):
http://tinyurl.com/yeo2g8j

Bye,
bearophile


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Andrei Alexandrescu

On 12/22/10 4:04 PM, Andreas Mayer wrote:

To see what performance advantage D would give me over using a scripting 
language, I made a small benchmark. It consists of this code:


auto L = iota(0.0, 1000.0);
auto L2 = map!"a / 2"(L);
auto L3 = map!"a + 2"(L2);
auto V = reduce!"a + b"(L3);


It runs in 281 ms on my computer.

The same code in Lua (using LuaJIT) runs in 23 ms.

That's about 10 times faster. I would have expected D to be faster. Did I do 
something wrong?

The first Lua version uses a simplified design. I thought maybe that is unfair 
to ranges, which are more complicated. You could argue ranges have more 
features and do more work. To make it fair, I made a second Lua version of the 
above benchmark that emulates ranges. It is still 29 ms fast.

The full D version is here: http://pastebin.com/R5AGHyPx
The Lua version: http://pastebin.com/Sa7rp6uz
Lua version that emulates ranges: http://pastebin.com/eAKMSWyr

Could someone help me solving this mystery?

Or is D, unlike I thought, not suitable for high performance computing? What 
should I do?


I reproduced the problem with a test program as shown below. On my 
machine the D iota runs in 108ms, whereas a baseline using a handwritten 
loop runs in 43 ms.


I then replaced iota's implementation with a simpler one that's a 
forward range. Then the performance became exactly the same as for the 
simple loop.


Andreas, any chance you could run this on your machine and compare it 
with Lua? (I don't have Lua installed.) Thanks!



Andrei

// D version, with std.algorithm
// ~ 281 ms, using dmd 2.051 (dmd -O -release -inline)

import std.algorithm;
import std.stdio;
import std.range;
import std.traits;

// Simplified forward-range replacement for iota: advances by += step.
struct Iota2(N, S) if (isFloatingPoint!N && isNumeric!S) {
    private N start, end, current;
    private S step;
    this(N start, N end, S step)
    {
        this.start = start;
        this.end = end;
        this.step = step;
        current = start;
    }
    /// Range primitives
    @property bool empty() const { return current >= end; }
    /// Ditto
    @property N front() { return current; }
    /// Ditto
    alias front moveFront;
    /// Ditto
    void popFront()
    {
        assert(!empty);
        current += step;
    }
    @property Iota2 save() { return this; }
}

auto iota2(B, E, S)(B begin, E end, S step)
if (is(typeof((E.init - B.init) + 1 * S.init)))
{
    return Iota2!(CommonType!(Unqual!B, Unqual!E), S)(begin, end, step);
}

// Run with any argument to time the range version; without one, the baseline loop.
void main(string[] args) {
    double result;
    auto limit = 10_000_000.0;
    if (args.length > 1) {
        writeln("iota");
        auto L = iota2(0.0, limit, 1.0);
        result = reduce!"a + b"(L);
    } else {
        writeln("baseline");
        result = 0.0;
        for (double i = 0; i != limit; ++i) {
            result += i;
        }
    }
    writefln("%f", result);
}


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Andrej Mitrovic
That's odd, I'm getting opposite results:

iota = 78ms
baseline = 187ms

Andreas' old code gives:
421ms

This is over multiple runs so I'm getting the average out of about 20 runs.




Re: Why is D slower than LuaJIT?

2010-12-22 Thread Andreas Mayer
Andrei Alexandrescu Wrote:

 Andreas, any chance you could run this on your machine and compare it 
 with Lua? (I don't have Lua installed.) Thanks!

Your version: 40 ms (iota and baseline give the same timings)
LuaJIT with map calls removed: 21 ms

Interesting results.


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Andrei Alexandrescu

On 12/22/10 11:06 PM, Andreas Mayer wrote:
 Andrei Alexandrescu Wrote:

  Andreas, any chance you could run this on your machine and compare it
  with Lua? (I don't have Lua installed.) Thanks!

 Your version: 40 ms (iota and baseline give the same timings)
 LuaJIT with map calls removed: 21 ms

 Interesting results.


Cool, thanks. I also tested against this C++ baseline:

#include <stdio.h>

int main() {
    const double limit = 1000.0;
    double result = 0.0;
    for (double i = 0; i != limit; ++i) {
        result += i;
    }
    printf("%f\n", result);
}

The baseline (compiled with -O3) runs in 21 ms on my machine, which 
means (if my and Andreas' machines are similar in performance) that Lua 
has essentially native performance for this loop and D has an issue in 
code generation that makes it 2x slower. I think this could be filed as 
a performance bug for dmd.


I'm thinking what to do about iota, which has good features but exacts 
too much cost on tight loop performance. One solution would be to define 
iota to be the simple, forward range that I defined as Iota2 in my 
previous post. Then, we need a different name for the full-fledged iota 
(random-access, has known length, iterates through the same numbers 
forward and backward etc). Ideas?
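
To make the trade-off concrete, here is a rough sketch of the two designs, written in C++ like the baseline above rather than in D. It is only an illustration under assumptions, not how Phobos implements iota: the names `ForwardIota` and `IndexedIota` are hypothetical, and both assume a positive step.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Forward-only range: keeps a running value and advances it by one
// addition per element. Cheap, but no length and no backward iteration.
struct ForwardIota {
    double current, end, step;
    bool empty() const { return current >= end; }
    double front() const { return current; }
    void popFront() { assert(!empty()); current += step; }
};

// Random-access range: the element count is fixed at construction and
// element i is recomputed from the start, so forward and backward
// traversal yield exactly the same numbers. Assumes step > 0.
struct IndexedIota {
    double start, step;
    std::size_t count;
    IndexedIota(double begin, double end, double step_)
        : start(begin), step(step_),
          count(end > begin
                    ? static_cast<std::size_t>(std::ceil((end - begin) / step_))
                    : 0) {}
    std::size_t length() const { return count; }
    double operator[](std::size_t i) const {
        assert(i < count);
        return start + static_cast<double>(i) * step;
    }
    double front() const { return (*this)[0]; }
    double back() const { return (*this)[count - 1]; }
};
```

The forward version does one addition per element, which is about what the hand-written loop does. The indexed version pays an integer-to-double conversion and a multiplication on every access in exchange for a known length and indexing from either end.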



Andrei


Re: Why is D slower than LuaJIT?

2010-12-22 Thread Brad Roberts
On Wed, 22 Dec 2010, Andrei Alexandrescu wrote:

 On 12/22/10 11:06 PM, Andreas Mayer wrote:
  Andrei Alexandrescu Wrote:
  
   Andreas, any chance you could run this on your machine and compare it
   with Lua? (I don't have Lua installed.) Thanks!
  
  Your version: 40 ms (iota and baseline give the same timings)
  LuaJIT with map calls removed: 21 ms
  
  Interesting results.
 
 Cool, thanks. I also tested against this C++ baseline:
 
 #include <stdio.h>
 
 int main() {
     const double limit = 1000.0;
     double result = 0.0;
     for (double i = 0; i != limit; ++i) {
         result += i;
     }
     printf("%f\n", result);
 }
 
 The baseline (compiled with -O3) runs in 21 ms on my machine, which means (if
 my and Andreas' machines are similar in performance) that Lua has essentially
 native performance for this loop and D has an issue in code generation that
 makes it 2x slower. I think this could be filed as a performance bug for dmd.
 
 I'm thinking what to do about iota, which has good features but exacts too
 much cost on tight loop performance. One solution would be to define iota to
 be the simple, forward range that I defined as Iota2 in my previous post.
 Then, we need a different name for the full-fledged iota (random-access, has
 known length, iterates through the same numbers forward and backward etc).
 Ideas?
 
 
 Andrei

Since the timing code isn't here, I'm assuming you guys are doing the 
testing around the whole app.  While that might be interesting, it's 
hiding an awfully large and important difference, application startup 
time.

C has very little, D quite a bit more, and I don't know what Lua looks 
like there.  If the goal is to test this math code, you'll need to 
separate the two.

At this point, I highly suspect you're really measuring the runtime costs.
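
One way to separate the two, sketched in C++ to match the baseline quoted above: take timestamps immediately around the loop, so that process startup and runtime initialization fall outside the measured interval. The names `sum_below` and `timed_sum_below` are made up for this illustration, and the loop uses `<` rather than `!=` so that it also terminates for non-integer limits.

```cpp
#include <cassert>
#include <chrono>

// Same summation as the baseline loop: 0 + 1 + ... up to (not including) limit.
double sum_below(double limit) {
    double result = 0.0;
    for (double i = 0; i < limit; ++i) {
        result += i;
    }
    return result;
}

// Hypothetical helper: runs the loop once and stores the wall-clock time
// spent in the loop alone, in milliseconds, in *elapsed_ms.
double timed_sum_below(double limit, double* elapsed_ms) {
    assert(elapsed_ms != nullptr);
    const auto t0 = std::chrono::steady_clock::now();
    const double result = sum_below(limit);
    const auto t1 = std::chrono::steady_clock::now();
    *elapsed_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    return result;
}
```

Calling `timed_sum_below(limit, &ms)` from `main` and printing `ms` reports the loop time by itself; comparing it with the whole-process time gives a rough idea of how much is startup. The same idea should carry over to the D and Lua versions using their own clock facilities.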