On Thursday, 26 February 2015 at 07:05:56 UTC, Russel Winder wrote:
pure opinion and handwaving, not to mention mud-slinging. There should be a rule saying that no-one, but no-one, is allowed to make any claims about anything to do with performance without first having actually done a proper experiment and presented actual real data with statistical analysis.

I agree in general, but one can argue about the theoretical best performance based on computer architecture and language features. The fact is that to get good performance you need a cache-line-friendly data layout. D is stuck with:

1. Fixed C struct layout (see the sketch after this list).

2. Separate compilation units that leave the compiler blind.

3. C-oriented backends that are less GC-friendly than those of Java/JavaScript.

4. No compiler control over multi-threading.

5. Generic programming without compiler-optimized data layout (that hurts).
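To illustrate point 1, here is a minimal sketch (the field names are made up). D keeps the C layout rules, so fields stay in declaration order with whatever padding that implies; the compiler will not reorder them for you, you have to do it by hand:

import std.stdio : writeln;

struct Hot
{
    bool flag;     // 1 byte, then padding up to long's alignment
    long counter;  // 8 bytes
    bool flag2;    // 1 byte, then trailing padding
}

struct HotByHand
{
    long counter;  // reordered by hand to remove the interior padding
    bool flag;
    bool flag2;
}

void main()
{
    // Typically prints 24 and 16 on x86-64; the difference is pure padding.
    writeln(Hot.sizeof, " ", HotByHand.sizeof);
}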

It is possible to do "atomic writes" cheaply on x86 if you stick everything on the same cache line, and schedule instructions around the SFENCE in a clever manner to prevent pipeline stalls.
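As a rough sketch of that idea (the names are hypothetical, and I leave the actual instruction scheduling to the compiler here): keep the payload and the publish flag on one 64-byte cache line and publish with a release store. On x86 a release store compiles to a plain MOV, so an explicit SFENCE only enters the picture with non-temporal stores or a seq-cst publish.

import core.atomic : atomicLoad, atomicStore, MemoryOrder;

// Keep the fields that are written together on one 64-byte cache line.
// align(64) covers in-place instances; GC heap allocations may need
// extra care to actually get that alignment.
align(64) struct Record
{
    long payloadA;
    long payloadB;
    bool ready;
}

void publish(ref shared Record r, long a, long b)
{
    atomicStore!(MemoryOrder.raw)(r.payloadA, a);
    atomicStore!(MemoryOrder.raw)(r.payloadB, b);
    // Release store publishes the record; readers pair it with the
    // acquire load of `ready` below.
    atomicStore!(MemoryOrder.rel)(r.ready, true);
}

bool tryRead(ref shared Record r, out long a, out long b)
{
    if (!atomicLoad!(MemoryOrder.acq)(r.ready))
        return false;
    a = atomicLoad!(MemoryOrder.raw)(r.payloadA);
    b = atomicLoad!(MemoryOrder.raw)(r.payloadB);
    return true;
}

void main()
{
    shared Record r;
    long a, b;
    publish(r, 1, 2);
    assert(tryRead(r, a, b) && a == 1 && b == 2);
}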

It is possible to avoid pointers and use indices instead, thus limiting the extent of a precise scan.
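For example, a minimal sketch (made-up names) of a list whose links are 32-bit indices into one backing array: a precise scan then only has to consider the array reference itself rather than chasing a graph of node pointers.

struct Node
{
    int  value;
    uint next = NIL;   // index into Pool.nodes, not a pointer
}

enum uint NIL = uint.max;

struct Pool
{
    Node[] nodes;      // the only GC-visible reference in the structure

    uint insertFront(uint head, int value)
    {
        nodes ~= Node(value, head);
        return cast(uint)(nodes.length - 1);
    }

    int sum(uint head) const
    {
        int total = 0;
        for (uint i = head; i != NIL; i = nodes[i].next)
            total += nodes[i].value;
        return total;
    }
}

void main()
{
    Pool p;
    uint head = NIL;
    foreach (v; [1, 2, 3])
        head = p.insertFront(head, v);
    assert(p.sum(head) == 6);
}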

So you can surely create an experiment that gets performance close to the theoretical limit, but that does not tell you how things will work out in a complicated, generic-programming-based program built on D semantics and "monkey programming".

Computer architecture is also a moving target. AFAIK, on Intel MIC you get fast RAM close to the cores (stacked in layers on top) and slower shared RAM. There is also a big difference in memory bus throughput, with peaks ranging from roughly 5 to 30 GB/s on desktop CPUs.

But before you measure anything you need to agree on what you want measured. You need a baseline. IMO, the only acceptable baseline is a carefully hand-crafted data layout combined with manual memory management...
