On Thursday, 26 February 2015 at 07:05:56 UTC, Russel Winder wrote:
pure opinion and handwaving, not to mention mud-slinging. There should be a rule saying that no-one, but no-one, is allowed to make any claims about anything to do with performance without first having actually done a proper experiment and presented actual real data with statistical analysis.
I agree in general, but one can still argue about the theoretical best performance based on computer architecture and language features. The fact is, to get good performance you need cache-line-friendly layout, and D is stuck with:
1. A fixed C struct layout.
2. Separate compilation units that leave the compiler blind.
3. Backends designed for C that are less GC-friendly than those for Java/JavaScript.
4. No compiler control over multi-threading.
5. Generic programming without compiler-optimized data layout (and that hurts).
It is possible to do "atomic writes" cheaply on x86 if you keep everything on the same cache line and schedule instructions around the SFENCE cleverly to prevent pipeline stalls.
It is also possible to avoid pointers and use indices instead, limiting the extent of a precise scan.
So you can surely create an experiment that gets performance close to the theoretical limit, but that does not tell you how things will work out for a complicated generic program built on D semantics and "monkey programming".
Computer architecture is also moving. AFAIK, on Intel MIC you get fast RAM close to the cores (layered on top) and slower shared RAM. There are also big differences in memory-bus throughput, ranging from roughly 5 to 30 GB/s peak on desktop CPUs.
But before you measure anything you need to agree on what you want measured. You need a baseline. IMO, the only acceptable baseline is carefully hand-crafted data layout and manual memory management...