On Sun, 11 Apr 2010 12:50:11 -0400, Joseph Wakeling <[email protected]> wrote:

I was very happy to see that D _does_ have a 'reserve' function for arrays,
which I had been missing compared to C++ (it's not mentioned in the array
docs).

It's new. It probably should be mentioned in the spec, but it's a druntime thing.


Still, I don't think that pre-reserving the memory per se is the influencing
factor on the differences in performance.

No, and like bearophile said, the D array is a compromise between performance and flexibility. There are amazing ways to use D arrays that you could never do with C++ vectors.

Note that in C++ the memory is not preassigned either.  The difference
between the performance of these pieces of code is striking -- on my
machine the D example takes about 70--75 seconds to run, whereas the
C++ example speeds through it in 10s or less.

The C++ example is reallocating memory, freeing memory it is no longer using. It also manually handles the memory management, allocating larger and larger arrays in some algorithmically determined fashion (for example, multiplying the length by some constant factor). This gives it an edge in performance because it does not have to do any costly lookup to determine if it can append in place, plus the realloc of the memory probably is cheaper than the GC realloc of D.

If you want to compare apples to apples (well, probably more like red apples to green apples), you need to do these things in a struct for D. I had thought the D appender class would do the trick, but as you stated below, it's even slower. This needs to be remedied.

D also uses about 20% more memory than the C++ even though the C++ code
declares a higher capacity for the vector (above 8 million) than D does
for the array (a little over 5 million).

D does not assume you stopped caring about the memory being pointed to when it had to realloc. Therefore, it leaves it around in case you are still using it. You can also expect more overhead in the GC because it tends to hang on to memory in case it wants to use it again, or because it hasn't collected it yet. This is true of most GC-based languages.

For example, you can do this in D:

int[] arr = new int[50];
int *arrelem = &arr[5];

for(int i = 0; i < 10000; i++)
  arr ~= i;

*arrelem = 12345; // valid.

You can't do the same thing with C++ vectors, when they reallocate, the memory they used to own could be freed. This invalidates all pointers and iterators into the vector, but the language doesn't prevent you from having such dangling pointers. Using one of them can result in memory corruption, one of the worst bugs. D tries to eliminate as much memory corruption problems as possible.


I don't know if it was naive to assume that D's dynamic arrays would be
equivalent to vectors in C++.  But it's clear that the D array appender
~= is much slower than the C++ vector push_back().

But is much safer, and supports safe slicing.  Compromises.

Using an Appender class and put() (from std.array) is even slower,
despite the std.array docs recommending this over ~. :-(

This must be fixed, the appender should be blazingly fast at appending (almost as fast as C++), with the drawback that the overhead is higher.

It's disappointing because I'm loving so much of D's syntax, but I can
write far more efficient code (with less explicit memory management)
in C++ ...

You haven't done much with it yet. When you start discovering how much D takes care of, you will be amazed :)

array appending isn't the fastest operation, but it is safe, and safety can be worth infinitely more than speed sometimes.

The thing about D is it *can* be fast and unsafe, just as fast and unsafe as C, but that's not the default.

-Steve

Reply via email to