cglee:

> Now, I have questions about how much 'D' is fast comparing with 'java'
> and how much bigger in binary size generated by 'D' comparing with
> 'C'.  Is there anyone has information about this?.

Performance comes from many different kinds of optimizations. In modern 
languages these include things such as:
- Partial unrolling of loops, even when the number of iterations is not known 
at compile-time
- Inlining, even in the presence of virtual calls
- If the code performs a lot of heap activity (as Java-style code does), then a 
significant percentage of the performance comes from the efficiency of the GC.

But today the GPU is used more and more by serious number crunchers. A 
numerical program written in Python that implements its numerical kernels on 
good GPUs using PyCUDA or PyOpenCL may be tens of times more efficient than 
average C code written for the CPU (and for numerical kernels on the CPU there 
is CorePy). Intel says that the GPU is on average only 2.5 times faster, but in 
practice you need to be a wizard to squeeze that amount of performance out of 
very costly CPUs.

If you compile D1 code with LDC you get good performance, comparable to C or 
sometimes better. If you use DMD you get lower performance (usually no more 
than 2-3 times slower) on both floating-point and integer code.

The D GC is not efficient enough yet, so efficient D1/D2 programs need to 
minimize heap activity, use stack-allocated structs as much as possible, and 
avoid Java-style code.
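As a minimal sketch of the difference (the Point types here are just made-up 
examples): a struct instance is a value that can live on the stack with no GC 
work per instance, while a class instance is always allocated on the GC heap.

```d
import std.stdio;

// Value type: instances can live on the stack, no GC allocation needed.
struct PointS { double x, y; }

// Reference type: every instance is allocated on the GC heap.
class PointC {
    double x, y;
    this(double x, double y) { this.x = x; this.y = y; }
}

void main() {
    PointS s = PointS(1.0, 2.0);    // stack allocation, cheap
    auto c = new PointC(3.0, 4.0);  // GC heap allocation
    writeln(s.x + c.y);
}
```

In a tight loop that creates millions of points, the struct version does no 
heap work at all, which is why D code in this style stays close to C speed 
even with the current GC.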

Currently D programs use less memory than Java ones, thanks in part to structs, 
but as D gets a better GC it will probably use more RAM too. With GCs there is 
a tradeoff between speed and memory usage.

D binary size is larger than C but comparable to C++ code that uses many 
templates. There is a constant overhead from the statically linked runtime, 
which is bigger than the C++ one (GC, array concatenation, etc.). The more 
templates you use, the bigger the binary becomes. Phobos uses templates 
heavily; Tango for D1 uses them less often.
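The template cost comes from instantiation: each distinct set of template 
arguments emits its own copy of the code into the binary. A tiny illustration 
(maxOf is a made-up example function, not from Phobos or Tango):

```d
import std.stdio;

// One template, but each distinct T used below emits a separate
// compiled instantiation into the binary.
T maxOf(T)(T a, T b) { return a > b ? a : b; }

void main() {
    writeln(maxOf(1, 2));        // instantiates maxOf!int
    writeln(maxOf(1.5, 2.5));    // instantiates maxOf!double
    writeln(maxOf("abc", "xy")); // instantiates maxOf!string
}
```

Three instantiations for three lines of use; in a template-heavy library like 
Phobos this multiplies across many types and many templates, which is where 
the size growth comes from.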

Bye,
bearophile
