On Friday, 30 December 2011 at 06:53:06 UTC, Walter Bright wrote:
> I think this criticism is off target, because the C example was almost entirely macros - and macros that were used in the service of evading C language limitations. The point wasn't to use clever D features, the challenge was to demonstrate you can get the same results in D as in C.
> ...
> I also think this is off target, because a C compiler really doesn't guarantee **** about efficiency, it only guarantees that it will work "as if" it was executed on some idealized abstract machine. Even dividing code up into functions is completely arbitrary, and open to wildly different strategies that are perfectly legal to any C compiler. A C compiler doesn't have to enregister anything in variables, either, and that has far more of a performance impact than inlining.
Even though the core languages (C and D) are not specific to any one platform, writing fast code has never been about targeting an abstract, idealized virtual machine. Some assumptions have to be made. Most of the assumptions the C memcpy code makes can be expected to hold across the major C compilers (e.g. that a macro is at least as fast as an equivalent regular function). Your D port, however, makes some rather fragile assumptions about the compiler implementation, such as that the small functions standing in for the C macros will actually be inlined.
Let's set the language difference aside and consider two memcpy versions: one using macros, the other using plain functions (not even marked "inline"). Would you say that the second is generally as fast as the first? I'm being intentionally vague here: claiming that their performance is "about the same" rests on MUCH more fragile assumptions.
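To make the comparison concrete, here is a minimal C sketch of the two styles, assuming a word-at-a-time copy loop. The names COPY_WORD_MACRO, copy_word_fn, copy_words_macro and copy_words_fn are hypothetical and not taken from the memcpy code under discussion. Both loops produce the same result, but only the macro version is guaranteed to contain no call; in the function version, removing the call is left entirely to the compiler's inliner.

```c
#include <stddef.h>

/* Hypothetical copy step written as a macro: the preprocessor pastes
   the body into every call site, so no function call ever exists. */
#define COPY_WORD_MACRO(dst, src) (*(dst) = *(src))

/* The same step as a plain function (not even marked inline).
   Whether the call disappears depends on the compiler's inliner. */
static void copy_word_fn(unsigned long *dst, const unsigned long *src)
{
    *dst = *src;
}

/* Copies n words using the macro version of the step. */
void copy_words_macro(unsigned long *dst, const unsigned long *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        COPY_WORD_MACRO(dst + i, src + i);
}

/* Copies n words using the function version of the step. */
void copy_words_fn(unsigned long *dst, const unsigned long *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        copy_word_fn(dst + i, src + i);
}
```

With optimization enabled, GCC and Clang will typically inline copy_word_fn, but nothing in the C standard requires it, which is exactly the kind of fragile assumption at issue.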
The fact that major compiler vendors implement language extensions to facilitate writing optimized code shows that there is demand for it. Even compilers that are great at optimization (GCC, LLVM) provide such extensions and intrinsics, e.g. GCC's __attribute__((always_inline)).
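As an illustration of such an extension, the sketch below uses GCC's always_inline attribute (also accepted by Clang) behind a hypothetical FORCE_INLINE macro, falling back to plain static inline on other compilers. The helper names copy_byte and copy_bytes are made up for the example; the point is only that the programmer, not the inliner's heuristics, decides whether the call is eliminated.

```c
#include <stddef.h>

/* Hypothetical helper macro. On GCC and Clang, always_inline asks the
   compiler to inline the function regardless of its own heuristics,
   even with optimization disabled. Other compilers need a different
   spelling (e.g. MSVC's __forceinline); here they just get a hint. */
#if defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE static inline __attribute__((always_inline))
#else
#  define FORCE_INLINE static inline
#endif

/* Single-byte copy step that should never remain an actual call. */
FORCE_INLINE void copy_byte(unsigned char *dst, const unsigned char *src)
{
    *dst = *src;
}

/* Byte-wise copy whose inner step is forced inline where supported. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        copy_byte(d + i, s + i);
}
```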
I'm not necessarily advocating changing the core language (e.g. new @attributes, things that would need to go into TDPLv2). What I think would greatly improve the situation, however, is for DigitalMars to provide recommendations for implementation-specific extensions that give more control over how code is compiled (pragma names, keywords starting with __, etc.). Once those are defined, pull requests adding them to DMD will follow.
> Functions below a certain size should be inlined if possible. Those above that size do not benefit perceptibly from inlining. Where that certain size exactly is, who knows, but I doubt that functions near that size will benefit much from user intervention.
I agree, but this wasn't so much about heuristics as about compiler capabilities (e.g. inlining functions that contain inline assembler).