On Saturday, 11 May 2019 at 00:32:54 UTC, H. S. Teoh wrote:

When it comes to performance, I've essentially given up looking at DMD output. DMD's inliner gives up far too easily, leading to a lot of calls that aren't inlined when they really should be, and DMD's optimizer does not have loop unrolling, which excludes a LOT of subsequent optimizations that could have been applied. I wouldn't base any performance decisions on DMD output. If LDC or GDC produces non-optimal code, then we have cause to do something. Otherwise, IMO we're just uglifying D code and making it unmaintainable for no good reason.

I think this thread is beginning to lose sight of the larger picture. What I'm trying to achieve is the opt-in continuum that Andrei mentioned elsewhere on this forum. We can't do that with the way the compiler and runtime currently interact. So, the first task, which I'm trying to get around to, is to convert runtime hooks to templates. Using the compile-time type information will allow us to avoid `TypeInfo`, therefore classes, therefore the entire D runtime. We're now much closer to the opt-in continuum Andrei mentioned previously on this forum. Now let's assume that's done...
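
To make the difference concrete, here is a minimal sketch contrasting a type-erased helper with a templated one. Both `equalsErased` and `equalsTyped` are hypothetical names invented for illustration; they are not actual druntime hooks, and a real hook would have to handle more cases than this.

```d
// Hypothetical names throughout; these are not actual druntime hooks.

// Type-erased style: the callee only sees raw bytes and a size that is
// supplied at run time, so it cannot use anything it knows about the type.
bool equalsErased(const(void)* a, const(void)* b, size_t size)
    @trusted nothrow @nogc pure
{
    auto x = cast(const(ubyte)*) a;
    auto y = cast(const(ubyte)*) b;
    foreach (i; 0 .. size)
        if (x[i] != y[i])
            return false;
    return true;
}

// Templated style: the element type T is known at compile time, so no
// TypeInfo is needed and attributes are inferred from T's own operations.
bool equalsTyped(T)(const(T)[] a, const(T)[] b)
{
    if (a.length != b.length)
        return false;
    foreach (i; 0 .. a.length)
        if (a[i] != b[i])
            return false;
    return true;
}
```

The templated version costs one instantiation per element type, which is the template bloat trade-off mentioned in this thread.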

Those new templates will eventually call a very few functions from the C standard library, memcpy being one of them. Because the runtime hooks are now templates, we have type information that we can use in the call to memcpy. Therefore, I want to explore implementing `void memcpy(T)(ref T dst, const ref T src) @safe nothrow pure @nogc` rather than `void* memcpy(void*, const void*, size_t)`. There are some issues here such as template bloat and compile times, but I want to explore it anyway. I'm trying to imagine what memcpy in D would look like if we didn't have a C implementation narrowing our imagination. I don't know how that will turn out, but I want to explore it.

For LDC we can just do something like this...

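void memcpy(T)(ref T dst, const ref T src) @safe nothrow @nogc pure
{
version(LDC)
{
    // View dst and src as byte arrays. The pointer casts are not @safe,
    // so the copy is wrapped in a @trusted lambda that is called at once.
    () @trusted {
        auto dstArray = (cast(ubyte*) &dst)[0 .. T.sizeof];
        auto srcArray = (cast(const(ubyte)*) &src)[0 .. T.sizeof];
        for (size_t i = 0; i < T.sizeof; i++)
            dstArray[i] = srcArray[i];
    }();
}
}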

LDC is able to see that as memcpy and do the right thing. Also if the LDC developers want to do their own thing altogether, more power to them. I don't see anything ugly about it.

However, DMD won't do the right thing. I guess others are thinking that we'd just re-implement `void* memcpy(void*, const void*, size_t)` in D and throw in a runtime call to `memcpy(&dstArray[0], &srcArray[0], T.sizeof)`. That's ridiculous. What I want to do is use the type information to generate an optimal implementation (considering size and alignment) that DMD will be forced to inline with `pragma(inline, true)`. That implementation can also take into consideration target features such as SIMD. I don't believe the code will be complex, and I expect it to perform at least as well as the C implementation. My initial tests show that it will actually outperform the C implementation, but that could be a problem with my tests. I'm still researching it.
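
As a rough illustration of using size and alignment at compile time, the sketch below picks the widest copy unit that `T` permits and then copies a fixed, compile-time-known number of units. `typedCopy` is a hypothetical name, the choice of `ulong`/`uint`/`ubyte` units is an assumption made for the example, and it leaves out both the inlining pragma and any SIMD path; it is not a tuned implementation.

```d
// Hypothetical sketch: `typedCopy` is an illustrative name, not a druntime symbol.
void typedCopy(T)(ref T dst, const ref T src) @trusted nothrow @nogc pure
{
    // Choose the widest unit W that both divides T's size and does not
    // exceed T's alignment, so every access through W* is aligned.
    static if (T.sizeof % ulong.sizeof == 0 && T.alignof >= ulong.alignof)
        alias W = ulong;
    else static if (T.sizeof % uint.sizeof == 0 && T.alignof >= uint.alignof)
        alias W = uint;
    else
        alias W = ubyte;

    // The trip count is a compile-time constant derived from the type.
    enum size_t n = T.sizeof / W.sizeof;

    auto d = cast(W*) &dst;
    auto s = cast(const(W)*) &src;
    foreach (i; 0 .. n)
        d[i] = s[i];
}
```

Because `n` and `W` are fixed per instantiation, an optimizer has more to work with than it would with a size that only arrives at run time; whether that actually beats the C routine is exactly what would need measuring.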

Now assuming that's done, we now have language runtime implementations that are isolated from heavier runtime features (like the `TypeInfo` classes) that can easily be used in -betterC builds, bare-metal systems programming, etc. simply by importing them as a header-only library; it doesn't require first compiling (or cross-compiling) a runtime for linking with your program; you just import and go. We're now much closer to the opt-in continuum.

Now what about development of druntime itself? Well, wouldn't it be nice if we could utilize things like `std.traits`, `std.meta`, `std.conv`, and a bunch of other stuff from Phobos? Wouldn't it also be nice if we could use that stuff in DMD itself without importing Phobos? So let's take the stuff in Phobos that doesn't need druntime and put it in a library that doesn't require druntime (i.e. utiliD). Now druntime can import utiliD and have more idiomatic-D implementations.

But the benefits don't stop there. Bare-metal developers, microcontroller developers, kernel driver developers, OS developers, etc. can all use the runtime-less library to bootstrap their own implementations without having to re-invent or copy code out of Phobos and druntime.

I'm probably not articulating this vision well. I'm sorry. Maybe we'll just have to hope I can find the time and energy to do it myself and then others will finally see from the results. Or maybe I'll go have a nice helping of crow.

Mike
