tl;dr: allocation is a critical speed issue with dmd. Using the bump-pointer method is very fast, and it matters.
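For readers unfamiliar with the term, here is a minimal sketch of a bump-pointer allocator in C++. It is not DMD's actual allocator: `BumpArena` and its members are hypothetical names, and it assumes a single fixed-size chunk with no per-object free.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical arena: one fixed chunk, released all at once in the destructor.
// A real compiler would chain a new chunk when the current one runs out.
class BumpArena {
public:
    explicit BumpArena(std::size_t capacity)
        : base_(static_cast<std::uint8_t*>(std::malloc(capacity))),
          offset_(0), capacity_(capacity) {
        if (!base_) throw std::bad_alloc();
    }
    ~BumpArena() { std::free(base_); }
    BumpArena(const BumpArena&) = delete;
    BumpArena& operator=(const BumpArena&) = delete;

    // Allocation is "round up, compare, add". `align` must be a power of two
    // no larger than alignof(std::max_align_t), since malloc only guarantees
    // that much alignment for the chunk itself.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        std::size_t start = (offset_ + align - 1) & ~(align - 1);
        if (start > capacity_ || size > capacity_ - start) return nullptr;
        offset_ = start + size;
        return base_ + start;
    }

    std::size_t used() const { return offset_; }

private:
    std::uint8_t* base_;
    std::size_t offset_;
    std::size_t capacity_;
};
```

The speed comes from the fast path being a few arithmetic instructions with no free-list search; the cost is that nothing is returned until the whole arena goes away.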
What about packing DMD structure members such as integers and enums more efficiently?
We could start by making enums __attribute__((packed)). Is there any free static or dynamic tool to check for unexercised bits?
How does Clang manage to save so much space compared to GCC? Do they pack more carefully, or do they deallocate?
A much higher-hanging fruit would be to switch from pointers to 32-bit handles on 64-bit CPUs for referencing tokens, sub-expressions, etc. But I guess making that type-safe is a big undertaking, and it may cost some performance.
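To make the packing idea concrete, here is a rough C++ sketch. The struct and enum names are made up for illustration and are not DMD's types. With GCC, __attribute__((packed)) on an enum selects the smallest integral type that fits; a fixed underlying type (C++11) is the portable way to get the same effect.

```cpp
#include <cstdint>

// Hypothetical token kinds; without a fixed underlying type a scoped enum is int-sized.
enum class TokWide { Identifier, Number, String };

// Same enumerators stored in a single byte.
enum class TokNarrow : std::uint8_t { Identifier, Number, String };

// Fields in a careless order: padding is inserted before `next` and after `isConst`.
struct NodeLoose {
    TokWide kind;
    void* next;
    bool isConst;
    std::uint32_t line;
};

// Largest members first, small ones grouped at the end, narrow enum.
struct NodeTight {
    void* next;
    std::uint32_t line;
    TokNarrow kind;
    bool isConst;
};

static_assert(sizeof(TokWide) == sizeof(int), "scoped enums default to int");
static_assert(sizeof(TokNarrow) == 1, "narrow enum fits in one byte");
static_assert(sizeof(NodeTight) <= sizeof(NodeLoose), "reordering never grows the node");
```

On a typical 64-bit ABI the loose layout comes to 24 bytes and the tight one to 16, but the exact numbers depend on the target.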
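A sketch of what type-safe 32-bit handles could look like in C++, under the assumption that all nodes of one kind live in a single growable pool. `Handle`, `Pool`, `Token` and `Expr` are hypothetical names, not anything from the DMD source, and a sentinel for "no child" is left out for brevity.

```cpp
#include <cstdint>
#include <vector>

// A 32-bit index tagged with the type it refers to, so a Handle<Token>
// cannot be passed where a Handle<Expr> is expected.
template <typename T>
struct Handle {
    std::uint32_t index;
};

// Hypothetical pool: owns every T and hands out indices instead of pointers.
template <typename T>
class Pool {
public:
    Handle<T> add(const T& value) {
        items_.push_back(value);
        return Handle<T>{static_cast<std::uint32_t>(items_.size() - 1)};
    }
    // Each access costs a base-plus-index computation: the performance hit.
    T& get(Handle<T> h) { return items_[h.index]; }

private:
    std::vector<T> items_;
};

struct Token { int kind; };

// Three references in 12 bytes instead of 24 with 64-bit pointers.
struct Expr {
    Handle<Token> tok;
    Handle<Expr> lhs;
    Handle<Expr> rhs;
};
```

The type tag gives compile-time safety for free, but every dereference now needs access to the right pool, which is where most of the refactoring effort would go.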