https://issues.dlang.org/show_bug.cgi?id=14571
--- Comment #15 from Vladimir Panteleev <thecybersha...@gmail.com> ---
(In reply to Walter Bright from comment #14)
> Take a look at the code generated for global data. In 64 bit code, it's all
> relative to the program counter. In 32 bit code, it's indirect because of
> shared library support (PIC).

Not on Win32, though.

> That died a couple decades ago with the advent of DLLs.

DLLs are relocated at load time (and usually are linked with a base unlikely to
conflict, so relocations are often not done). The hypothetical ptr[5] would be
relocated as well.

> x86-64 bit code
> doesn't even have a direct addressing mode. Even the presumably direct
> addressing modes in x32 are indirect because of the segment registers, and
> despite that, the CPU does such a good job of address pipelining you'll
> never see the effect of using a register offset.

I would need to run some benchmarks to test this. But a quick test shows that
64-bit code has dedicated CPU instructions for relative addressing of globals,
but indexing arrays on the heap still requires two instructions (mov rax, arr +
mov dword ptr [arr+idx*4], value).

> I know, that's why I mentioned it. BSS is special.

I understood your post as that executable bloat still applies even though it
goes into BSS.
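
For anyone who wants to reproduce the comparison, here is a minimal C sketch of the two access patterns discussed above. It is not the code used for the quick test mentioned in the comment; the names global_arr, heap_arr and the helper functions are hypothetical, and the instruction sequences described in the comments are what x86-64 compilers typically emit, not a guarantee.

```c
#include <assert.h>
#include <stdlib.h>

/* Array with static storage: its address is known at link time, so on
   x86-64 a store to a constant index is typically a single RIP-relative
   instruction. Zero-initialized, so it normally lives in BSS. */
static int global_arr[16];

/* Heap array: only the pointer has static storage, so the pointer
   usually has to be loaded into a register before the element store. */
static int *heap_arr;

/* Allocate the heap array; returns nonzero on success. */
int init_heap(void)
{
    heap_arr = calloc(16, sizeof *heap_arr);
    return heap_arr != NULL;
}

/* Typically one instruction: mov dword ptr [rip + global_arr + 20], value */
void store_global(int value)
{
    global_arr[5] = value;
}

/* Typically two instructions: load heap_arr, then store through it. */
void store_heap(int value)
{
    assert(heap_arr != NULL);
    heap_arr[5] = value;
}

int load_global(void) { return global_arr[5]; }
int load_heap(void)   { return heap_arr[5]; }
```

Compiling this with optimizations and inspecting the assembly (for example with the compiler's -S option) should show the difference between store_global and store_heap; whether that difference is measurable at run time is exactly what the benchmarks would need to establish.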