On Sunday, 27 March 2016 at 17:01:39 UTC, David Nadlinger wrote:
Compression in the usual sense won't help. Sure, it might reduce the object file size, but the full string will again have to be generated first, still requiring absurd amounts of time and space. The latter is definitely not negligible for symbol names of several hundred kilobytes; it shows up prominently in compiler profiles of affected Weka builds.

We love Voldemort types at Weka, and use them a lot in our non-GC-allocating ranges and algorithm libraries. We also liberally nest templates inside other templates. I don't think we could do many of the things we do if we had to define everything at module level. This flexibility is amazing for us and part of the reason we love D.

But, as David said, it comes at a great price for us.

I just processed our biggest executable, and came up with the following numbers:
Total symbols: 99649
Symbols longer than 1k: 9639
Symbols longer than 500k: 102
Symbols longer than 1M: 62

The longest symbols are about 5M bytes!

This affects our exe sizes in a terrible way, and also increases our compile and link times considerably. I will only be able to come up with statistics on how much time is wasted due to overly long symbols after we fix it, but obviously this is a major problem for us.

I think we should try the solution proposed by Anon, as it has a good possibility of saving quite a bit. It's important to make sure that when a template is given as a template parameter, the complete template is treated as the LName.
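To make the back-reference idea concrete, here is a minimal sketch in Python (an illustration only, not compiler code). It assumes the D mangling convention that an LName is an identifier prefixed by its length, and it emits a repeated name as a short marker pointing at the first occurrence. The `#<index>_` marker format and the function names are hypothetical; they are neither Anon's exact proposal nor anything from the D ABI. Each entry in the input list can stand for a complete template name, as suggested above.

```python
def lname(identifier: str) -> str:
    """Length-prefixed identifier, e.g. 'core' -> '4core'."""
    return f"{len(identifier)}{identifier}"


def mangle_plain(identifiers):
    """Concatenate every name in full, repeats included."""
    return "".join(lname(i) for i in identifiers)


def mangle_with_backrefs(identifiers):
    """Emit each distinct name once; later repeats become '#<index>_'.

    The marker format is a made-up placeholder for illustration.
    """
    seen = {}   # name -> index of its first occurrence
    parts = []
    for ident in identifiers:
        full = lname(ident)
        if ident in seen:
            ref = f"#{seen[ident]}_"
            # Only use the reference when it is actually shorter.
            parts.append(ref if len(ref) < len(full) else full)
        else:
            seen[ident] = len(seen)
            parts.append(full)
    return "".join(parts)


names = ["algo", "MapResult", "MapResult", "algo"]
print(mangle_plain(names))          # 4algo9MapResult9MapResult4algo
print(mangle_with_backrefs(names))  # 4algo9MapResult#1_#0_
```

The saving grows with the length of the repeated unit, which is why treating a whole template name as one LName matters: a reference costs a few bytes no matter how long the name it replaces.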

Thinking about Andrei's compression idea: I think we get such long names because huge symbols are passed, as Voldemort type names, to template parameters, and the new template's symbol then repeats them several times. Think of a 0.5M symbol passed a few times to a template; this is probably how we get to 5M symbols. This could end up being too complex, but suppose we assign "Huffman coding"-like short names to the complete template names at module scope (let's say, only if the template name is longer than 30 bytes). We could then replace a very long string with its Huffman-coded version, and coupled with the LName+Number idea above, that would shorten symbol names considerably.
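The growth described here can be checked with a rough back-of-the-envelope model, sketched below in Python. Only the 0.5M starting size and the roughly 5M result come from the post; the repeat count of 3, the 100 bytes of per-template overhead, the two nesting levels, and the 8-byte reference size are assumptions chosen purely for illustration.

```python
def repeated_length(base, repeats, overhead, depth):
    """Symbol length when each nesting level spells out the inner
    symbol `repeats` times in full."""
    length = base
    for _ in range(depth):
        length = overhead + repeats * length
    return length


def backref_length(base, repeats, overhead, depth, ref_len=8):
    """Same nesting, but only the first occurrence is spelled out;
    every further repeat costs a fixed-size reference."""
    length = base
    for _ in range(depth):
        length = overhead + length + (repeats - 1) * ref_len
    return length


# A 0.5M symbol, repeated 3 times per level, nested 2 levels deep.
print(repeated_length(500_000, 3, 100, 2))  # 4500400
print(backref_length(500_000, 3, 100, 2))   # 500232
```

Under these assumed numbers, full repetition multiplies the size at every level, while references add only a small constant per level.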

An initial implementation could start with just the LName# solution, and then we can see whether we also need to couple it recursively with Huffman coding of the resulting template names.
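As a rough illustration of the second stage, the Python sketch below builds a module-wide table that gives short codes to template names longer than 30 bytes, with the most frequently used names getting the lowest-numbered codes. This is only Huffman-like in spirit, not real Huffman coding. The `@<n>` code format, the function names, and the use of readable template names instead of mangled ones are all assumptions made for the example, and the sketch ignores how a demangler would tell codes apart from ordinary text.

```python
from collections import Counter

MIN_ALIAS_LENGTH = 30  # threshold suggested in the post


def build_alias_table(template_names):
    """Map each long template name to a short code, most frequent first."""
    counts = Counter(n for n in template_names if len(n) > MIN_ALIAS_LENGTH)
    # most_common() sorts by descending count; ties keep first-seen order.
    return {name: f"@{i}" for i, (name, _) in enumerate(counts.most_common())}


def shorten(symbol, aliases):
    """Replace every aliased template name inside `symbol` with its code."""
    # Longest names first, so an outer template is replaced as a whole
    # before any shorter name nested inside it.
    for name in sorted(aliases, key=len, reverse=True):
        symbol = symbol.replace(name, aliases[name])
    return symbol


inner = "MapResult!(FilterResult!(int[]))"
table = build_alias_table([inner, inner, "Take!(int[])"])
print(shorten("Zip!(" + inner + ", " + inner + ")", table))  # Zip!(@0, @0)
```

A recursive variant would run the same substitution on the names stored in the table itself, so that long names nested inside other long names are also shortened.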

Liran
