On 3/30/16 3:19 PM, Liran Zvibel wrote:
On Sunday, 27 March 2016 at 17:01:39 UTC, David Nadlinger wrote:
Compression in the usual sense won't help. Sure, it might reduce the
object file size, but the full string will again have to be generated
first, still requiring absurd amounts of time and space. The latter is
definitely not negligible for symbol names of several hundreds of
kilobytes; it shows up prominently in compiler profiles of affected
Weka builds.

We love Voldemort types at Weka, and use them a lot in our
non-gc-allocating ranges and algorithm libraries. Also, we liberally
nest templates inside of other templates.
I don't think we can do many of the things we do if we had to define
everything at module level. This flexibility is amazing for us and part
of the reason we love D.

Voldemort types are what cause the bloat; templates inside templates aren't as much of a problem. A Voldemort type's symbol name has to include the template parameter and function parameter types of the function it resides in at least twice, and I think actually 3 times (given the return type). If the template is just a template, they're included only once. This is why moving the type outside the function is an effective mitigation: it's linear growth vs. exponential.
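
A minimal sketch of the effect, assuming nothing beyond the description above: `wrap` and `Wrapper` are made-up names, not code from iopipe or Weka. It prints the length of the mangled type name at each nesting depth. The exact numbers depend on the compiler version, so none are shown here.

```d
import std.stdio : writefln;

// Hypothetical example: Wrapper is a Voldemort type, declared inside wrap,
// so its mangled name has to describe wrap!T and wrap's parameter types.
auto wrap(T)(T inner)
{
    static struct Wrapper
    {
        T inner;
    }
    return Wrapper(inner);
}

void main()
{
    auto w1 = wrap(0);
    auto w2 = wrap(w1);
    auto w3 = wrap(w2);
    auto w4 = wrap(w3);

    // Print the mangled type name length at each nesting depth.
    writefln("depth 1: %s", typeof(w1).mangleof.length);
    writefln("depth 2: %s", typeof(w2).mangleof.length);
    writefln("depth 3: %s", typeof(w3).mangleof.length);
    writefln("depth 4: %s", typeof(w4).mangleof.length);
}
```

Each level feeds the previous level's full type name back in as `T`, which is where the repeated inclusion compounds.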

I too like Voldemort types, but I actually found moving the types outside the functions quite straightforward. It's just annoying to have to repeat the template parameters. If you make the types private, you can simply leave off all the constraints. It's a bad leak of implementation, since now anything in the file has direct access to the type, but it's better than the issues with Voldemort types.
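
The refactoring described above might look roughly like the following. This is a sketch under assumptions, not the actual iopipe change: `Doubled` and `doubled` are hypothetical names, and "constraints" is read here as template constraints.

```d
import std.algorithm.comparison : equal;
import std.range.primitives : empty, front, isInputRange, popFront;

// Hypothetical module-level replacement for a Voldemort type. It is private
// and carries no constraint, since only doubled() below instantiates it.
private struct Doubled(R)
{
    R source;

    @property bool empty() { return source.empty; }
    @property auto front() { return source.front * 2; }
    void popFront() { source.popFront(); }
}

// The public entry point keeps the constraint; the template parameter has to
// be repeated when naming the result type.
auto doubled(R)(R source)
if (isInputRange!R)
{
    return Doubled!R(source);
}

void main()
{
    assert(doubled([1, 2, 3]).equal([2, 4, 6]));
    // Nesting still works with the module-level type.
    assert(doubled(doubled([1, 2, 3])).equal([4, 8, 12]));
}
```

The cost is the one mentioned above: any other code in the same module can now name `Doubled` directly.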

See the update to my iopipe library here: https://github.com/schveiguy/iopipe/commit/1b0696dc82fce500c6b314ec3d8e5e11e0c1bcd7

This one commit cut the binary size of my example program 'convert' (https://github.com/schveiguy/iopipe/blob/master/examples/convert/convert.d) by over 90% (it went from 10MB to <1MB).

This also calmed down some REALLY horrible stack traces when I was debugging. As in, I could actually understand which function it was talking about, and it didn't take 10 seconds to print a stack trace.


But, as David said -- it comes with a great price for us.

I just processed our biggest executable, and came up with the following
numbers:
Total symbols: 99649
Symbols longer than 1k: 9639
Symbols longer than 500k: 102
Symbols longer than 1M: 62
The longest symbols are about 5M bytes!

This affects our exe sizes in a terrible way, and also increases our
compile and link times considerably. I will only be able to come up with
statistics of how much time was wasted due to too-long-symbols after we
fix it, but obviously this is a major problem for us.

From my testing, it doesn't take much to get to the point where the linker is unusable. A simple struct nested in 15 calls to a function makes the linker take an unreasonable amount of time (over 1.5 minutes; I didn't wait to see how long). See my bug report for details.

Another factor in the name length is the module name, which is included in every type and function. So you have a factor like 3^15 for the name, and then you multiply that by the module names as well.
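
To put rough numbers on that, here is a back-of-the-envelope model. It is an assumption for illustration, not how the compiler actually computes mangled names: each nesting level repeats the previous name `repeats` times and adds `overhead` characters for the module-qualified function name. `modelLength` and the 30-character overhead are hypothetical.

```d
import std.stdio : writeln;

// Rough, hypothetical model of symbol length under nesting: every level
// repeats the previous name `repeats` times and adds a fixed `overhead`
// (for example the module and function name).
ulong modelLength(uint depth, ulong baseLen, ulong overhead, uint repeats = 3)
{
    ulong len = baseLen;
    foreach (_; 0 .. depth)
        len = repeats * len + overhead;
    return len;
}

void main()
{
    // With no per-level overhead, a 1-character base grows by exactly 3^15.
    assert(modelLength(15, 1, 0) == 3UL ^^ 15);
    writeln(modelLength(15, 1, 0));

    // Same depth with an assumed 30-character qualified name per level.
    writeln(modelLength(15, 1, 30));
}
```

Because the overhead is added inside the loop, it gets tripled along with everything else, which is why a long module name makes the blow-up noticeably worse.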

I think we should try the solution proposed by Anon, as it has a good
possibility of saving quite a bit.
It's important to make sure that when a template is given as a template
parameter, the complete template is treated as the LName.

I hope this is given serious thought; it looks like someone has already started an implementation.

Anon, it appears that your mechanism has been well received by a few knowledgeable people here. I encourage you to solidify your proposal in a DIP (D Improvement Proposal) here: http://wiki.dlang.org/DIPs.

-Steve
