On 08.04.2012 21:24, H. S. Teoh wrote:

> Yeah, that's what I was thinking of. This would be a very big gain for
> the new AA implementation, for example. I wouldn't have to worry so much
> about template bloat if most of the instantiations are going to get
> merged anyway. :-)


Right, the advantage is that one doesn't have to use tricks. One can
simply assume the compiler is doing its job. That's what happened with
inlining some 10+ years ago.


> [...]
> Not really... string pooling can take advantage of overlapping
> (sub)strings, but I don't think you can do that with code. But I
> think your idea has a lot of merit. I'm for making linkers smarter
> than they currently are.


>> Sorry, it's just me running ahead of the train somewhat. Basically, once
>> this initial version is in place I have one cool refinement for it in
>> mind. For now we just need to keep the hash function transitive and
>> associative, for the gods' sake. A 128/64-bit checksum, please ;)
>> [...]

> And what would the cool refinement be? :-)


OK, you sort of forced my hand.
I hope you have been warned about spoilers before ;)


The refinement is merging prefixes and suffixes, of course.
And for that one needs to calculate hashes for all prefixes and all suffixes. I will define _all_ later on.

The first observation is that if you have calculated partial checksums for all prefixes, you already have the sums for all suffixes, and vice versa.
Namely:
// prefix ends at i, suffix starts at i
sum_prefix[i] = total - sum_suffix[i];

that is derived from the fact that:
total = sum_prefix[i] + sum_suffix[i];
(taking into account that + and - are exact inverses in the checksum's modular arithmetic)
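
To see it concretely, here is a toy additive checksum in D; it only exists to show the invertibility property, the real thing would be a proper 64/128-bit hash as mentioned above:

// Toy additive checksum: ulong arithmetic wraps mod 2^64,
// so + and - are exact inverses and the identity above holds.
ulong checksum(const(ubyte)[] data)
{
    ulong sum = 0;
    foreach (b; data)
        sum += b;
    return sum;
}

void main()
{
    auto code = cast(const(ubyte)[]) "pretend this is machine code";
    const total = checksum(code);
    foreach (i; 0 .. code.length + 1)
    {
        // prefix ends at i, suffix starts at i
        assert(checksum(code[0 .. i]) == total - checksum(code[i .. $]));
    }
}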

Now take the original idea, substitute the single checksum with an array of partial sums and lengths (btw, lengths follow the same prefix<->suffix rule), and you know what this refinement is all about.
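
In other words, each function would carry its total checksum plus an array of (length, partial sum) records, and matching a suffix of one function against a prefix of another becomes a cheap length & checksum comparison, with a memcmp only to confirm. A rough sketch, all names hypothetical:

import core.stdc.string : memcmp;

struct PartialSum
{
    size_t len;  // prefix length; the matching suffix length is code.length - len
    ulong  sum;  // prefix checksum; the suffix checksum is total - sum
}

struct FuncInfo
{
    const(ubyte)[] code;   // the function's machine code
    ulong          total;  // checksum of the whole body
    PartialSum[]   cuts;   // partial sums at the chosen cut points
}

// Could some suffix of `a` be overlapped with a prefix of `b`?
// Lengths and checksums act as a cheap filter; memcmp confirms.
size_t overlap(ref const FuncInfo a, ref const FuncInfo b)
{
    foreach (ca; a.cuts)
    {
        const suffixLen = a.code.length - ca.len;
        const suffixSum = a.total - ca.sum;   // derived, never stored
        foreach (cb; b.cuts)
        {
            if (cb.len == suffixLen && cb.sum == suffixSum
                && memcmp(a.code.ptr + ca.len, b.code.ptr, suffixLen) == 0)
                return suffixLen;   // bytes the two functions could share
        }
    }
    return 0;
}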

In fact, all of this is the easy part; the trick is fitting it into the time frame of compilation.

To that end, one should do full duplicate elimination first and only then mess with merging.

Then we see that in general we are about to do O((M1+...+Mn)^2) work, where n is the number of functions and Mi is the number of prefixes (= suffixes) we defined for the i-th one. The constant C in front of (M1+...+Mn)^2 is not that big - it's comparing checksums & lengths, and only *sometimes* a memcmp - but keep in mind that the total number of functions is big. For instance, 100,000 functions with 8 prefixes each already means 800,000 entries, i.e. on the order of 6.4*10^11 pairwise checks.

So we have to get around this monstrous workload. At the moment the only way I see of doing this is to use coarse-grained prefixes and to introduce some threshold on the maximum number of prefixes we account for. That is what defines the above-mentioned _all_ for prefixes.

Coarse-grained means we only store partial checksums on a fixed block boundary (say, every 16 or 32 bytes), and only if that boundary aligns properly with instructions; if not, we just skip this prefix and move on.
(This also, hopefully, limits the memory usage of the partial-sums array.)
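
A sketch of that selection step; the block size, the threshold, and the instructionStarts input are all illustrative choices of mine, not anything from an actual compiler:

enum blockSize   = 16;  // fixed block boundary, as suggested above
enum maxPrefixes = 8;   // threshold on prefixes per function

// Pick the cut points for one function: only block-aligned offsets
// that also land on an instruction boundary, up to the threshold.
// `instructionStarts` would come from the code generator.
size_t[] pickCutPoints(size_t codeLength, const(size_t)[] instructionStarts)
{
    import std.algorithm.searching : canFind;

    size_t[] cuts;
    for (size_t off = blockSize; off < codeLength; off += blockSize)
    {
        if (!instructionStarts.canFind(off))
            continue;            // boundary falls mid-instruction: skip it
        cuts ~= off;
        if (cuts.length == maxPrefixes)
            break;               // threshold reached; cap work and memory
    }
    return cuts;
}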

Another threshold: don't mess with partial sums at all for really BIG functions. (This might not be needed.)

I think there is some room for other heuristics to try out, but they are mostly along the same lines.

P.S. Damn, I could have done a nice paper on that... too late :)

--
Dmitry Olshansky
