On 08.04.2012 21:24, H. S. Teoh wrote:

> Yeah, that's what I was thinking of. This would be a very big gain for
> the new AA implementation, for example. I wouldn't have to worry so much
> about template bloat if most of the instantiations are going to get
> merged anyway. :-)


Right, the advantage is that one doesn't have to use tricks. One can
simply assume the compiler is doing its job. That's what happened with
inlining some 10+ years ago.


> [...]
> Not really... string pooling can take advantage of overlapping
> (sub)strings, but I don't think you can do that with code. But I
> think your idea has a lot of merit. I'm for making linkers smarter
> than they currently are.


>> Sorry, it's just me running ahead of the train somewhat. Basically, once
>> this initial version is in place I have one cool refinement for it in
>> mind. For now we just need to keep the hash function transitive and
>> associative, for the gods' sake. A 128/64-bit checksum, please ;)
>> [...]

> And what would the cool refinement be? :-)


OK, you sort of forced my hand.
I hope you have been warned about spoilers before ;)


The refinement is merging prefixes and suffixes, of course.
And for that one needs to calculate hashes for all prefixes and all suffixes. I will define _all_ later on.

The first observation is that if you have calculated partial checksums for all prefixes, you already have the sums for all suffixes, and vice versa.
Namely:
// prefix ends at i, suffix starts at i
sum_prefix[i] = total - sum_suffix[i];

that is derived from the fact that:
total = sum_prefix[i] + sum_suffix[i];
(taking into account that + and - are exact inverses in the checksum's modular arithmetic)
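
To see it concretely, here is a toy additive checksum in D; it only exists to show the invertibility property, the real thing would be a proper 64/128-bit hash as mentioned above:

// Toy additive checksum: ulong arithmetic wraps mod 2^64,
// so + and - are exact inverses and the identity above holds.
ulong checksum(const(ubyte)[] data)
{
    ulong sum = 0;
    foreach (b; data)
        sum += b;
    return sum;
}

void main()
{
    auto code = cast(const(ubyte)[]) "pretend this is machine code";
    const total = checksum(code);
    foreach (i; 0 .. code.length + 1)
    {
        // prefix ends at i, suffix starts at i
        assert(checksum(code[0 .. i]) == total - checksum(code[i .. $]));
    }
}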

Now take the original idea, substitute the single checksum with an array of partial sums and lengths (btw, lengths follow the same prefix<->suffix rule), and you know what this refinement is all about.
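
In other words, each function would carry its total checksum plus an array of (length, partial sum) records, and matching a suffix of one function against a prefix of another becomes a cheap length & checksum comparison, with a memcmp only to confirm. A rough sketch, all names hypothetical:

import core.stdc.string : memcmp;

struct PartialSum
{
    size_t len;  // prefix length; the matching suffix length is code.length - len
    ulong  sum;  // prefix checksum; the suffix checksum is total - sum
}

struct FuncInfo
{
    const(ubyte)[] code;   // the function's machine code
    ulong          total;  // checksum of the whole body
    PartialSum[]   cuts;   // partial sums at the chosen cut points
}

// Could some suffix of `a` be overlapped with a prefix of `b`?
// Lengths and checksums act as a cheap filter; memcmp confirms.
size_t overlap(ref const FuncInfo a, ref const FuncInfo b)
{
    foreach (ca; a.cuts)
    {
        const suffixLen = a.code.length - ca.len;
        const suffixSum = a.total - ca.sum;   // derived, never stored
        foreach (cb; b.cuts)
        {
            if (cb.len == suffixLen && cb.sum == suffixSum
                && memcmp(a.code.ptr + ca.len, b.code.ptr, suffixLen) == 0)
                return suffixLen;   // bytes the two functions could share
        }
    }
    return 0;
}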

In fact, all of this is the easy part; the trick is fitting it into the time frame of compilation.

To that end, one should do full duplicate elimination first and only then mess with merging.

Then we see that in general we are about to do O((M1+...+Mn)^2) work, where n is the number of functions and Mi is the number of prefixes (= suffixes) we defined for the i-th one. The constant C in front of (M1+...+Mn)^2 is not that big - it's comparing checksums & lengths, and only *sometimes* a memcmp - but keep in mind that the total number of functions is big. For instance, 100,000 functions with 8 prefixes each already means 800,000 entries, i.e. on the order of 6.4*10^11 pairwise checks.

So we have to get around this monstrous workload. At the moment the only way I see of doing this is to use coarse-grained prefixes and to introduce some threshold on the maximum number of prefixes we account for. That is what defines the above-mentioned _all_ for prefixes.

Coarse-grained means we only store partial checksums on a fixed block boundary (say, every 16 or 32 bytes), and only if that boundary aligns properly with instructions; if not, we just skip this prefix and move on.
(This also, hopefully, limits the memory usage of the partial-sums array.)
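
A sketch of that selection step; the block size, the threshold, and the instructionStarts input are all illustrative choices of mine, not anything from an actual compiler:

enum blockSize   = 16;  // fixed block boundary, as suggested above
enum maxPrefixes = 8;   // threshold on prefixes per function

// Pick the cut points for one function: only block-aligned offsets
// that also land on an instruction boundary, up to the threshold.
// `instructionStarts` would come from the code generator.
size_t[] pickCutPoints(size_t codeLength, const(size_t)[] instructionStarts)
{
    import std.algorithm.searching : canFind;

    size_t[] cuts;
    for (size_t off = blockSize; off < codeLength; off += blockSize)
    {
        if (!instructionStarts.canFind(off))
            continue;            // boundary falls mid-instruction: skip it
        cuts ~= off;
        if (cuts.length == maxPrefixes)
            break;               // threshold reached; cap work and memory
    }
    return cuts;
}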

Another threshold: don't mess with partial sums at all for really BIG functions. (This might not be needed.)

I think there is some room for other heuristics to try out, but they are mostly along the same lines.

P.S. Damn, I could have done a nice paper on that... too late :)

--
Dmitry Olshansky
