> On Jul 27, 2016, at 7:21 PM, John McCall <[email protected]> wrote:
>> On Jul 21, 2016, at 6:42 PM, Peter Collingbourne <[email protected]> wrote:
>>
>> Hi all,
>>
>> The ABI currently requires that virtual tables for a class appear
>> consecutively in a virtual table group. I would like to propose a
>> restriction that would require that compilers may only access the virtual
>> table associated with the address point stored in an object's virtual table
>> pointer, and may not rely on any knowledge that the compiler may have about
>> the relative layout of other virtual tables in the virtual table group.
>>
>> The purpose of this restriction is to allow an implementation to split a
>> virtual table group along virtual table boundaries.
>>
>> Motivation
>>
>> There are at least two scenarios which would benefit from vtable splitting:
>> clients which want to place data either before or after the ABI-required
>> part of a virtual table, and clients which want to control the layout of
>> virtual tables for performance or security reasons.
>>
>> As an example of the first scenario, when performing whole-program virtual
>> call optimization, Clang will apply an optimization known as virtual
>> constant propagation [0], which causes data to be laid out at a specific
>> offset from the address point of each virtual table in a hierarchy. If that
>> virtual table appears in a virtual table group, padding is required to place
>> the data at an appropriate offset for each class. Because of the current
>> restriction that vtables must appear consecutively, the optimizer may need
>> to add more padding than necessary, or inhibit the optimization entirely if
>> it would require too much padding.
>>
>> As an example of the second scenario, an implementation may wish to lay out
>> virtual tables hierarchically either in order to increase the likelihood of
>> a cache hit when repeatedly making the same virtual call over a set of
>> heterogeneous objects, or to efficiently implement a security mitigation
>> (specifically control flow integrity [1]) based on checking virtual table
>> addresses for set membership. Placing only virtual tables (rather than
>> virtual table groups) consecutively would likely increase the cache hit
>> likelihood further and reduce the amount of metadata required to implement
>> set membership checks.
>>
>> In an experiment involving the Chromium web browser, I have measured a
>> binary size decrease of 1.5%, and a median performance improvement of about
>> 1% on Chromium's layout benchmarks when comparing a binary compiled with
>> control flow integrity and whole-program virtual call optimization against a
>> binary compiled with control flow integrity, whole-program virtual call
>> optimization and a prototype implementation of vtable splitting.
>>
>> Commentary
>>
>> Although the ABI specifies [2] the calling convention for virtual calls,
>> which requires the call to be made using the this-adjustment appropriate for
>> the object from which the virtual table pointer was loaded, the as-if rule
>> could in principle allow a program to make a call using a different virtual
>> table if the virtual table group contains multiple secondary virtual tables,
>> as the distance between these virtual tables would be fixed (the same would
>> be possible for all virtual tables if the dynamic type were known, but in
>> that case the program could just call the appropriate virtual function
>> directly).
>
> In what situation would the distance between secondary virtual tables in a
> VTT be fixed where you don't know the dynamic type? Derived classes can
> always introduce or re-introduce virtual bases in ways that re-order the
> secondary virtual tables.

Okay, thinking about it more, the idea is that, because the enumeration order is depth-first, there will always be a local range of the compound v-table that contains the v-tables of the non-virtual bases for any given portion of the class hierarchy. Because the secondary tables never have new function pointers added to them, they do not grow to the right; and because v-call offsets are always added to the primary v-table for a virtual base, they do not grow to the left. Therefore, a secondary v-table of a non-virtual base is fixed in size, and so you could theoretically reach from one secondary v-table to another with a constant offset. For this to be profitable, of course, you would have to have one secondary table already loaded when you tried to use the other; but that could happen. So I agree that this would be a possible optimization today.

>> The purported benefit would be to avoid an additional virtual pointer load
>> from the object in cases where consecutive calls are made to virtual
>> functions introduced in different bases. However, it seems to me that cases
>> where this is beneficial would be rare: not only would you need at least
>> three bases and a derived class which does not override any of the called
>> virtual functions, but when performing two consecutive calls it seems likely
>> that the vtable would need to be reloaded anyway, either from the object or
>> from the stack, especially with majority caller-save ABIs such as x86-64, or
>> in any event because the first virtual call may have changed the object's
>> dynamic type.

This part of your argument is weak. Putting the v-table in a callee-save register would be quite reasonable if you're doing many repeat calls. I don't see why it would matter whether the majority of registers are callee-save as long as the absolute number is at least 2; even i386 gives us 3 general-purpose callee-save registers, and x86-64 has 5.
And it's undefined behavior to change a pointer's dynamic type like that, although that can be tricky to take advantage of.

That said, I would say that the trade-offs still break in your favor here. The optimization potential of this sort of contrived situation (calls to virtual methods of two different secondary v-tables) does not outweigh the optimization potential of permitting non-standard organization of secondary v-tables.

>> It seems (according to experiments [3] carried out at godbolt.org) that all
>> major compilers (gcc, clang, icc) do already use the appropriate vtable
>> group and therefore are compliant with the proposed restriction.
>>
>> (There would also seem to be nothing preventing an implementation from
>> choosing to load the RTTI pointer or offset-to-top from another virtual
>> table group. However I would consider this even less likely to be beneficial
>> than a virtual call via another virtual table.)

I agree, I cannot imagine why an optimizer would deliberately do this when it could get the same information from a simpler source.

>> The ABI specifies that the vtables in a group shall be laid out
>> consecutively when referenced via a vtable group symbol, and I'm not
>> proposing to change this. The effect of this proposal would be to allow a
>> vtable to be split if the vtable group symbol is not referenced directly by
>> name outside of the translation unit(s) participating in the optimization.
>> This may be the case when a class has internal linkage, or if the program is
>> linked with LTO, which allows the compiler to know which symbols are
>> referenced outside of the LTO'd part of the program.
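
To make the scenario debated above concrete (consecutive calls to virtual functions introduced in different bases, on an object whose dynamic type is not known), here is a minimal sketch. The names A, B, C, D, E, call_both and check_call_both are made up for illustration; this is not one of the godbolt test cases from [3].

```cpp
#include <cassert>

// Hypothetical hierarchy: three polymorphic bases and a derived class that
// overrides none of their virtual functions.
struct A { virtual int fa() { return 1; } virtual ~A() = default; };
struct B { virtual int fb() { return 2; } virtual ~B() = default; };
struct C { virtual int fc() { return 3; } virtual ~C() = default; };

// Under the Itanium layout, A is D's primary base; B and C each get a
// secondary v-table in D's v-table group.
struct D : A, B, C {};

// A further-derived class; call_both cannot know whether it is handed one.
struct E : D { int fb() override { return 20; } };

// Two consecutive virtual calls through different secondary v-tables.
// Under the proposed rule, each call may only use the v-table reached
// through its own subobject's v-table pointer (B's for fb, C's for fc).
int call_both(D& d) {
    return d.fb() + d.fc();
}

// Usage: the results depend only on the dynamic type of the argument.
void check_call_both() {
    D d;
    E e;
    assert(call_both(d) == 5);   // 2 + 3
    assert(call_both(e) == 23);  // 20 + 3
}
```

The optimization being ruled out would compute the address of the C-in-D v-table from the already-loaded B-in-D v-table pointer instead of loading it from the object; the source-level behavior of call_both is the same either way.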
>>
>> Wording
>>
>> I propose to add two paragraphs to the section of the ABI describing virtual
>> table groups, as follows:
>>
>> diff --git a/abi.html b/abi.html
>> index 79cda2c..fce0c60 100644
>> --- a/abi.html
>> +++ b/abi.html
>> @@ -1193,6 +1193,18 @@ and again excluding primary bases
>>  (which share virtual tables with the classes for which they are primary).
>>  </ul>
>>
>> +<p>
>> +When performing a virtual call or loading any other data from an address
>> +derived from the address point stored in an object's virtual table pointer,
>> +a program may only load from the virtual table associated with that address
>> +point, and not from any other virtual table in the same virtual table group
>> +which might be presumed to be located at a fixed offset from the address
>> +point as a result of the above layout algorithm.
>> +
>> +<p>
>> +The purpose of this restriction is to allow an implementation to split a
>> +virtual table group along virtual table boundaries if its symbol is not
>> +visible to other translation units.

I would say this more generally: the ABI does not make guarantees about the relative layout of v-tables in an object or a VTT. It guarantees only the layout of the global symbol. It does not guarantee that the v-table pointers actually installed in an object or a VTT will point into that global symbol.

John.
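
As a toy illustration of why code must not assume a fixed distance between v-tables, the sketch below models v-table memory as a flat array of slots. This is an assumption-laden model, not the real Itanium layout: the Image type, the slot values and the function names are all invented, and each v-table is reduced to an offset-to-top slot, an RTTI slot and one function slot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Invented slot values standing in for real v-table contents.
using Slot = int;
constexpr Slot kOffsetToTop = 100, kRtti = 101, kFb = 2, kFc = 3, kExtra = -1;

// A toy program image: v-table memory plus the two address points that would
// be installed in an object's B and C subobjects (stored as indices).
struct Image {
    std::vector<Slot> mem;
    std::size_t vptr_b;
    std::size_t vptr_c;
};

// The two secondary v-tables laid out back to back, as in a v-table group.
Image consecutive_layout() {
    return Image{std::vector<Slot>{kOffsetToTop, kRtti, kFb,
                                   kOffsetToTop, kRtti, kFc}, 2, 5};
}

// The same two v-tables split apart, with unrelated data in between.
Image split_layout() {
    return Image{std::vector<Slot>{kOffsetToTop, kRtti, kFb, kExtra, kExtra,
                                   kOffsetToTop, kRtti, kFc}, 2, 7};
}

// Conforming access: use the address point installed for the C subobject.
Slot load_fc_conforming(const Image& img) { return img.mem[img.vptr_c]; }

// Non-conforming shortcut: reach C's table from B's address point using the
// distance that happens to hold in the consecutive layout.
constexpr std::size_t kAssumedDistance = 3;
Slot load_fc_shortcut(const Image& img) {
    return img.mem[img.vptr_b + kAssumedDistance];
}

// Usage: the shortcut only works when the tables happen to be consecutive.
void check_layouts() {
    assert(load_fc_conforming(consecutive_layout()) == kFc);
    assert(load_fc_shortcut(consecutive_layout()) == kFc);
    assert(load_fc_conforming(split_layout()) == kFc);
    assert(load_fc_shortcut(split_layout()) != kFc);  // reads the wrong slot
}
```

Loading through the installed address point gives the right slot in both layouts, which is the only access pattern the proposed wording permits.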
>>
>>  <p>
>>  <a name="vtable-construction">
>>
>>
>> Thanks,
>> Peter
>>
>> [0] http://lists.llvm.org/pipermail/llvm-dev/2016-January/094600.html
>> [1] http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html
>> [2] https://mentorembedded.github.io/cxx-abi/abi.html#vcall.caller
>> [3] https://godbolt.org/g/wX7Ay6 is a three-bases test case by Richard
>> Smith, https://godbolt.org/g/7eG8A1 is a dynamic-type-known test case by me
>> _______________________________________________
>> cxx-abi-dev mailing list
>> [email protected]
>> http://sourcerytools.com/cgi-bin/mailman/listinfo/cxx-abi-dev
