Re: [cxx-abi-dev] Proposing an ABI restriction on loads from an object's vtable pointer

John McCall Thu, 28 Jul 2016 09:54:01 -0700

> On Jul 27, 2016, at 7:21 PM, John McCall <[email protected]> wrote:
>> On Jul 21, 2016, at 6:42 PM, Peter Collingbourne <[email protected]> wrote:
>> 
>> Hi all,
>> 
>> The ABI currently requires that virtual tables for a class appear 
>> consecutively in a virtual table group. I would like to propose a 
>> restriction requiring that compilers only access the virtual table 
>> associated with the address point stored in an object's virtual table 
>> pointer, and not rely on any knowledge they may have about the relative 
>> layout of other virtual tables in the virtual table group.
>> 
>> The purpose of this restriction is to allow an implementation to split a 
>> virtual table group along virtual table boundaries.
>> 
>> Motivation
>> 
>> There are at least two scenarios which would benefit from vtable splitting: 
>> clients which want to place data either before or after the ABI-required 
>> part of a virtual table, and clients which want to control the layout of 
>> virtual tables for performance or security reasons.
>> 
>> As an example of the first scenario, when performing whole-program virtual 
>> call optimization, Clang will apply an optimization known as virtual 
>> constant propagation [0], which causes data to be laid out at a specific 
>> offset from the address point of each virtual table in a hierarchy. If that 
>> virtual table appears in a virtual table group, padding is required to place 
>> the data at an appropriate offset for each class. Because of the current 
>> restriction that vtables must appear consecutively, the optimizer may need 
>> to add more padding than necessary, or inhibit the optimization entirely if 
>> it would require too much padding.
>> 
>> As an example of the second scenario, an implementation may wish to lay out 
>> virtual tables hierarchically either in order to increase the likelihood of 
>> a cache hit when repeatedly making the same virtual call over a set of 
>> heterogeneous objects, or to efficiently implement a security mitigation 
>> (specifically control flow integrity [1]) based on checking virtual table 
>> addresses for set membership. Placing only virtual tables (rather than 
>> virtual table groups) consecutively would likely increase the cache hit 
>> likelihood further and reduce the amount of metadata required to implement 
>> set membership checks.
>> 
>> In an experiment involving the Chromium web browser, I have measured a 
>> binary size decrease of 1.5%, and a median performance improvement of about 
>> 1% on Chromium's layout benchmarks when comparing a binary compiled with 
>> control flow integrity and whole-program virtual call optimization against a 
>> binary compiled with control flow integrity, whole-program virtual call 
>> optimization and a prototype implementation of vtable splitting.
>> 
>> Commentary
>> 
>> Although the ABI specifies [2] the calling convention for virtual calls, 
>> which requires the call to be made using the this-adjustment appropriate for 
>> the object from which the virtual table pointer was loaded, the as-if rule 
>> could in principle allow a program to make a call using a different virtual 
>> table if the virtual table group contains multiple secondary virtual tables, 
>> as the distance between these virtual tables would be fixed (the same would 
>> be possible for all virtual tables if the dynamic type were known, but in 
>> that case the program could just call the appropriate virtual function 
>> directly).
> 
> In what situation would the distance between secondary virtual tables in a 
> VTT be fixed where you don't know the dynamic type?  Derived classes can 
> always introduce or re-introduce virtual bases in ways that re-order the 
> secondary virtual tables.


Okay, thinking about it more, the idea is that, because the enumeration order 
is depth-first, there will always be a local range of the compound v-table that 
contains the v-tables of the non-virtual bases for any given portion of the class 
hierarchy.  Because the secondary tables never have new function pointers added 
to them, they do not grow to the right; and because v-call offsets are always 
added to the primary v-table for a virtual base, they do not grow to the left.  
Therefore, a secondary v-table of a non-virtual base is fixed in size, and so 
you could theoretically reach from one secondary v-table to another with a 
constant offset.  For this to be profitable, of course, you would have to have 
one secondary table already loaded when you tried to use the other; but that 
could happen.  So I agree that this would be a possible optimization today.

>> The purported benefit would be to avoid an additional virtual pointer load 
>> from the object in cases where consecutive calls are made to virtual 
>> functions introduced in different bases. However, it seems to me that cases 
>> where this is beneficial would be rare: not only would you need at least 
>> three bases and a derived class which does not override any of the called 
>> virtual functions, but when performing two consecutive calls it seems likely 
>> that the vtable would need to be reloaded anyway, either from the object or 
>> from the stack, especially with majority caller-save ABIs such as x86-64, or 
>> in any event because the first virtual call may have changed the object's 
>> dynamic type.

This part of your argument is weak.  Putting the v-table in a callee-save 
register would be quite reasonable if you're doing many repeat calls.  I don't 
see why it would matter whether the majority of registers are callee-save as 
long as the absolute number is at least 2; even i386 gives us 3 general-purpose 
callee-save registers, and x86-64 has 5.  And it's undefined behavior to change 
a pointer's dynamic type like that, although that can be tricky to take 
advantage of.

That said, I would say that the trade-offs still break in your favor here.  The 
optimization potential of this sort of contrived situation (calls to virtual 
methods of two different secondary v-tables) doesn't outweigh the 
optimization potential of permitting non-standard organization of secondary 
v-tables.

>> It seems (according to experiments [3] carried out at godbolt.org) that all 
>> major compilers (gcc, clang, icc) already use the appropriate vtable group 
>> and therefore comply with the proposed restriction.
>> 
>> (There would also seem to be nothing preventing an implementation from 
>> choosing to load the RTTI pointer or offset-to-top from another virtual 
>> table in the same group. However, I would consider this even less likely 
>> to be beneficial than a virtual call via another virtual table.)

I agree, I cannot imagine why an optimizer would deliberately do this when it 
could get the same information from a simpler source.

>> The ABI specifies that the vtables in a group shall be laid out 
>> consecutively when referenced via a vtable group symbol, and I'm not 
>> proposing to change this. The effect of this proposal would be to allow a 
>> vtable to be split if the vtable group symbol is not referenced directly by 
>> name outside of the translation unit(s) participating in the optimization. 
>> This may be the case when a class has internal linkage, or if the program is 
>> linked with LTO, which allows the compiler to know which symbols are 
>> referenced outside of the LTO'd part of the program.
>> 
>> Wording
>> 
>> I propose to add two paragraphs to the section of the ABI describing virtual 
>> table groups, as follows:
>> 
>> diff --git a/abi.html b/abi.html
>> index 79cda2c..fce0c60 100644
>> --- a/abi.html
>> +++ b/abi.html
>> @@ -1193,6 +1193,18 @@ and again excluding primary bases
>>  (which share virtual tables with the classes for which they are primary).
>>  </ul>
>>  
>> +<p>
>> +When performing a virtual call or loading any other data from an address
>> +derived from the address point stored in an object's virtual table pointer,
>> +a program may only load from the virtual table associated with that address
>> +point, and not from any other virtual table in the same virtual table group
>> +which might be presumed to be located at a fixed offset from the address
>> +point as a result of the above layout algorithm.
>> +
>> +<p>
>> +The purpose of this restriction is to allow an implementation to split a
>> +virtual table group along virtual table boundaries if its symbol is not
>> +visible to other translation units.

I would say this more generally: the ABI does not make guarantees about the 
relative layout of v-tables in an object or a VTT.  It guarantees only the 
layout of the global symbol.  It does not guarantee that the v-table pointers 
actually installed in an object or a VTT will point into that global symbol.

John.

>>  
>>  <p>
>>  <a name="vtable-construction">
>> 
>> 
>> Thanks,
>> Peter
>> 
>> [0] http://lists.llvm.org/pipermail/llvm-dev/2016-January/094600.html
>> [1] http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html
>> [2] https://mentorembedded.github.io/cxx-abi/abi.html#vcall.caller
>> [3] https://godbolt.org/g/wX7Ay6 is a three-bases test case by Richard Smith;
>> https://godbolt.org/g/7eG8A1 is a dynamic-type-known test case by me.
>> _______________________________________________
>> cxx-abi-dev mailing list
>> [email protected]
>> http://sourcerytools.com/cgi-bin/mailman/listinfo/cxx-abi-dev