The size differences are negligible because the code generator emits GOT 
loads only in narrow circumstances where they reduce code size:
- when doing pointer arithmetic on the address of an external symbol.
- when loading the address of an external symbol as a function argument using a 
push instruction.
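
As a rough illustration of the two patterns listed above, here is a minimal C 
sketch.  All names (gExternalTable, ReadAtOffset, ReadViaArgument, ReadFirst) 
are hypothetical and not taken from EDK2.  Whether a given compiler actually 
emits a GOT load for them depends on its version and flags (I am assuming a 
size-optimized, position-independent X64 build), so treat this only as the 
shape of code to look for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical external symbol.  In a real module it would be defined in
   another object file; it is defined at the bottom of this file only so
   that the sketch links on its own. */
extern unsigned char gExternalTable[16];

/* Hypothetical callee that takes an address as its argument. */
unsigned char ReadFirst(const unsigned char *Ptr)
{
    return Ptr[0];
}

/* Pattern 1: pointer arithmetic on the address of an external symbol. */
unsigned char ReadAtOffset(size_t Index)
{
    assert(Index < 16);                 /* sketch-only bounds check */
    return *(gExternalTable + Index);
}

/* Pattern 2: the address of an external symbol passed as an argument. */
unsigned char ReadViaArgument(void)
{
    return ReadFirst(gExternalTable);
}

/* Stand-in definition so the example is self-contained. */
unsigned char gExternalTable[16] = { 7, 11, 13 };
```

Disassembling the object file (for example with objdump -dr) is one way to see 
whether a GOTPCREL relocation shows up for either function.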

Emitting GOT loads in these scenarios slightly reduces code size, but it forces 
the referenced GOT entry to be emitted into the executable as well.  GOT 
entries that are not referenced by the code are discarded by the linker's 
gc-sections feature; entries that are referenced are kept.  So it's a 
tradeoff: if many GOT loads reference the same symbol, size is reduced.  If 
only one or two GOT loads reference a symbol, size may grow.
The number of such cases is also very small due to the narrow circumstances of 
the optimization opportunities.

With the GCC49 toolchain (LTO off), there are no GOT loads today because of 
the visibility pragma.
If the visibility pragma is suppressed, I counted 6 cases in MdeModulePkg when 
building OvmfPkgX64.dsc.  This is what originally broke the build and caused 
the visibility pragma to be added.

In GCC5, the circumstances of GOT-load emission are further narrowed by LTO.  
It happens only when both of the following hold:
- An external symbol is defined in assembly (so it remains external to LTO).
- C code declares the external symbol and uses it in one of the narrow 
circumstances listed above where GOT loads reduce code size.
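
To make the two conditions above concrete, here is a minimal sketch, not the 
actual sample.  The symbol name gAsmDefinedValues and the function 
ReadAsmValue are hypothetical.  In a real module the definition would sit in a 
separate assembly source file; I use a top-level asm statement here only to 
keep the example in one file, and I am assuming an ELF target with GNU-style 
assembler directives.

```c
#include <assert.h>
#include <stddef.h>

/* Condition 1: the symbol is defined in assembly, so the compiler only
   ever sees a declaration for it. */
__asm__(
    ".pushsection .data\n"
    ".globl gAsmDefinedValues\n"
    ".balign 4\n"
    "gAsmDefinedValues:\n"
    ".long 42, 43\n"
    ".popsection\n");

/* Condition 2: C code declares the symbol as external... */
extern unsigned int gAsmDefinedValues[2];

/* ...and uses it in one of the narrow patterns, here pointer arithmetic
   on its address. */
unsigned int ReadAsmValue(size_t Index)
{
    assert(Index < 2);                  /* sketch-only bounds check */
    return *(gAsmDefinedValues + Index);
}
```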

That is what is demonstrated in the sample.  There are no such cases in the 
EDK2 code base, so the GCC5 build doesn't break.

Since the size differences are negligible, the only reason this is an issue is 
that an emitted GOT load breaks the build, because GenFw doesn't handle it.

So one option is to just ignore this, since it doesn't happen in today's 
codebase, but since it can happen, document what the workarounds are:
- one workaround is to manually declare external symbols that cause GOT loads 
with __attribute__((visibility("hidden")))
- I've also found that using __attribute__((optimize("O2"))) on a function that 
emits GOT loads sometimes eliminates the GOT load.  This is because the GOT 
load is only emitted to reduce code size, so when the function is optimized 
for speed instead, the GOT load is no longer used.
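
The two workarounds above could look roughly like this.  The symbol and 
function names are hypothetical, and both attributes are GCC extensions 
(other compilers may ignore the optimize attribute), so this is a sketch of 
the idea rather than a guaranteed fix.

```c
#include <assert.h>
#include <stddef.h>

/* Workaround 1: declare the external symbol with hidden visibility, so
   the compiler can assume it is resolved inside the module. */
extern unsigned char gHiddenTable[16] __attribute__((visibility("hidden")));

unsigned char ReadHidden(size_t Index)
{
    assert(Index < 16);                 /* sketch-only bounds check */
    return *(gHiddenTable + Index);
}

/* Workaround 2: leave the symbol's visibility alone, but ask GCC to
   optimize this one function for speed instead of size. */
extern unsigned char gDefaultTable[16];

__attribute__((optimize("O2")))
unsigned char ReadDefaultO2(size_t Index)
{
    assert(Index < 16);                 /* sketch-only bounds check */
    return *(gDefaultTable + Index);
}

/* Stand-in definitions so the example is self-contained. */
unsigned char gHiddenTable[16]  = { 1, 2, 3 };
unsigned char gDefaultTable[16] = { 4, 5, 6 };
```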

Another option, suggested by Ard Biesheuvel, is to arrange things so that all 
external symbols except module entry points are hidden.  This resolves the 
problem for the GCC5 LTO build in the way most similar to the resolution for 
the GCC49 non-LTO build.
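
One possible shape for this arrangement, not necessarily the mechanism that 
was proposed, is to apply the visibility pragma to everything and then mark 
only the entry point as default.  The names below (gModuleState, BumpState, 
HypotheticalModuleEntry) are made up for illustration.

```c
#include <assert.h>

/* Everything declared from here on is hidden unless stated otherwise. */
#pragma GCC visibility push(hidden)

extern unsigned int gModuleState;
unsigned int BumpState(unsigned int Delta);

/* The entry point is the one symbol that keeps default visibility; an
   explicit attribute takes precedence over the pragma. */
__attribute__((visibility("default")))
unsigned int HypotheticalModuleEntry(void);

unsigned int gModuleState = 0;

unsigned int BumpState(unsigned int Delta)
{
    assert(Delta != 0);                 /* sketch-only sanity check */
    gModuleState += Delta;
    return gModuleState;
}

unsigned int HypotheticalModuleEntry(void)
{
    return BumpState(3);
}

#pragma GCC visibility pop
```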

Another option is to add functionality to GenFw for handling the various X64 
GOTPCREL emissions for the small number of cases that are expected to occur.  
However, I cannot guarantee that future changes in the compiler will not start 
emitting thousands of GOT loads, which would go unnoticed because GenFw would 
be handling them silently.  This is an undesirable scenario.
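
For reference, the relocation types in question can be told apart by their 
numeric values from the x86-64 psABI.  The sketch below is not GenFw code; 
the enum names, IsX64GotPcRelReloc and CountGotLoads are hypothetical helpers 
that only show how such relocations could be recognized and counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values as assigned by the x86-64 psABI to R_X86_64_GOTPCREL,
   R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX.  Local names are used
   so the sketch does not depend on any particular elf.h. */
enum {
    SKETCH_R_X86_64_GOTPCREL      = 9,
    SKETCH_R_X86_64_GOTPCRELX     = 41,
    SKETCH_R_X86_64_REX_GOTPCRELX = 42
};

/* Returns true if Type is one of the GOT-relative load relocations. */
bool IsX64GotPcRelReloc(uint32_t Type)
{
    switch (Type) {
    case SKETCH_R_X86_64_GOTPCREL:
    case SKETCH_R_X86_64_GOTPCRELX:
    case SKETCH_R_X86_64_REX_GOTPCRELX:
        return true;
    default:
        return false;
    }
}

/* Counts the GOT-relative relocations in a list of relocation types. */
size_t CountGotLoads(const uint32_t *Types, size_t Count)
{
    size_t Found = 0;

    assert(Types != NULL || Count == 0);
    for (size_t Index = 0; Index < Count; Index++) {
        if (IsX64GotPcRelReloc(Types[Index])) {
            Found++;
        }
    }
    return Found;
}
```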

--------------------------------------------
On Wed, 6/13/18, Shi, Steven <steven....@intel.com> wrote:

... Does the hidden visibility in LTO can improve
 the LTO build code size? Is there any other benefit?
 
 Steven Shi
 Intel\SSG\STO\UEFI Firmware
 
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel