Re: Preventing preemption of 'protected' symbols in GNU ld 2.26 [aka should we revert the fix for 65248]

2016-04-19 Thread Cary Coutant
> As one of the strong advocates for the fix that was made to make
> protected visibility work correctly with data symbols, I'd like to
> explain why it was the right decision and why it matters. This whole
> process is really frustrating to me -- having invested a lot of effort
> into getting something important fixed, only to have people come
> trying to break it again -- but I'm going to try to be calm and not to
> snap at anybody.

Ironically, you've just described my feelings almost exactly, only I
come here finding that someone already broke it, and I'm trying to get
it fixed again.

With all due respect, I think you're misinterpreting what the
visibility feature was intended for, and you're projecting your own
needs onto everyone else that uses this feature. I can state with
first-hand knowledge that the intent was so that compilers could
optimize access to the protected data symbols based on an assumption
that they are "relatively nearby". Here's a quote from the gABI
proposal (the latest revision that I could find, dated April 16,
1999), which was submitted by Jim Dehnert, then at SGI:

"Optimization Note:

"The visibility semantics of these attributes allow various
optimizations. While care must be taken to maintain
position-independence and proper GOT usage for references to and
definitions of symbols which might be preempted by or referenced from
other components, these restrictions all allow references from the
same component to make stricter assumptions about the definitions.
References to protected symbols (and hence to hidden or internal
symbols) may be optimized by using absolute or PC-relative addresses
in executable files or by assuming addresses to be relatively nearby.
Internal functions (as defined in the MIPS ABI) do not normally
require gp establishment code even in psABIs requiring callee
establishment/restore of gp, because they will always be entered from
the same component with the correct gp already in place from the
caller."

Unfortunately, this optimization note didn't make it into the gABI,
probably because the editor felt it was unnecessary, and because it
contained some MIPS-specific details. Nevertheless, it clearly shows
the intent.
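
To make that concrete, here is a minimal sketch in GCC-style C (the
exact code sequence the compiler emits is target-dependent):

   /* In a shared library: */
   __attribute__((visibility("protected"))) int counter = 0;

   int bump(void)
   {
       /* Because 'counter' cannot be preempted by a definition in
          another component, the compiler may address it PC-relative
          (or absolute) instead of indirecting through the GOT. */
       return ++counter;
   }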

> From a programming standpoint, the semantics of protected visibility
> need to be free of arch-specific implementation details. Otherwise
> programmers can't use it without hard-coding arch-specific details,
> which, for practical purposes, means good software can't use it at all.

It's unfortunate that copy relocations intrude on your stated goal,
but I'd prefer to fix that problem without breaking (and I do mean
"break") the original intent behind protected visibility.

> My original motivation for wanting protected visibility to "just work"
> was to be able to use:
>
> #pragma GCC visibility push(protected)
>
> around the inclusion of a library's public headers when including them
> from the implementation files. This is far from being the only usage
> case, and I'll expand on more important usage cases below, but it is
> an important one because it allows you to eliminate all GOT/PLT cost
> of intra-library function calls without any fine-grained maintenance
> of which declarations to apply visibility to (and without any
> GNUC-specific clutter in the header files themselves).
>
> I understand that some people want protected visibility to avoid the
> GOT for data symbols too for the sake of performance, but for my usage
> case, the fact that the semantics were wrong for data symbols meant
> that my configure check for "does protected visibility work" would
> return "no", and the whole optimization would get turned off.

Yes, some people do want protected visibility to avoid the GOT for
data symbols, and it makes a significant difference in many cases. For
those people, the changes I'm objecting to cause a performance
regression.

> Anyway, let's move past optimization, because it's a distraction.

I disagree. Optimization isn't a distraction -- it's the whole
motivation for the feature.

> After all, with the old (broken) behavior of protected data, one
> _could_ work around the above problem and still get the performance
> benefits for functions without breaking data by explicitly declaring
> all data with default visibility. In fact, this is how I solve the
> problem in musl libc, where there are only a small number of data
> symbols that should be externally accessible, and maintaining a list
> of them is manageable:
>
> http://git.musl-libc.org/cgit/musl/tree/src/internal/vis.h?id=d1b29c2a54588401494c1a3ac7103c1e91c61fa1
>
> This is done for the sake of compatibility with a wide range of
> toolchains including ones with the old/broken behavior for protected
> data.

s/broken/correct/ :-)

Given that, now you can have efficient access to the symbols that
aren't externally accessible, and the symbols that are accessible are
marked correctly with default visibility, right? So what's the
problem? Why would you want to give up that?
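
For readers following along, the musl technique quoted above amounts
to roughly this (a sketch, not the actual header -- see the linked
vis.h; 'environ' stands in for any deliberately exported data symbol):

   /* Included first in every implementation file: */
   #pragma GCC visibility push(protected)

   /* The few data symbols that must stay externally accessible are
      explicitly re-declared with default visibility: */
   extern char **environ __attribute__((visibility("default")));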

Re: Preventing preemption of 'protected' symbols in GNU ld 2.26 [aka should we revert the fix for 65248]

2016-04-19 Thread Cary Coutant
> Another old bug:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10908

Filed by you, and resolved (correctly) as invalid. Again, the real
problem was the lack of a linker diagnostic.

-cary


Re: Preventing preemption of 'protected' symbols in GNU ld 2.26 [aka should we revert the fix for 65248]

2016-04-19 Thread Cary Coutant
> Cary, please stop spreading the incorrect information.   There is
> at least one GCC bug against protected symbols:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55012
>
> which was reported by other people.

OK, so it got reported once by someone else. But that bug was based on
an incorrect understanding of protected visibility; the real problem
there was the lack of diagnostic from the linker.

-cary


Re: Preventing preemption of 'protected' symbols in GNU ld 2.26 [aka should we revert the fix for 65248]

2016-04-19 Thread Cary Coutant
>> So with all this it sounds like current protected visibility is just
>> broken and we should forgo it, making it equal to default
>> visibility?
>
> Like how?  You mean in GCC regarding protected as default visibility?  No,
> that's just throwing out the baby with the bathwater.  We should make
> protected do what it was intended to do and accept that not all invariants
> that are true for default visible symbols are also true for protected
> symbols, possibly by ...
>
>> At least I couldn't decipher a solution that solves all of the issues
>> with protected visibility apart from trying to error at link-time (or
>> runtime?) for the cases that are tricky (impossible?) to solve.
>
> ... this.

Right. Protected visibility worked fine without copy relocations for
15 years until HJ's patch. I don't know of anyone who had a legitimate
complaint about it until HJ filed a bug based on his artificial test
case.

Other compilers implement protected the way it was intended, so the
linkers must still disallow copy relocations against protected
symbols, or we get legitimate complaints like these:

   https://sourceware.org/bugzilla/show_bug.cgi?id=15228
   https://sourceware.org/ml/binutils/2016-03/msg00312.html
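
The failure mode behind those reports reduces to a two-file sketch
(hypothetical names):

   /* libfoo.c -- compiled with -fPIC, linked into libfoo.so */
   __attribute__((visibility("protected"))) int x = 1;
   int *lib_addr_of_x(void) { return &x; }  /* binds to the local def */

   /* main.c -- compiled non-PIC, linked against libfoo.so */
   extern int x;
   extern int *lib_addr_of_x(void);
   int main(void)
   {
       /* The non-PIC reference forces a COPY relocation, duplicating
          x into the executable.  The library keeps using its own
          protected copy, so the two addresses silently diverge. */
       return (&x == lib_addr_of_x()) ? 0 : 1;
   }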

-cary


Re: Preventing preemption of 'protected' symbols in GNU ld 2.26 [aka should we revert the fix for 65248]

2016-04-18 Thread Cary Coutant
>> That is why protected visibility is such a mess.
>
> Not mess, but it comes with certain limitations.  And that's okay.  It's
> intended as an optimization, and it should do that optimization if
> requested, and error out if it can't be done for whatever reason.

I completely agree.

> E.g. one limitation might very well be that function pointer comparison
> for protected functions doesn't work (gives different outcomes if the
> pointer is built from inside the exe or from a shared lib).  (No matter
> how it's built, it will still _work_ when called).  Alternatively we can
> make comparison work (by using the exe PLT slot), in which case Alans
> testcase will need more complications to show that protected visibility
> currently is broken.  Alans testcase will work right now (as in showing
> protected being broken) on data symbols.

Function pointer comparison is also a mess, and is the only reason why
the treatment for protected function symbols is so complicated. It all
boils down to the language guarantees that (a) the address of a
function must be unique, (b) that the address of a given function must
always be the same value, and (c) that these guarantees survive a
conversion to void*.
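
Concretely, those guarantees require a comparison like the following
to succeed (sketch):

   /* libbar.c -- in the shared library */
   void callback(void) { }
   void (*lib_fp)(void) = callback;  /* address taken inside the DSO */

   /* main.c -- in the executable */
   extern void callback(void);
   extern void (*lib_fp)(void);
   int main(void)
   {
       /* Guarantees (a)-(c) say this must hold even though the
          executable's '&callback' may be the address of its own PLT
          entry -- which is why protected functions need such careful
          treatment. */
       return (callback == lib_fp) ? 0 : 1;
   }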

I'd argue that all of these language guarantees are poor choices. Just
like constant strings, which are allowed to be pooled, two identical
functions ought to be allowed to be pooled (folded). If function
pointer comparison were restricted to function pointer types, we could
allow the address of a function to yield the address of a PLT entry,
and use a deep comparison to decide whether two unequal function
pointers were in fact equivalent.

But that's another topic for another day.

-cary


Re: Preventing preemption of 'protected' symbols in GNU ld 2.26 [aka should we revert the fix for 65248]

2016-04-18 Thread Cary Coutant
>> Given a shared library that defines a variable, and a non-PIC
>> executable that references that variable, the linker makes a duplicate
>> of the variable in the executable .dynbss section and arranges to have
>> the copy initialized by the dynamic loader with a copy relocation.
>> .dynbss is a linker created section that becomes part of the
>> executable bss segment.  The idea is that at run-time both the
>> executable and the shared library will use the executable copy of the
>> variable.  It's a rather ancient linker hack to avoid dynamic text
>> relocations, invented well before symbol visibility.
>
> So what other choice does the linker have here?  AFAICS it's wrong
> to create the .dynbss copy for protected symbols.  So why not
> simply create 'dynamic text relocations' then?  Is that possible with
> a pure linker change?

Ugh. Besides being a bad idea from a performance point of view, it's
not even always possible to do. Depending on the architecture, a
direct reference from an executable to a variable in a shared library
may not have the necessary reach.

> That said, correctness trumps optimization.  A correctness fix that
> works with old objects trumps one that requires a compiler change.
> Requiring a compiler change to get back optimization while preserving
> correctness is fine.

When the whole point of a feature is to enable a particular
optimization, the missed optimization *is* a correctness issue.

Symbol visibility is not a standard language feature. It's an
extension that many compilers provide to give access to an ELF/gABI
feature, and it comes with limitations. When the only way to eliminate
those limitations is to disable the intended optimization, the only
real choices are to live with the limitations (i.e., issue an error
when we would need a COPY relocation for a protected symbol), or to
forgo the extension altogether.

-cary


Re: [GCC Wiki] Update of "DebugFission" by CaryCoutant

2016-03-21 Thread Cary Coutant
We're in the final stages of work on the DWARF v5 spec, which will
include the Fission extensions. When that's done, we'll start
converting GCC and gold over to use the official DWARF features rather
than the GNU extensions.

>> + The "Fission" project was started in response to the problems caused by 
>> huge amounts of debug information in large applications. By splitting the 
>> debug information into two parts at compile time -- one part that remains in 
>> the .o file and another part that is written to a parallel .dwo ("DWARF 
>> object") file -- we can reduce the total size of the object files processed 
>> by the linker.
>
> Yay, a quite noticeable link-time speedup!  \o/

Good!

>> + Fission is implemented in GCC 4.7, and requires support from recent 
>> versions of objcopy and the gold linker.
>
> Is my understanding correct that the gold linker is not actually a
> requirement -- at least nowadays?  In my (very limited, so far) testing,
> this also seems to work with ld.bfd.  (I do see objcopy's --extract-dwo
> and --split-dwo options being used in gcc/gcc.c:ASM_FINAL_SPEC, so I
> suspect that's what "recent versions of objcopy" hints at?)

Yes, that's what we need from objcopy.

The reason gold is needed is for the --gdb-index option. Without the
.gdb_index section, gdb has to load all the .dwo files at startup so
that it knows all the symbols; with the .gdb_index section, it can
find symbols in the index and load the necessary .dwo file(s) on
demand. On a large project, I'd still expect absence of an index to be
painful enough that I'd leave that requirement in the wiki.

>> + Use the {{{-gsplit-dwarf}}} option to enable the generation of split DWARF 
>> at compile time. This option must be used in conjunction with {{{-c}}}; 
>> Fission cannot be used when compiling and linking in the same step.
>
> According to the following -- admittedly very minimal -- testing, this is
> not actually (no longer?) true?
>
> $ [gcc] [...] -gsplit-dwarf
> $ ls *.dwo
> ccF9JYjE.dwo  subroutines.dwo
> $ gdb -q a.out
> Reading symbols from a.out...done.
> (gdb) list main
> [...]
> (gdb) quit
> $ rm *.dwo
> $ gdb -q a.out
> Reading symbols from a.out...
> warning: Could not find DWO CU subroutines.dwo(0x2d85cdd539df6900) 
> referenced by CU at offset 0x0 [in module [...]/a.out]
>
> warning: Could not find DWO CU ccF9JYjE.dwo(0xa6936555a636518) referenced 
> by CU at offset 0x35 [in module [...]/a.out]
> done.
> (gdb) list main
> warning: Could not find DWO CU subroutines.dwo(0x2d85cdd539df6900) 
> referenced by CU at offset 0x0 [in module [...]/a.out]
>
> Have I been testing the wrong thing?

Hmmm, I think the problem here is that the .dwo files corresponding to
the temporary .o files will never get cleaned up, and might even get
overwritten unpredictably. It's just not a fully-supported path. I may
have forgotten some other issues. I may have also been expecting them
to get automatically cleaned up when the .o was removed.

>> + Use the gold linker's {{{--gdb-index}}} option ({{{-Wl,--gdb-index}}} when 
>> linking with gcc or g++) at link time to create the .gdb_index section that 
>> allows GDB to locate and read the .dwo files as it needs them.
>
> Unless told otherwise, I'll re-word that to the effect that gold, and
> usage of its --gdb-index option are optional.

Maybe "highly recommended" would be better.

-cary


Re: RFA: Add GCC Runtime Library Exception to include/plugin-api.h

2016-02-02 Thread Cary Coutant
> include/plugin-api.h defines an ABI between linker and compiler,
> which can be used to implement linker plug-in by any compilers.
> I'd like to add GCC Runtime Library Exception to include/plugin-api.h
> so that the linker plug-in can have non-GPL licenses.

This is OK with me.

-cary


Re: [RFC] MIPS ABI Extension for IEEE Std 754 Non-Compliant Interlinking

2015-11-14 Thread Cary Coutant
> 3.3.2 Static Linking Object Acceptance Rules
>
>  The static linker shall follow the user selection as to the linking mode
> used, either of `strict' and `relaxed'.  The selection will be made
> according to the usual way assumed for the environment used, which may be
> a command-line option, a property setting, etc.
>
>  In the `strict' linking mode both `strict' and `legacy' objects can be
> linked together.  All shall follow the same legacy-NaN or 2008-NaN ABI, as
> denoted by the EF_MIPS_NAN2008 flag described in Section 3.1.  The value
> of the flag shall be the same across all the objects linked together.  The
> output of a link involving any `strict' objects shall be marked as
> `strict'.  No `relaxed' objects shall be allowed in the same link.
>
>  In the `relaxed' linking mode any `strict', `relaxed' and `legacy'
> objects can be linked together, regardless of the value of their
> EF_MIPS_NAN2008 flag.  If the flag has the same value across all objects
> linked, then the value shall be propagated to the binary produced.  The
> output shall be marked as `relaxed'.  It is recommended that the linker
> provides a way to warn the user whenever a `relaxed' link is made of
> `strict' and `legacy' objects only.

This paragraph first says that "If the flag has the same value across
all objects linked, then the value shall be propagated to the binary
produced", but then says the "output shall be marked as `relaxed'."
Are you missing an "Otherwise" there?

Early on in the document, you mention "this applies regardless of
whether it relies on the use of NaN data or IEEE Std 754 arithmetic in
the first place," yet your solution is only two-state. Wouldn't it be
better to have a three-state solution where objects that do not in
fact rely on the NaN representation at all can be marked as "don't
care"? Such objects could always be mixed with either strict or
relaxed objects, regardless of linking mode.

-cary


Re: Adding static-PIE support to binutils

2015-08-18 Thread Cary Coutant
> Does static pie (ET_DYN with non-fixed load address, _DYNAMIC
> relocations, but no PT_INTERP, DT_NEEDEDs, or symbolic relocations)
> currently work with gold? If so, what is the way to request it? I
> would say it would make sense to try to do things the same, but from
> what you're saying it sounds like gold already significantly
> mismatches the bfd linker behavior...?

No, currently we reject -static and -pie when used together. But it
could be supported.

I don't think the differences are that significant in practice. In
common usage, gold and Gnu ld end up doing the same thing.

-cary


Re: Adding static-PIE support to binutils

2015-08-18 Thread Cary Coutant
> This is OK to commit with a suitable ChangeLog.  I think a separate ld
> option is best too, because historically -static and its aliases
> -Bstatic, -dn, -non_shared really are about what type of libraries are
> accepted rather than choosing linker output type.

Gold actually separates these concepts: -Bstatic/-dn and -Bdynamic/-dy
are about what kinds of libraries to search for, while -static and
-shared/-Bshareable/-G determine what kind of output to produce.

-cary


Re: Adding static-PIE support to binutils

2015-08-17 Thread Cary Coutant
> So far, I've been prototyping static PIE support by having GCC pass
> the following options to ld instead of -static -pie:
>
> -static -shared -Bsymbolic
>
> This partly works, but since ld does not know it's producing a main
> executable, it misses important details, including the ability to link
> initial-exec and local-exec model TLS code correctly, as well as
> various linking optimizations. So I think the right way forward is
> making ld accept -static and -pie together to do the right thing.

For the uses you have in mind, -static and -pie together make perfect
sense, but I'd argue that the output file in that case ought to be
ET_EXEC, since it will be in fact a standalone binary to be loaded
directly by the kernel. Not only would you want to omit the .interp
section (actually the PT_INTERP segment), but you also have no need
for the PT_DYNAMIC segment (.dynamic section).

The only thing you need over a standard ET_EXEC file is the dynamic
relocations, with linker-generated symbols bracketing the start and
end of the relocations so that your custom startup code can find them.
It should be reasonably easy to arrange for these.
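
A sketch of what that startup code might look like, assuming
hypothetical bracketing symbols __rela_start/__rela_end and an x86-64
target (other targets would handle their own relative-relocation
type):

   #include <elf.h>

   extern Elf64_Rela __rela_start[], __rela_end[];

   static void apply_relocs(unsigned long load_base)
   {
       for (Elf64_Rela *r = __rela_start; r != __rela_end; r++) {
           if (ELF64_R_TYPE(r->r_info) == R_X86_64_RELATIVE) {
               unsigned long *where =
                   (unsigned long *)(load_base + r->r_offset);
               *where = load_base + r->r_addend;
           }
       }
   }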

-cary


Re: GCC Cauldron: Notes from the C++ ABI BOF

2013-01-22 Thread Cary Coutant
>> Normally, the version identifier is applied to a type. It then
>> propagates to any declaration using that type, whether it's another
>> type or function or variable. For struct/union/class types, if any
>> member or base class has an attached version identifier (excluding
>> static data members, static member functions, and non-virtual member
>> functions), we attach the version identifier to the enclosing type.
>
> How does this handle incomplete types?  When we see a forward declaration of
> a class, we don't know its member/base types, so we can't propagate.

I believe we required an explicit attribute on the forward declaration
in such a case. The compiler would complain if the version_id on the
forward declaration didn't match that of the definition, allowing us
to catch these cases at compile time (at least if the forward
declaration and the definition are ever visible in the same
translation unit). In practice, this was a mechanism of last resort,
and it was used infrequently enough (I think struct utsname may have
been the only type we were using it for when I left) that the known
loopholes were not a real concern. (Other known loopholes included
assembly code, Fortran, and typecasting.)

-cary


Re: stabs support in binutils, gcc, and gdb

2013-01-14 Thread Cary Coutant
>> Next, I compiled a 5000-line C++ source file at both -O0 and -O2.
>
> I have to assume that David is working with C code, as stabs debugging
> for C++ is nearly unusable.

I assumed that too, but I figured C++ would be worse than C as far as
DWARF vs. stabs size. I'd still be interested to figure out what's
causing that 11.5x expansion.

-cary


Re: stabs support in binutils, gcc, and gdb

2013-01-11 Thread Cary Coutant
>> If I use objcopy --compress-debug-sections to compress the DWARF debug
>> info (but don't use it on the STABS debug info), then the file size
>> ratio is 3.4.
>>
>> While 3.4 is certainly better than 11.5, unless I can come up with a
>> solution where the ratio is less than 2, I'm not currently planning on
>> trying to convince them to switch to DWARF.
>
> The 3.4 number is the number I was interested in.
> Thanks for computing it.

It's not really fair to compare compressed DWARF with uncompressed stabs, is it?

> There are other things that can reduce the amount of dwarf, but the
> size reduction can depend on the app of course.
> I'm thinking of dwz and .debug_types.
> I wonder what 3.4 changes to with those applied.

David already said that dwz didn't help much, so that implies that
.debug_types won't help much either -- dwz should have removed any
duplicate type information that .debug_types would have removed.

I'm not going to argue that a ratio of 11.5 isn't kind of embarrassing
for DWARF, but I'd like to point out that you're not making an
apples-to-apples comparison. DWARF expresses a lot more about what's
going on in your program than stabs does, and it's reasonable to
expect it to be larger as a result. I compiled a very small C++ source
file with nothing more than a simple class definition and a main that
instantiates an instance of the class. Compiled with stabs, the .o
file is 3552 bytes with 1843 bytes of stabs info. Compiled with
DWARF-4, the .o file is 3576 bytes with 668 bytes of DWARF. For this
file, the two formats are encoding roughly the same information, and
DWARF is actually more efficient.

Next, I compiled a 5000-line C++ source file at both -O0 and -O2.
Here's the comparison at -O0:

stabs:   2,179,240 total   562,931 debug
dwarf:   4,624,816 total (2.1x)  1,965,448 debug (3.5x)

And at -O2:

stabs:   1,249,552 total   511,957 debug
dwarf:   4,612,240 total (3.7x)  2,281,564 debug (4.5x)

In general, DWARF is describing more about where variables live as
they move around during program execution. There's been lots of recent
work improving GCC's support for debugging optimized code, and that's
expensive to describe. Notice that when we turn on -O2, we get a lot
more DWARF information, while the stabs info is actually a bit smaller
(probably because -O2 generates less code). Even at -O0, DWARF is
describing more than stabs is.

I didn't see anything close to the 11.5 ratio that David got, so I'm
not sure what's so exceptional about your case. I'd be happy to take a
look if you can get me the files somehow.

We're working hard at improving the efficiency of DWARF -- there's a
lot of places where it can be improved, but I doubt the ratio between
stabs and DWARF will ever be much lower than ~3x, simply because
there's so much more information contained in the DWARF.

That extra information leads to a better debugging experience, but
it's a tradeoff. If stabs gives you a good-enough experience and the
size of DWARF is unbearable for you, then there's no reason to switch.

-cary


Re: GCC Cauldron: Notes from the C++ ABI BOF

2013-01-10 Thread Cary Coutant
> We had a useful discussion about C++11 ABI issues at the GNU Tools
> Cauldron (http://gcc.gnu.org/wiki/cauldron2012).  The approach will be
> shaped over time, but the general idea is as follows.
>
> We will modify g++ to support a type attribute indicating the version
> of the type, as a string.  This type attribute will be inherited by
> any other type that uses it, as a class/struct member or via
> inheritance.  Type attributes will be concatenated as needed.  This
> type attribute will then be used in the mangled name of any function
> that takes a parameter of a type with an attribute or returns a type
> with an attribute.  The type attribute will also be used in the
> mangled name of any global variable whose type has an attribute.

I (finally) managed to dig up some memories on this topic. Here's a
rough summary of how it works on HP-UX:

We added one new Gnu-style attribute:

   __attribute__((version_id("version_identifier")))

This attribute can be attached to a typedef, struct/union/class,
function, or variable declaration, or to function and variable
definitions. If a variable is redeclared in the same source file, its
version identifier must match in both declarations.

Normally, the version identifier is applied to a type. It then
propagates to any declaration using that type, whether it's another
type or function or variable. For struct/union/class types, if any
member or base class has an attached version identifier (excluding
static data members, static member functions, and non-virtual member
functions), we attach the version identifier to the enclosing type. If
there is more than one, we use the lexically greater identifier. For
arrays, pointer and reference types, and typedefs, the version
identifier propagates from the base type to the derived type. For
pointer-to-member, if either the base class or the base type has a
version identifier, the new type gets it.

For global variables, if the type has a version identifier, the linker
name of the variable gets decorated as
"source_name{version_identifier}". For functions, if the return type
or any of the formal parameter types have version identifiers, the
linker name of the function gets decorated the same way. Non-static
member functions will also inherit the version identifier of the base
class (because of the implicit "this" parameter).

If a variable or function gets a version identifier applied to it
directly, but also inherits one from its type, we use the lexically
greater.
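
In source form, the scheme looks roughly like this (reconstructed
from the description above; version_id is HP's compiler syntax, and
the decorated name shown is illustrative):

   /* Attach a version to a type: */
   struct __attribute__((version_id("v2"))) utsname {
       char sysname[9];
       /* ... */
   };

   /* This function takes a versioned parameter type, so its linker
      name is decorated per the rule above, e.g. "uname{v2}": */
   int uname(struct utsname *name);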

-cary


Re: Reserving a bit in ELF segment flags for huge page mappings

2012-07-24 Thread Cary Coutant
>   To do this, I would like to reserve a bit in the segment flags to
> indicate that this segment is to be mapped to huge pages if possible.
> Can I reserve something like a PF_LARGE_PAGE bit?

HP-UX has a PF_HP_PAGE_SIZE (0x00100000) bit that says "Segment should
be mapped with page size specified in p_align field".
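
A consumer of such a bit would check it roughly like this (sketch;
the flag value is HP-UX-specific and won't appear in a standard
elf.h):

   #include <elf.h>

   #define PF_HP_PAGE_SIZE 0x00100000  /* HP-UX OS-specific p_flags bit */

   /* Returns the requested page size for segment 'ph', or 0. */
   static long requested_page_size(const Elf64_Phdr *ph)
   {
       return (ph->p_flags & PF_HP_PAGE_SIZE) ? (long)ph->p_align : 0;
   }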

-cary


GCC Cauldron: Notes from the C++ ABI BOF

2012-07-11 Thread Cary Coutant
string & list changes

- function w/ string arg
- classes w/ string members
- global variables

- link c++98 & c++0x together
- link existing c++98 .o & new c++-x .o
- link at runtime c++98 .so & new c++0x .so

HP: attribute on type propagates to:
- function (params or return type)
- structs w/ members
- global vars
"Function-level versioning"
User-level docs:
http://h21007.www2.hp.com/portal/download/files/prot/files/linker/B2355-91150.pdf
(I'll try to dig up more details.)

Sun: change all manglings

(1) Add type attributes to g++
(2) Add type attributes to libstdc++
(3) Add API text generation to g++
(4) Use API text in libstdc++ to find changes


Re: [patch][rfc] How to handle static constructors on linux

2012-06-18 Thread Cary Coutant
> But I am still missing something, why is the performance so different?
> Code layout putting the constructors' body in the reverse order they
> are called?

Yes, as I understand it. Cache and TLB prefetching works better when
code executes from lower to higher addresses than when executing from
higher to lower.

-cary


Re: [patch][rfc] How to handle static constructors on linux

2012-06-18 Thread Cary Coutant
>> Furthermore, if you're working in chromium, you should be aware that
>> the new behavior is exactly what the Chrome developers are arguing
>> for, as it makes the startup faster. It sounds to me like you're
>> working at cross purposes with the other developers on that project.
>
> Ah, perhaps this goes to answer my question from the other mail: why
> switch to '.init_array'?

This has a long and complicated history. I tried to explain some of that here:

   http://gcc.gnu.org/ml/gcc-bugs/2010-12/msg01493.html

I wasn't part of the GCC community at the time, but I think that
.ctors was originally used instead of .init or .init_array precisely
because the order of execution of .init/.init_array was backwards from
the desired order of execution for constructors (leaving aside the
fact that it was backwards from the desired order of execution for
*any* kind of initializer). Now that GCC has finally moved from .init
to .init_array, they're simply trying to consolidate on the One True
Initializer Mechanism. In doing so, it would be desirable to correct
that mistake we made so long ago in the gABI, but that's where we ran
up against the concerns of the Chrome and Firefox developers who care
more about startup performance than about constructor ordering (but,
apparently, not enough to use linker options to reorder the code in
order to get both good performance *and* proper execution order).

-cary


Re: [patch][rfc] How to handle static constructors on linux

2012-06-18 Thread Cary Coutant
> So this is not as bad as I was expecting (old programs still work),
> but it is still a somewhat annoying ABI change to handle. I think we
> can add support for this in clang in 3 ways:
>
> 1) Require new linkers when using gcc 4.7 libraries.
> 2) Ship our own versions of crtbeginS.o (and similars).
> 3) Use .init_array when using gcc 4.7 libraries.
>
> I have most of option 3 implemented. Chandler,  do you still think
> that this is a big enough ABI breakage that it should not be
> supported?

You keep using the terms "ABI change" and "ABI breakage", but I think
you're using those terms a little too freely. The ABI does not specify
the order of initializers across compilation units, so the difference
in behavior -- while perhaps unfriendly -- does not even qualify as an
ABI change, and certainly not as an ABI "breakage".

Furthermore, if you're working in chromium, you should be aware that
the new behavior is exactly what the Chrome developers are arguing
for, as it makes the startup faster. It sounds to me like you're
working at cross purposes with the other developers on that project.

If the change in behavior is breaking the protocol compiler, I think
the best solution would be to fix it so that it doesn't depend on
behavior that is explicitly undefined by the ABI. Otherwise, it's
going to be fragile and non-portable.
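
The kind of cross-unit ordering the ABI leaves undefined is easy to
demonstrate (sketch):

   /* a.c */
   #include <stdio.h>
   __attribute__((constructor)) static void init_a(void) { puts("a"); }

   /* b.c */
   #include <stdio.h>
   __attribute__((constructor)) static void init_b(void) { puts("b"); }

   /* Whether "a" or "b" prints first depends on link order and on
      whether the toolchain runs .ctors (reverse order) or .init_array
      (forward order); portable code must not depend on it. */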

Having said all that, I'm still very much in favor of reversing the
.init_array, as I believe that makes much more sense than the current
behavior. It does, however, change existing behavior for (non-ctor
uses of) .init_array, so there will be changes in behavior either way.
The only way to preserve the exact status quo ante is to disable GCC from
placing ctors in .init_array and to use the linker's
--no-ctors-in-init-array option.

Sorry, I have not yet implemented the --reverse-init-array option in gold.

-cary


Re: RFC: Add STB_GNU_SECONDARY

2012-04-20 Thread Cary Coutant
> We only have very few bits to in STB_XXX field.

This is exactly why I'm not in favor of this extension. The feature
doesn't seem compelling enough to use up one of these precious
reserved values (in fact, you're using the next-to-last one that's
reserved for OS use).

You want a backup definition? Put a weak def at the end of the link line.

-cary


Re: Debug info for comdat functions

2012-04-18 Thread Cary Coutant
> This seems clearly wrong to me.  A reference to a symbol in a discarded
> section should not resolve to an offset into a different section.  I thought
> the linker always resolved such references to 0, and I think that is what we
> want.

Even resolving to 0 can cause problems. In the Gnu linker, all
references to a discarded symbol get relocated to 0, ignoring any
addend. This can result in spurious (0,0) pairs in range lists. In
Gold, we treat the discarded symbol as 0, but still apply the addend,
and count on GDB to recognize that the function starting at 0 must
have been discarded. Neither solution is ideal. That's why debug info
for COMDAT functions ought to be in the same COMDAT group as the
function...

>> When discussed on IRC recently Jason preferred to move the
>> DW_TAG_subprogram
>> describing a comdat function to a comdat .debug_info DW_TAG_partial_unit
>> and just reference all DIEs that need to be referenced from it
>> using DW_FORM_ref_addr back to the non-comdat .debug_info.
>
> I played around with implementing this in the compiler yesterday; my initial
> patch is attached.  It seems that with normal DWARF 4 this can work well,
> but I ran into issues with various GNU extensions:

Nice -- I've been wanting to do that for a while, but I always thought
it would be a lot harder. I see that you've based this on the
infrastructure created for -feliminate-dwarf2-dups. I don't think that
will play nice with -fdebug-types-section, though, since I basically
made those two options incompatible with each other by unioning
die_symbol with die_type_node.

In the HP-UX compilers, we basically put a complete set of .debug_*
sections in each COMDAT group, and treated the group as a compilation
unit of its own (not a partial unit). That worked well, and avoided
some of the problems you're running into (although clearly is more
wasteful in terms of object file size). Readelf and friends will need
to be taught how to find the right auxiliary debug sections, though --
they currently have a built-in assumption that there's only one of
each.

-cary


Re: Dealing with compilers that pretend to be GCC

2012-01-20 Thread Cary Coutant
> Yeah, but it’s a shame that those compilers define __GNUC__ without
> supporting 100% of the GNU C extensions.  With this approach, you would
> also need to add !defined for Clang, PGI, and probably others.

Having worked on the other side for a while -- for a vendor whose
compiler supported many but not all of GCC's extensions -- I claim
that the problem is with the many examples of code out there that
blindly test for __GNUC__ instead of testing for individual
extensions. From the other vendor's point of view, it's nearly useless
to support any of the GCC extensions if you don't also define
__GNUC__, because most code out there will simply test for that macro.
By defining the macro even if you don't support, for example, nested
functions, you can still compile 99% of the code that uses the
extensions.
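
The distinction being drawn -- test for the extension, not the vendor
macro -- looks like this in practice (a sketch; __has_attribute is a
newer probe, and older code used configure-time checks instead):

   /* Fragile: assumes every compiler defining __GNUC__ supports
      every GCC extension (e.g., nested functions). */
   #if defined(__GNUC__)
   # define HAVE_NESTED_FUNCTIONS 1
   #endif

   /* Better: probe for the specific feature where a probe exists. */
   #if defined(__has_attribute)
   # if __has_attribute(visibility)
   #  define EXPORT __attribute__((visibility("default")))
   # endif
   #endif
   #ifndef EXPORT
   # define EXPORT
   #endif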

-cary


Re: Getting rid of duplicate .debug_ranges

2012-01-20 Thread Cary Coutant
>> Is there a way to detect that basic blocks have the same range even
>> though they have different block numbers? Or am I not looking/thinking
>> about this issue correctly?

I may be oversimplifying this, but it seems that
gen_inlined_subroutine_die generates a DW_AT_ranges list, then calls
decls_for_scope, which results in a call to gen_lexical_block_die,
which generates another DW_AT_ranges list, but in both cases,
BLOCK_FRAGMENT_CHAIN(stmt) points to the same list of block fragments.
I'd think you could just keep a single-entry cache in
add_high_low_attributes that remembers the last value of
BLOCK_FRAGMENT_CHAIN(stmt) and the pointer returned from add_ranges
(stmt) for that chain. If you get a match, just generate a
DW_AT_ranges entry using the range list already generated.

-cary


[RFC] New git-only branch for Fission project

2011-10-19 Thread Cary Coutant
I'd like to create a new "fission" branch on the GIT mirror at
ssh://gcc.gnu.org/git/gcc.git. This branch will host the changes that
Sterling and I will be making to support splitting the debug info into
separate files -- and, as a byproduct, fixing the pubnames and
pubtypes tables so they can be used to create a gdb index. A while
ago, I posted a summary of the Fission project on the GCC wiki:

   http://gcc.gnu.org/wiki/DebugFission

I expect we'll be ready to merge our work into trunk when Stage 1
opens for GCC 4.8.

Any objections? Is it OK to make a git-only branch?

-cary


Re: RFC: DWARF Extensions for Separate Debug Info Files ("Fission")

2011-09-23 Thread Cary Coutant
> The Apple approach has both the features of the Sun/HP implementation as well 
> as the ability to create a standalone debug info file.

Thanks for the clarifications. I based my comments on a description
you sent me a couple of years ago, and I apologize for any
oversimplifications I introduced.

> The compiler puts DWARF in the .o file, the linker adds some records in the 
> executable which help us to understand where files/function/symbols landed in 
> the final executable[1].

Did you intend to add a footnote?

>  If the user runs our gdb or lldb on one of these binaries, the debugger will 
> read the DWARF directly out of the .o files on the fly.  Because the linker 
> doesn't need to copy around/update/modify the DWARF, link times are very 
> fast.  If the developer decides to debug the program, no extra steps are 
> required - the debugger can be started up & used with the debug info still in 
> the .o files.

We're trying to achieve something very similar, but we have the
additional goal of separating the info from the .o files because of
our distributed build environment. I also wanted to attempt to
standardize the approach, instead of having each vendor go in separate
directions.

Thanks,

-cary


Re: RFC: DWARF Extensions for Separate Debug Info Files ("Fission")

2011-09-23 Thread Cary Coutant
>> * .debug_pubtypes - Public types for use in building the
>>   .gdb_index section at link time. This section will have an
>>   extended format to allow it to represent both types in the
>>   .debug_dwo_info section and type units in .debug_types.
>    ^^^
>    = .dwo_info , maybe both .debug_info and .dwo_info
>
>
>> * .dwo_abbrev - Defines the abbreviation codes used by the
>>   .debug_dwo_info section.
>    ^^^
>    = .dwo_info

Thanks, I've fixed the wiki page.

> I find this .dwo_* setup is great for rapid development rebuilds but it should
> remain optional as the currently used DWARF final separate .debug info file is
> smaller than all the .dwo files together.  In the case of the final linked
> .debug builds (rpm/deb/...) one does not consider the build speed as 
> important.
> It probably does not make sense to merge + convert .dwo files back to a single
> .debug file for the rpm/deb/... build performance reasons.

Yes, we'll definitely make this a compile-time option.

While I haven't finished designing the package format for collecting
all the .dwo files, I do plan on having the packaging tool do at least
duplicate type elimination to reduce the size of the package file.

-cary


RFC: DWARF Extensions for Separate Debug Info Files ("Fission")

2011-09-22 Thread Cary Coutant
At Google, we've found that the cost of linking applications with
debug info is much too high. A large C++ application that might be,
say, 200MB without debug info, is somewhere around 1GB with debug
info, and the total size of the object files that we send to the
linker is around 5GB (and that's with compressed debug sections).
We've come to the conclusion that the most promising solution is to
eliminate the debug info from the link step. I've had direct
experience with HP's approach to this, and I've looked into Sun's and
Apple's approaches, but none of those three approaches actually
separates the debug info from the non-debug info at the object file
(.o) level. I know we're not alone in having concerns about the size
of debug info, so we've developed the following proposal to extend the
DWARF format and produce separate .o and ".dwo" (DWARF object) files
at the compilation step. Our plan is to develop the gcc and gdb
changes on new upstream branches.

After we get the basics working and have some results to show
(assuming it all works out and proves worthwhile), I'll develop this
into a formal proposal to the DWARF committee.

I've also posted this proposal on the GCC wiki:

   http://gcc.gnu.org/wiki/DebugFission

We've named the project "Fission."

I'd appreciate any comments.

-cary


DWARF Extensions for Separate Debug Information Files

September 22, 2011


Problems with Size of the Debug Information
===========================================

Large applications compiled with debug information experience
slow link times, possible out-of-memory conditions at link time,
and slow gdb startup times. In addition, they can contribute to
significant increases in storage requirements, and additional
network latency when transferring files in a distributed build
environment.

* Out-of-memory conditions: When the total size of the input
  files is large, the linker may exceed its total memory
  allocation during the link and may get killed by the operating
  system. As a rule of thumb, the link job total memory
  requirements can be estimated at about 200% of the total size
  of its input files.

* Slow link times: Link times can be frustrating when recompiling
  only a small source file or two. Link times may be aggravated
  when linking on a machine that has insufficient RAM, resulting
  in excessive page thrashing.

* Slow gdb startup times: The debugger today performs a partial
  scan of the debug information in order to build its internal
  tables that allow it to map names and addresses to the debug
  information. This partial scan was designed to improve startup
  performance, and avoids a full scan of the debug information,
  but for large applications, it can still take a minute or more
  before the debugger is ready for the first command. The
  debugger now has the ability to save a ".gdb_index" section in
  the executable and the gold linker now supports a --gdb-index
  option to build this index at link time, but both of these
  options still require the initial partial scan of the debug
  information.

These conditions are largely a direct result of the amount of
debug information generated by the compiler. In a large C++
application compiled with -O2 and -g, the debug information
accounts for 87% of the total size of the object files sent as
inputs to the link step, and 84% of the total size of the output
binary.

Recently, the -Wa,--compress-debug-sections option has been made
available. This option reduces the total size of the object files
sent to the linker by more than a third, so that the debug
information now accounts for 70-80% of the total size of the
object files. The output file is unaffected: the linker
decompresses the debug information in order to link it, and
outputs the uncompressed result (there is an option to recompress
the debug information at link time, but this step would only
reduce the size of the output file without improving link time or
memory usage).


What's All That Space Being Used For?
=====================================

The debugging information in the relocatable object files sent to
the linker consists of a number of separate tables (percentages
are for uncompressed debug information relative to the total
object file size):

* Debug Information Entries - .debug_info (11%): This table
  contains the debug info for subprograms and variables defined
  in the program, and many of the trivial types used.

* Type Units - .debug_types (12%): This table contains the debug
  info for most of the non-trivial types (e.g., structs and
  classes, enums, typedefs), keyed by a hashed type signature so
  that duplicate type definitions can be eliminated by the
  linker. During the link, about 85% of this data is discarded as
  duplicate. These sections have the same structure as the
  .debug_info sections.

* Strings - .debug_str (25%): This table contains strings that
  are not placed inline in the .debug_info and .debug_types
  sections. The linker merges the string tables.

Re: GCC 4.4/4.6/4.7 uninitialized warning regression?

2011-04-20 Thread Cary Coutant
> This brings out 2 questions.  Why don't GCC 4.4/4.6/4.7 warn it?
> Why doesn't 64bit GCC 4.2 warn it?

Good question. It seems that the difference is whether the compiler
generates a field-by-field copy or a call to memcpy(). According to
David, the trunk gcc in 32-bit mode doesn't call memcpy, but still
doesn't warn. He's looking at it.

-cary


Re: DW_AT_GNU_odr_signature

2011-04-04 Thread Cary Coutant
> I saw that dwarf2out.c (generate_type_signature) does not just calculate
> the complete type signature for use with DW_AT_signature, but also
> outputs a DW_AT_GNU_odr_signature. The comment says:
>
> /* First, compute a signature for just the type name (and its
>   surrounding context, if any.  This is stored in the type unit DIE
>   for link-time ODR (one-definition rule) checking.  */
>
> I couldn't find where in the toolchain this link-time checking is done.
> Does anybody have any pointers?

It's not yet implemented. I plan to implement this checking in gold
(as a supplement to the existing --detect-odr-violations flag, which
is based on line number table information) at some point, but it's a
low priority.

-cary


Re: PATCH: 2 stage BFD linker for LTO plugin

2011-01-19 Thread Cary Coutant
>> I'm not sure if with your patch the add_input_library or
>> add_input_file plugin hooks are completely useless (and thus
>> gold could simply ignore those at all).
>
> The plugin does need to use the add_input_file callback.  In any case
> I'm not sure it's a great idea for gold to ignore a hook, there might be
> some need for it in the future.

With this patch, the plugin won't need add_input_library, which was
added to support the pass-through option. It still needs
add_input_file, which is how the plugin inserts the newly-compiled
objects.

-cary


Re: PATCH: 2 stage BFD linker for LTO plugin

2010-12-13 Thread Cary Coutant
> Here is an alternative proposal, with a patch for gold.
>
> We add a new plugin vector: LDPT_REGISTER_RESCAN_ARCHIVE_HOOK.  Like
> LDPT_REGISTER_CLAIM_FILE_HOOK, this gives the plugin the address of a
> function which can register a plugin function: rescan_archive.
>
> typedef
> enum ld_plugin_status
> (*ld_plugin_rescan_archive_handler) (
>  const struct ld_plugin_input_file *file, int *rescan);
>
> If the plugin registers this hook, the linker will call the hook for
> each archive that it sees.  If the hook sets the *rescan variable to a
> non-zero value, then the linker will rescan the archive after calling
> the all_symbols_read hook.  The archive will be rescanned using the same
> position dependent options as when it was originally scanned.  In
> particular, if the archive occurred within --start-group/--end-group,
> the entire group will be rescanned.
>
> The point of this patch is that the known problems with the LTO plugin
> are when the plugin introduces a previously unknown symbol which must be
> satisfied from some archive.  The new symbol is introduced when the
> all_symbols_read hook calls add_input_file to add a new object file
> to the link, and the new object file refers to the new symbol.  Since
> the symbol was not previously seen, if the definition should come from
> an archive, the archive will not have been searched.  Hence the test
> case in GCC PR 42690 comment #32.
>
> Fortunately, we know that gcc is not going to introduce arbitrary new
> symbol references.  The current system hacks around the problem by using
> a special -pass-through option which directs the plugin to add specific
> archives to the link, namely -lc and -lgcc.  That works for most cases,
> but not when gcc introduces a symbol which comes from -lm.  Still,
> despite the -lm issue, we do not have to concern ourselves with
> arbitrary archives, because gcc is not going to act arbitrarily.
>
> Also we know that the problem can only occur with archives, not with
> shared objects.
>
> The rescan_archive proposal fixes the problems and obviates the need to
> use -pass-through.  It avoids the unnecessary cost of a complete relink.
>
> I've appended a patch for gold which implements this proposal.  I've
> also appended a patch for lto-plugin.  This patched lto-plugin does not
> use -pass-through when rescan_archive is available.  It rescans libc.a,
> libgcc.a, and libm.a.  It handles the PR 42690 comment #32 test case
> correctly.
>
> Any thoughts on this approach?

Looks good to me. I'd still prefer something based on a list of symbol
names that the backend might introduce calls to, but I'll concede that
this is far more practical.

I think Dave mentioned in the other thread that libgcov, libssp, and
libmudflap might also need to be rescanned:

>>> My suspicion is that the LTO plugin can only introduce a small bounded
>>> set of new symbol references, namely those which we assume can be
>>> satisified from -lc or -lgcc.  Is that true?
>>
>> Exactly.
>
> Potentially also gcov, ssp, mudflap?

Should we mark the pass-through option in lto-plugin as obsolescent?

-cary


Re: "ld -r" on mixed IR/non-IR objects (

2010-12-07 Thread Cary Coutant
> Here is my proposal.  Any comments?

We talked about ld -r a while back during the WHOPR project, and the
two ways that the linker could work: (1) combine all the .o files and
use the plugin to run LTRANS on the IR files, producing a pure,
optimized, object file; and (2) combine the non-IR object files as ld
-r normally would, and combine that result somehow with the IR from
the other files, for later optimization. If I remember correctly,
there was support for both modes of operation. The first mode is
easily handled with the current design (untested as far as I know --
there are probably bugs, and I'm not sure if we get the symbol
visibility correct in those cases).

The second mode corresponds with your proposal here. It's complicated
by the fact that it's difficult to tell, once the objects are
combined, which compiled code came without corresponding IR. For this,
I've got a suggestion that seems a bit simpler than your
".objectonly\004" section, based on an idea for something completely
unrelated[1] that I've been pondering over for a while. Instead of
embedding the non-IR objects into the mixed object file, let's instead
produce an archive file with several members: one that contains the
result of running ld -r on the non-IR objects in the link, and one
member for each of the IR files (alternatively, exactly one member
that contains the result of running ld -r on all of the IR objects).
In order to make the archive such that a subsequent link loads all of
the members unconditionally, I propose to add a special symbol
".FORCE" into the archive symbol table for each member; when the
linker sees that symbol in the archive symbol table, it will load the
corresponding member unconditionally.
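
In linker terms, the convention is a one-line test during the archive
symbol-table scan (sketch):

   #include <string.h>

   /* Decide whether to pull in the member that defines 'name'. */
   static int should_load_member(const char *name, int satisfies_undef)
   {
       if (strcmp(name, ".FORCE") == 0)
           return 1;               /* proposed: load unconditionally */
       return satisfies_undef;     /* normal rule: load only if needed */
   }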

>   ○ Object-only section:
>   § Section name won't be generated by any tools, something like
>".objectonly\004".
>   § Contains non-IR object file.
>   § Input is discarded after link.

Please -- use a special section type, not a magic name.

-cary


[1] My unrelated idea is about "__attribute__ (( used ))" -- when a
symbol is marked as used, it should not only suppress unused warnings
in the compiler, but it should also force the resulting object module
to be linked from an archive library. I've been thinking about a
proposal to mark any object file that contains a used symbol, have ar
recognize that mark and add the ".FORCE" symbol to the archive symbol
table for that object, then have the linker recognize the ".FORCE"
symbol and load the member unconditionally.


Re: PATCH: 2 stage BFD linker for LTO plugin

2010-12-06 Thread Cary Coutant
>>> I see no particular reason why that should be the case.  The issues are
>>> conceptually simple.
>>
>> I'd like to a gold implementation which works on all known cases.

You'd like to what?

> BTW, gold LTO plugin miscompiled 416.gamess in SPEC CPU 2006:
>
> http://www.sourceware.org/bugzilla/show_bug.cgi?id=12244
>
> BFD linker plugin is OK.

I can't tell for sure from either bug report, but is this the -lm
problem discussed earlier in this thread? The gcc driver needs to add
-lm as a pass-through library when -lm is on the command line.

-cary


Re: Update LTO plugin interface

2010-12-03 Thread Cary Coutant
>>> Another way to do this would be to put a marker in the command line
>>> that identifies where those libraries begin, and the linker could just
>>> go back and rescan those libraries if needed, before the final layout
>>> of the endcaps.
>>
>> I like that idea in general, but the difficulty is knowing where to put
>> the marker.  E.g., the user is going to specify -lm, and we are going to
>> need to rescan it.  If the user writes -lm -lmylib1 -lmylib2 we want to
>> rescan -lm but we don't really need to rescan mylib1 and mylib2.
>
> All those complexities make 2 stage linking more attractive.  I
> think I can implement it in GNU linker with the current plugin API.
>
> Linker just needs to remember the command line options, including
>
> --start-group/--end-group
> -as-needed/--no-as-needed
> --copy-dt-needed-entries/--no-copy-dt-needed-entries
>
> in stage 1.
>
> In stage 2, it will place LTO trans files before the first IR file
> claimed by plugin and process the command line options.
>
> --whole-archive may need some special handling.  Archives
> after --whole-archive will be ignored in stage 2.

It seems to me that we just need to add a few more libraries as
pass-through libraries, being careful to add a pass-through option
only for libraries that are already on the command line. How does that
add up to "all those complexities"?

With what you've written here, you've just added to the complexity of
your proposed solution, which makes it a much bigger change --
especially since what you're proposing will require changes in both
linkers. Adding pass-through options is a gcc driver change only.

The pass-through option may be seen as a hack, but I don't think it's
that big of a hack, and it does work. I don't see it as fundamentally
different from adding an option to mark runtime support libraries --
the difference is really just syntax.

In the long term, I'd prefer to see improvements more along the lines
of what I've suggested earlier in this thread -- define a set of
runtime support routines that the backend can generate calls to, and
make those known to the linker so that they can be located during the
first pass. That's the best way to ensure that we have a complete
picture of the whole program for the optimizer.

For now, I think it's sufficient to fix the driver to add the
necessary pass-through options, and maybe gnu ld needs to be fixed to
handle the crtend files correctly. We also should address Jan's
concerns with COMDAT.

-cary


Re: Update LTO plugin interface

2010-12-03 Thread Cary Coutant
> For the crtend files we could add a linker option that makes them
> known as endcaps, and the linker could make sure they get laid out
> last:
>
>   ld ... -lc -lgcc ... --endcap crtend.o crtn.o
>
> That puts the special knowledge about those files back in the gcc driver.

I should have remembered that I already dealt with this problem long
ago -- gold defers the layout of all real objects that follow the
first claimed IR file until after the replacement files have been laid
out. With respect to physical layout of the sections, this effectively
makes the link order equivalent to putting the replacement files in
the place of the first IR file. No "endcap" option is necessary.

-cary


Re: Update LTO plugin interface

2010-12-02 Thread Cary Coutant
>> I wouldn't expect the compiler to introduce a call to anything in libm
>> when -lm isn't already specified on the command line (that would break
>> even without LTO).
>
> Right--that would break without LTO, but it doesn't follow that we
> should make it work with LTO.

That's my point, too.

>> With what I'm suggesting, we'll only resolve libcalls to libraries
>> that were originally specified on the command line.
>
> OK.  Another concern I have with this is that there is no canonical
> location across all systems for some of these functions.  In general -lc
> and -lm will suffice, but on some specific systems they may not.  And
> there are functions which are in libc on some systems and in libm on
> others, such as frexp.  So if we have a mapping from function to
> library, I'm afraid we may get caught up in system dependencies which
> are really kind of irrelevant.  What really matters, I think, is not the
> specific library per function; it's the set of libraries we need to
> rescan.

My suggestion was to have the plugin give the linker a list of
libcalls, and the linker would note where it found each symbol during
its normal library scanning. If we need one of those symbols later and
haven't already linked it in, we'll know where to go find it (we won't
even have to rescan the library). We don't need to have a canonical
list of libraries or locations -- we just go with what was given on
the command line.

>> Another way to do this would be to put a marker in the command line
>> that identifies where those libraries begin, and the linker could just
>> go back and rescan those libraries if needed, before the final layout
>> of the endcaps.
>
> I like that idea in general, but the difficulty is knowing where to put
> the marker.  E.g., the user is going to specify -lm, and we are going to
> need to rescan it.  If the user writes -lm -lmylib1 -lmylib2 we want to
> rescan -lm but we don't really need to rescan mylib1 and mylib2.

Right. The gcc driver would have to have some concept of which
libraries might contain libcalls, and add brackets around them on the
linker command line to mark them as runtime support libraries, whether
they're added by the driver itself, or passed on the gcc command line
by the user.
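
For example, the driver might emit something along these lines (the
bracket option names here are purely hypothetical):

  ld ... main.o -lmylib1 -lmylib2 --libcall-libs-start -lm -lc -lgcc --libcall-libs-end crtend.o crtn.o

The linker would then know that only the bracketed libraries need to be
remembered (or rescanned) for libcall resolution.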

-cary


Re: Update LTO plugin interface

2010-12-02 Thread Cary Coutant
> This isn't quite what should happen, though.  If the user does not
> specify -lm on the link line, then we should not add -lm, even if LTO
> somehow introduces a function that is defined in libm.  Automatically
> adding -lm would introduce a surprising dynamic dependency in some
> cases.  The user should be explicit about that.

I wouldn't expect the compiler to introduce a call to anything in libm
when -lm isn't already specified on the command line (that would break
even without LTO).

With what I'm suggesting, we'll only resolve libcalls to libraries
that were originally specified on the command line.

> That is, it really doesn't matter which library libcalls come from.  All
> we need to know is which libraries might satisfy new libcalls.
>
> This is what leads me in the direction of copying certain libraries
> mentioned on the command line to follow the LTO objects, much as
> -pass-through does.  I think we just need to make that a little bit more
> automatic.

Another way to do this would be to put a marker in the command line
that identifies where those libraries begin, and the linker could just
go back and rescan those libraries if needed, before the final layout
of the endcaps.

-cary


Re: Update LTO plugin interface

2010-12-02 Thread Cary Coutant
>> For the crtend files we could add a linker option that makes them
>> known as endcaps, and the linker could make sure they get laid out
>> last:
>>
>>    ld ... -lc -lgcc ... --endcap crtend.o crtn.o
>>
>> That puts the special knowledge about those files back in the gcc driver.
>
>  Hmm, yes.  It doesn't work to just pass-through crtn.o, because ...
>
>> Executing on host: /home/davek/gcc/obj-gold2/gcc/xgcc 
>> -B/home/davek/gcc/obj-gold2/gcc/ c_lto_20100722-1_0.o  -O0 -flto 
>> -flto-partition=none  -fuse-linker-plugin      -o gcc-dg-lto-20100722-1-01   
>>  (timeout = 300)
>> gold: /home/davek/gcc/obj-gold2/gcc/crtbegin.o:(.text+0x13): error: 
>> undefined reference to '__DTOR_END__'
>> collect2: ld returned 1 exit status
>> compiler exited with status 1
>> output is:
>> gold: /home/davek/gcc/obj-gold2/gcc/crtbegin.o:(.text+0x13): error: 
>> undefined reference to '__DTOR_END__'
>> collect2: ld returned 1 exit status
>>
>
> ... it's needed by the first pass of symbol resolution.

Yeah, that's why I originally suggested having the plugin claim it, so
that the plugin could define the symbols when they need to be defined.

-cary


Re: Update LTO plugin interface

2010-12-02 Thread Cary Coutant
> For each libcall, we need to decorate
>
> 1. Which library it comes from. It is OS/target dependent.
> 2. The dynamic and static library names.  In most cases,  they
> are the same.  For glibc, they are different.

Is there a relatively painless way to enumerate all the possible
libcalls? We could add a new plugin API and have the LTO plugin
register those symbols up front. The linker would make a note of where
each symbol is defined, and then automatically go back and add any
objects needed to resolve any references introduced by the
optimization pass.
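
To make that concrete, here's a rough sketch of what such an addition
might look like, written in the style of plugin-api.h -- the tag name,
typedef, and symbol list below are all hypothetical, not part of the
actual API:

/* Assumes <plugin-api.h> for enum ld_plugin_status.  Hypothetical
   transfer-vector tag LDPT_REGISTER_LIBCALL_SYMBOLS would hand the
   plugin a function of this type at load time.  */
typedef
enum ld_plugin_status
(*ld_plugin_register_libcall_symbols) (int nsyms, const char *const *syms);

/* The plugin would register the candidate libcalls once, up front:

     static const char *const libcalls[] = { "__udivdi3", "memcpy" };
     (*register_libcall_symbols) (2, libcalls);

   The linker would record where each name is first defined during its
   normal scan of the inputs, and pull those objects in later only if
   the optimization pass actually introduces a reference.  */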

For the crtend files we could add a linker option that makes them
known as endcaps, and the linker could make sure they get laid out
last:

   ld ... -lc -lgcc ... --endcap crtend.o crtn.o

That puts the special knowledge about those files back in the gcc driver.

-cary


Re: Update LTO plugin interface

2010-12-02 Thread Cary Coutant
> I propose that we add a new linker option: --plugin-callback.  At each
> point where this option appears on the command line, the linker will
> call a new plugin callback entry point.  The LTO plugin will replace the
> all_symbols_read callback with this one.  We will have the gcc driver
> run the linker more or less like this:
>
> object-files -lc -lgcc --plugin-callback -lc -lgcc crtend.o crtn.o

I'm not sure how this is any better than the pass-through option we're
already using. That just has the plugin re-inject those libraries that
you have placed after the --plugin-callback option. The crtend.o and
crtn.o files could be handled by having the plugin claim them and
re-inject them at the end.

For every new routine that the gcc backend generates a new call to, it
ought to know which library that routine is defined in, and should be
able to add that library after the generated object(s) during the
all-symbols-read callback. We really don't want to support arbitrary
interposition at that point, because a user-supplied replacement might
invalidate some assumptions that were made during optimization.
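
The mechanics for this already exist: the all-symbols-read handler can
hand the linker another library via the add_input_library callback.
A minimal sketch of the plugin side, assuming plugin-api.h and that the
callback pointer was saved from the transfer vector at load time
(exact library-name handling aside):

/* Saved from the LDPT_ADD_INPUT_LIBRARY transfer-vector entry.  */
static ld_plugin_add_input_library add_input_library;

static enum ld_plugin_status
all_symbols_read_handler (void)
{
  /* ... run LTO, hand the replacement objects to the linker ... */

  /* Re-inject the runtime library that satisfies any libcalls the
     backend may have introduced (searched like -lgcc).  */
  return (*add_input_library) ("gcc");
}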

-cary


Re: Update LTO plugin interface

2010-12-02 Thread Cary Coutant
>  I'm wondering if the linker shouldn't just gather the plugin-contributed
> object files, substitute them into appropriate places on the original
> command-line, and re-exec itself.

That's not effectively different from the collect2 approach that we
had before the linker plugin interface. The only thing collect2
couldn't handle was IR files inside archives, and that wouldn't have
required much additional effort to add.

The benefit of the plugin approach is that you don't have to link twice.

-cary


Re: Update LTO plugin interface

2010-12-01 Thread Cary Coutant
>> I'm also not sure what you mean by "resolved IRONLY except that it is
>> externally visible to the dynamic linker." If we're building a shared
>> library, and the symbol is exported, it's not going to be IRONLY, and
>> I don't see how it would be valid to optimize it out. If we're
>
> Well, the typical COMDAT symbols (ignoring side cases) need to be put
> into the binary/library only if they are actually used, as all the
> other DSOs will define them too if they are used there.
> So it is valid to optimize out a COMDAT symbol after you have optimized
> out all its uses.  This commonly happens at link time.

Ahh, OK. I was worried about those side cases where sometimes a pure
reference is emitted. From a linker point of view, that's something
that theoretically could happen, although it may be the case that we
don't actually have to support it. If we had a resolution like
PREVAILING_DEF_IRONLY_BUT_EXPORTED (preferably something shorter than
that), I think that would give the compiler the information it needs.
Is that pretty much what your RESOLVED_IRDYNAMIC was intended to mean?

Another thing I don't remember offhand is whether I got this right in
gold: if a COMDAT group is defined in both IR and non-IR files, we want
to choose one of the IR files as the instance to keep. I'll have to
check.

-cary


Re: Update LTO plugin interface

2010-12-01 Thread Cary Coutant
>> The only aspect of link
>> order that isn't maintained is the physical order of the sections in
>> memory.
>
> That is exactly the problem my proposal tries to address.

Really? That's not at all what PR 12248 is about. The physical order
of the sections (meaning the order of contributions within each output
section) -- in the absence of any linker scripts -- should be
irrelevant. With linker scripts, or any other form of layout control,
the link order is decoupled from the layout anyway.

> __udivdi3 is just an example.  It can also happen to memcpy, or
> any library calls generated by GCC. I am enclosing a testcase for memcpy.

Regardless, if the compiler backend introduces a call to a runtime
support routine, it's expecting to bind to a specific routine in its
runtime support library. Anything else is unsupported. For gcc or
libgcc hackers, if you *really* need the interposed routine, it's
simple enough to link the .o instead of the .a, or use
--whole-archive.

Think about it -- any failure to bind to an interposed copy of memcpy
(or any other library call generated by gcc) is indistinguishable from
the compiler choosing to generate the code inline.

-cary


Re: Update LTO plugin interface

2010-12-01 Thread Cary Coutant
>> That is what "Discard all previous inputs" in stage 2 linking is for.
>
> But what does that mean?  Are you saying that the linker interface to
> the plugin should change to work that way?  If we do that, then we
> should change other aspects of the plugin interface as well.  It could
> probably become quite a bit simpler.
>
> The only reason we would ever need to do a complete relink is if the LTO
> plugin can introduce arbitrary new symbol references.  Is that ever
> possible?  If it is, we need to rethink the whole approach.  If the LTO
> plugin can introduce arbitrary new symbol references, that means that
> the LTO plugin can cause arbitrary objects to be pulled in from
> archives.  And that means that if we only run the plugin once, we are
> losing possible optimizations, because the plugin will never see those
> new objects.
>
> My suspicion is that the LTO plugin can only introduce a small bounded
> set of new symbol references, namely those which we assume can be
> satisfied from -lc or -lgcc.  Is that true?

Exactly. The plugin API was designed for this model -- if you want to
start the link all over again, you may as well stick with the collect2
approach and enhance it to deal with archives of IR files.

The plugin API, as implemented in gold (not sure about gnu ld), does
maintain the original order of input files as far as symbol binding is
concerned. When IR files are claimed, the plugin provides the list of
symbols defined and referenced, and the linker builds the symbol table
as if those files were linked in at that particular spot in the
command line. When the compiler provides real definitions of those
symbols later, the real definitions simply replace the "placeholders"
that were left in the linker's symbol table. The only aspect of link
order that isn't maintained is the physical order of the sections in
memory.

As Ian noted, if the compiler introduces new references that weren't
there before, the new references must be from a limited set of
libcalls that the backend can introduce, and those should all be
resolved with an extra pass through -lc or -lgcc. That's not exactly
pretty, but I don't see how it destroys the notion of link order --
the only way those new symbols could have been resolved differently is
if a user library interposed definitions for the libcall, and those
certainly can't be what the compiler intended to bind to. In PR 12248,
I think it's questionable to claim that the compiler-introduced call
to __udivdi3 should not resolve to the version in libgcc. Sure, I
understand it's useful for library developers while debugging and
testing, but an ordinary user certainly can't count on his own
definition of that routine to get called -- the compiler might
generate the division inline, or call a different specialized version.
All of these routines are outside the user's namespace, and we should
be able to optimize without regard for what the user's libraries might
contain.

An improvement could be for the claim file handler to determine what
libcalls might be introduced and add them to the list of referenced
symbols so that the linker can bring in the definitions in the
original pass through the input files -- any that end up not being
referenced can be garbage collected. Alternatively, we could do a
whole-archive link of the library that contains the libcalls, again
discarding unreferenced routines via garbage collection. Neither of
these requires a change to the API.

-cary


Re: Update LTO plugin interface

2010-12-01 Thread Cary Coutant
> If we get into extending the linker plugin interface, it would be great
> if we would do something about COMDAT.  We now have RESOLVED and
> RESOLVED_IRONLY, while the problem is that all non-hidden COMDAT
> symbols get RESOLVED, which pretty much fixes them in the output
> library.
>
> I would propose adding RESOLVED_IRDYNAMIC for cases where the symbol
> was resolved IRONLY except that it is externally visible to the dynamic
> linker.  We can then allow the compiler to optimize this symbol out
> (the same way as IRONLY) if it knows it may or may not be exported --
> i.e., from the COMDAT flag or via -fwhole-program.

(This is off the main topic...)

Actually, we have PREVAILING_DEF and PREVAILING_DEF_IRONLY, plus
RESOLVED_IR, RESOLVED_EXEC, and RESOLVED_DYN. If the symbol was
resolved elsewhere, we don't have any way to say whether it was IRONLY
or not, and that's a problem for common symbols, because there really
is no prevailing def -- the linker just allocates the space itself.
Currently, gold picks one of the common symbols and calls it the
prevailing def, but the one it picks might not actually be the largest
one. I'd prefer to add something like COMMON and COMMON_IRONLY as
possible resolutions.
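
(For reference, the full set of resolution codes in plugin-api.h at
this point looks roughly like the following -- comments paraphrased
from memory, so check the header for the authoritative version:)

enum ld_plugin_symbol_resolution
{
  LDPR_UNKNOWN = 0,
  LDPR_UNDEF,                  /* Still undefined at the end of the link.  */
  LDPR_PREVAILING_DEF,         /* IR def, also referenced from non-IR code.  */
  LDPR_PREVAILING_DEF_IRONLY,  /* IR def, referenced only from IR code.  */
  LDPR_PREEMPTED_REG,          /* IR def preempted by a regular object def.  */
  LDPR_PREEMPTED_IR,           /* IR def preempted by another IR def.  */
  LDPR_RESOLVED_IR,            /* Reference resolved to a def in IR code.  */
  LDPR_RESOLVED_EXEC,          /* Resolved to a def in a regular object.  */
  LDPR_RESOLVED_DYN            /* Resolved to a def in a shared object.  */
};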

I'm not sure if you're talking about that, or about real COMDAT
groups. As far as gold is concerned, it picks one COMDAT group and
throws the rest of them away, but for the one it picks, you'll get
either PREVAILING_DEF or PREVAILING_DEF_IRONLY. That should tell the
compiler what it needs to know.

I'm also not sure what you mean by "resolved IRONLY except that it is
externally visible to the dynamic linker." If we're building a shared
library, and the symbol is exported, it's not going to be IRONLY, and
I don't see how it would be valid to optimize it out. If we're
building an executable with --export-dynamic, same thing.

-cary


Re: Issue with LTO/-fwhole-program

2010-06-11 Thread Cary Coutant
> But if I understand correctly, mixed LTO/non-LTO + whole-program is
> almost never correct. So we should really emit a warning for this
> specific combination. I think making this mistake would be quite easy
> but hard to debug.

It's not only correct, it's essential. "Whole Program" doesn't mean
that the compiler has to see all the IR for the whole program.
Instead, the compiler has visibility over the whole program in the
sense that it knows what variables and functions are referenced and
defined by the non-LTO code. Linking your program with non-LTO code is
inescapable unless we start shipping language and system libraries as
archives of objects compiled with -flto (and remove all assembly code
from such libraries).

The plugin interface was designed to provide this essential
information to the compiler about all the non-LTO code; until GNU ld
implements this or the collect2 interface provides something similar,
you're simply working with an incomplete implementation, and you'll
have to live with the limitations.

-cary


Re: externally_visible and resolution file

2010-06-09 Thread Cary Coutant
>> Yes, this is also what I saw without the plugin. I just wonder why "v"
>> is linked with the plugin if the resolution file is not used to
>> eliminate the need for the externally_visible attribute here.
>
> Probably because of the same linker-plugin bug that causes bar
> to be resolved.

Just to make sure I understand the problem:

- The IR file for a.c contains definitions for v and bar.
- The linker identifies that both symbols are referenced from outside
the LTO world (PREVAILING_DEF rather than PREVAILING_DEF_IRONLY), but
gcc isn't (yet) reading that info from the resolution file.
- WPA eliminates bar() and makes v static in the replacement object file.
- There are still references to those symbols in b.o, which was
compiled outside LTO.
- The linker should be complaining about undefined symbols in both
cases, but isn't (perhaps because it's still seeing defs left over
from the IR files). The symbol bar has a value of 0, while the
reference to v seems to have the right address.

Is that about right? What you're expecting is a link-time error
reporting both bar and v as undefined, right?

-cary


Re: GCC 4.5 Status Report (2009-09-19)

2009-09-21 Thread Cary Coutant
> extensibility, this isn't a problem for these extensions, as older
> DWARF readers will simply ignore the location expressions that use the
> extensions -- which produces the same behavior as DWARF-2 without
> those extensions.

I said "will simply ignore" when I guess I should have said "should
simply ignore". Obviously, the whole point of the -gstrict-dwarf
option is to deal with certain tools that don't simply ignore the
extensions.

-cary


Re: GCC 4.5 Status Report (2009-09-19)

2009-09-21 Thread Cary Coutant
>   Are you saying that current gcc trunk should require -gdwarf-4
> to issue dwarf4 commands? I ask because r151815...
>
> http://gcc.gnu.org/ml/gcc-patches/2009-09/msg00220.html
>
> causes dwarf4 by default. Is there a consistent policy on this?
> Currently in PR41405, there is a proposal for a -gstrict-dwarf
> option which I guess should be expanded to cover your patch if
> gcc 4.5 will be defaulting to -gdwarf-4 being enabled.

That patch actually enables the use of certain DWARF extensions
(DW_OP_stack_value and DW_OP_implicit_value) from the DWARF-4 spec
while still generating nominal DWARF-2. Since DWARF was designed for
extensibility, this isn't a problem for these extensions, as older
DWARF readers will simply ignore the location expressions that use the
extensions -- which produces the same behavior as DWARF-2 without
those extensions. Because the behavior with an older consumer is no
worse than the behavior without the extension, it's perfectly
reasonable to use these extensions without any gating option.

There are a couple of new things in the DWARF-4 spec that are not
completely backward compatible with DWARF-2, but none of those are
implemented in gcc yet. In fact, my dwarf4 branch still generates
nominal DWARF-2 output, while using the extensions from DWARF-4 to
allow the separation of type info into separate COMDAT sections. I
gate the new behavior on the -gdwarf-4 option, however, since the use
of this extension with an older consumer would represent a loss of
functionality -- the older consumer would not see any of the type
information that was placed in COMDAT sections. Thus, my changes won't
be enabled by default, so they won't need to be affected by the
-gstrict-dwarf option.

I think gcc with -gdwarf-4 can (and should) continue to mark the DWARF
output as version 2 until it starts taking advantage of some of the
new FORMs (which old consumers will not know how to skip and ignore),
the new line number table header format, and the new frame section
format. And it shouldn't start taking advantage of those things until
gdb support for those features is available.

-cary


Re: GCC 4.5 Status Report (2009-09-19)

2009-09-21 Thread Cary Coutant
>>   So aren't we now likely to lose the first few days of what little
>> remains of stage 1 waiting for trunk to start working again, then have
>> a mad rush of people falling all over each other to get their new
>> features in in the last couple of days?  One of which will inevitably
>> break trunk again and block all the others, and then stage 1 will be
>> over and it'll all be too late?
>
> I am not aware of any big patches that are still pending.  Coming up
> with new yet unknown things now wouldn't be good timing anyway.

I was hoping to get the dwarf4 branch merged into trunk during stage
1. While it's not a small patch, it's also not really that intrusive
in that it consists mostly of new code that runs only with the
-gdwarf-4 option. I've been testing it on a lot of big code bases for
the last few months, and haven't found any new bugs for more than a
month now, so I think it's ready.

I'll work on merging top-of-trunk into the branch early this week and
then send a patch to merge back into the trunk.

-cary


Re: Redirecting I/O

2009-04-10 Thread Cary Coutant
(This question probably should be on gcc-help instead of this list.)

> I'm trying to redirect I/O in my C++ application and am having some
> difficulty. I am trying to use cout or a file for output based on some
> condition. cout is an ostream object and file is an ofstream object.
> The types are incompatible, as in:
>
> bool condition;
> ofstream x;
> ofstream out = (condition)?cout: x;   // won't work because of cout

Try:

ostream& out = condition ? cout : x;
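
For example, a minimal complete version (stream names as in your
snippet, condition filled in arbitrarily):

#include <fstream>
#include <iostream>
using namespace std;

int main()
{
  bool condition = false;        // whatever your real test is
  ofstream x("output.txt");

  // Streams can't be copied, but a reference can bind to either one;
  // ofstream derives from ostream, so both arms convert to ostream&.
  ostream& out = condition ? cout : x;
  out << "redirected\n";
}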

> In addition I would like to redirect an ofstream object to be the same as
> out, as in:
>
> void fn() { object = out; } // won't work because '=' is private.

Again, try declaring object as "ostream&".

-cary


Re: generating functions and eh region

2009-04-02 Thread Cary Coutant
> With SEH you can catch that kind of error, and that's why it's so
> interesting in the embedded world.

That's also why SEH is a major pain for optimization. The compiler
would have to identify every instruction that may trigger an
exception, and either treat that instruction as a scheduling boundary
or create a new landing pad for the instruction (where compensation
code can be placed to adjust for the effects of instructions moved up
or down). The former inhibits a lot of optimization, while the latter
blows up the size of the EH tables and the corresponding landing pad
code.

On top of that, the filters can return a code that tells the EH
mechanism to resume execution at the original exception point as if
nothing happened. Just trying to understand all the implications of
that makes my head hurt.

-cary


Re: Mixing languages in the same TU and dwarf2out_frame_finish

2009-01-07 Thread Cary Coutant
> However, on LTO, we may have several personality functions in the same
> object file.  So, it's not clear what we should do here. Should we
> call dwarf2out_frame_finish() for each of the personalities that we
> saw in that file?

I don't think so. We're calling dwarf2out_frame_finish() at the end of
the TU, not at the end of each function, and it generates the CFI
information for all functions in the TU at once. I think you're going
to need to do the following:

- In dwarf2out_begin_prologue, store the value of
get_personality_function (current_function_decl) in the fde_table
entry for the current function.

- In output_call_frame_info, scan the fde_table, and generate a
separate CIE record for each unique personality function -- this is
where the pointer to the personality routine is recorded. Then when
generating the FDE records, set the CIE_pointer field to point to the
appropriate CIE record. Currently, all FDE records share a single CIE
record.

On the other hand, if you can arrange for dwarf2out_do_cfi_asm() to
return true, the assembler should take care of it already.

-cary


[lto] [RFC] Design proposal for debug support in LTO

2008-12-23 Thread Cary Coutant
LTO currently doesn't support the generation of debug info very well,
as we discard much of the front-end information that is needed for
debug info before streaming the IR to the intermediate file. I've
written up the following proposal to fix this, and have also posted it
on the gcc wiki:

  http://gcc.gnu.org/wiki/LTO_Debug

A goal of this design is that it can ultimately enable the use of the
free_lang_specifics pass in all compilations, even when not doing LTO.

-cary


Debug Support for LTO


Background

With Link-Time Optimization (LTO) enabled, the compiler stores an
intermediate representation (IR) of the code in the object file rather
than compiled object code (the LGEN phase). The IR is then combined
with IR from other object files at link time, and the actual
compilation then takes place (the LTRANS phase). This approach
naturally divides the compilation process between its front-end and
its back-end, and the design of the IR is such that it contains only
the language-independent information that would normally be needed by
the back-end.

In gcc, the symbolic debug information is generated from the tree
representation late in the compilation process, and it assumes that
all of the original information is still present in the trees. Much of
the information that has been discarded in the process of storing the
IR at the end of the LGEN phase and reading it back in at the
beginning of the LTRANS phase is needed in order to produce the
symbolic debug information, even though it is not otherwise needed by
the back-end of the compiler.

This proposal presents a design for augmenting the IR with the
additional information necessary for the generation of symbolic debug
information during the LTRANS phase.


Alternatives

One simple approach is to preserve all of the front-end information in
the IR when streaming it to the object files, so that it can all be
reconstructed for the LTRANS phase. This approach would significantly
increase the size of the IR, so for practical purposes, the compiler
would need to arrange to preserve the additional information only when
the -g option is used, and the IR for debug and non-debug compilations
would differ. This could result in subtle bugs or differences in code
generation. In addition, some of the front-end information is
language-specific, and since the LTRANS phase may be combining IR from
more than one language, langhooks still need to be removed from the
back-end.

Another approach is to generate the debug information earlier -- in
the front-end. This approach would significantly alter the structure
of the compiler and would be a major undertaking. In addition, many
back-end transformations affect the debug information, so the back-end
would then need an infrastructure for decoding the debug information,
modifying it, then re-encoding it. Such an approach might be practical
for a single debug format, but in order to support the several formats
that gcc currently supports, it would also become a major undertaking.

An improvement would be to partition the debug information in such a
way that the information generated early is not subject to back-end
transformations, and that the information preserved for the LTRANS
phase is sufficient for generating the remainder of the debug
information. In a sense, the design presented here does this, but the
early generation does not commit to a specific debug format. Instead,
it stores the early information in a higher-level data structure that
is written separately to the object file.


Overview

Near the end of the LGEN phase of the compilation, gcc runs the
free_lang_specifics pass, which removes all of the information from
the trees that will not be needed by the LTRANS phase. During this
pass, just before discarding the information, if debug information has
been requested, we will call a new set of debug-related APIs to record
debug-related information that is about to be discarded.

The debug information for a given tree node will be stored in one of
two separate global hash tables (one for decls, one for types),
indexed by the UID, as a list of properties. Each separate fact that
we need to record will be stored as a property, represented as a (key,
value) pair. The property keys are simple small integers that identify
the kind of property being recorded (e.g., Context, Base Classes,
Member Methods, ...). The property values may be references to another
tree, a list of trees, or simple integer or string values.
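
As a rough illustration of the proposed structure (all type and key
names below are made up for the sketch, not actual gcc identifiers):

#include <map>
#include <vector>

typedef void *tree;   /* stand-in for gcc's tree node pointer */

/* The kind of fact being recorded for a decl or type.  */
enum debug_property_key { DP_CONTEXT, DP_BASE_CLASSES, DP_MEMBER_METHODS };

/* One (key, value) pair; the value may reference one or more trees,
   or hold a simple integer or string.  */
struct debug_property {
  debug_property_key key;
  std::vector<tree> trees;
  long ival;
  const char *sval;
};

/* Two global tables, one for decls and one for types, indexed by UID.  */
static std::map<unsigned, std::vector<debug_property> > decl_debug_table;
static std::map<unsigned, std::vector<debug_property> > type_debug_table;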

When the IR is streamed out to the object file, we will stream the
contents of the debug hash table out to a new section, .gnu.lto_debug.
Some of the properties to be streamed out will refer to other trees,
some of which may not have been streamed out to the main part of the
IR. For references to trees that have already been streamed, we will
simply use the "pickle" that was already generated for those trees.
For references to trees that were not already streamed out, we will
stream those trees out to a seco

Re: gdb test suite failure on i386 and x86_64 in gdb.base/break.exp

2008-09-13 Thread Cary Coutant
> This is PR 36690 which has various bits of analysis.

Thanks! I did search for this problem, but I guess I didn't use the
right terms, and it didn't turn up this bug report. Looks like if I
had done another search after analyzing the problem, it probably would
have turned up.

-cary


gdb test suite failure on i386 and x86_64 in gdb.base/break.exp

2008-09-12 Thread Cary Coutant
There are a couple of failures in the gdb test suite on i386 and
x86_64 with gcc 4.3.0 or newer. The tests gdb.base/break.exp and
gdb.base/sepdebug.exp are failing when a function begins with a while
loop. The into_cfg_layout_mode pass was added in 4.3.0 (see
http://gcc.gnu.org/ml/gcc-patches/2007-03/msg00687.html), and that
pass removes the instruction that jumps from the end of the first
basic block to the bottom of the while loop. The outof_cfg_layout_mode
pass reintroduces the branch (in force_nonfallthru_and_redirect) once
the basic blocks have been laid out, but the source location
information has been lost by that point.

When you try to set a breakpoint at the beginning of the function, gdb
looks for the second row in the line table (it skips the first to get
past the prologue), and sets the breakpoint there. Because of the
missing locator on the jump, the second row is now the interior of the
while loop, and the breakpoint is in the wrong place.

Here's a reduced test case:

void foo(int a)
{
  while (a) {   // line 3
    a--;        // line 4
  }
}

If you compile this (for x86_64) with a top-of-trunk gcc with -S -g,
you can see that the jmp to .L2 has no .loc directive in front of it,
and the first .loc directive is now the one for the body of the while
loop:

.file 1 "foo.cc"
.loc 1 1 0
pushq   %rbp
.LCFI0:
movq%rsp, %rbp
.LCFI1:
movl%edi, -4(%rbp)
jmp .L2
.L3:
.loc 1 4 0
subl$1, -4(%rbp)
.L2:
.loc 1 3 0
cmpl$0, -4(%rbp)
jne .L3
.loc 1 6 0
leave
ret

For comparison, here's the output from gcc 4.2.1:

.file 1 "foo.cc"
.loc 1 1 0
pushq   %rbp
.LCFI0:
movq%rsp, %rbp
.LCFI1:
movl%edi, -4(%rbp)
.loc 1 3 0
jmp .L2
.L3:
.loc 1 4 0
subl$1, -4(%rbp)
.L2:
.loc 1 3 0
cmpl$0, -4(%rbp)
jne .L3
.loc 1 6 0
leave
ret

I've tried changing force_nonfallthru_and_redirect (in cfgrtl.c) to
use the e->goto_locus field as the location for the reintroduced jump,
but that seems to mark the jump with line #6 (goto_locus might not
even be valid yet at this point, I'm told, and I'm not even sure that
a locus can be used where an INSN_LOCATOR is expected -- the
location_from_locus macro was removed). I've also tried looking
through the target bb's instruction list to find the first instruction
with an INSN_LOCATOR and using that for the locator of the jump -- it
fixed this problem, but broke other tests because now a forward branch
in other contexts (if-then-else, for example) gets the line number of
its target, and gdb will now use that branch as the breakpoint
location for that line number.

I'd argue that gcc really ought to be flagging the end of the prologue
-- there's a debug hook for that, and it's used by most of the debug
formats, but not by DWARF. The DWARF spec was extended (in version 3)
to allow the line number table to indicate the end of prologue, so gcc
(and gas) ought to be updated to record it in the line table, and gdb
ought to be taught to use that in lieu of looking for the second row
in the line table. Until all that happens, though, I think a quicker
fix is necessary.

Any suggestions?

-cary


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-07-06 Thread Cary Coutant
> On the "claim file", can you also pass the "file" size in the case it
> is inside an archive?

Good idea. Will do.

-cary


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-07-06 Thread Cary Coutant
> * "End of first pass" may be a little gold specific.  Perhaps it
>  should be called something like "after all input files have been
>  seen."

Sure. It seems to me that the pass 1, middle, pass 2 breakdown is
pretty common for linkers, though perhaps not universal. I'll find a
better name (I wasn't really happy with this name anyway).

> * The linker does normally copy unrecognized sections with the
>  SHF_ALLOC bit clear to the output file.  It doesn't allocate address
>  space for them, but it does copy them.  I think this follows the ELF
>  ABI.  I don't know of any generic way to direct the linker to not
>  copy sections to the output file.

OK, as Daniel suggested, we could have the compiler set the
SHF_EXCLUDE bit as well for those sections, and add support for that
in gold (if it's not already there).

> * Do we need to worry about the type of the symbol in the "add
>  symbols" interface?  For example, what about a TLS symbol?  Also,
>  when the GNU linker sees a common symbol in a regular object and a
>  symbol with the same name in a shared library, the action depends on
>  the type of the symbol in the shared library.  For STT_OBJECT, the
>  common symbol becomes an undefined reference to the shared library.
>  For STT_FUNCTION, it does not.  Gold does not currently behave this
>  way--the common symbol always overrides.  But in any case, there is
>  some precedent for worrying about symbol type.

I don't think so, but I'll take a closer look. I think we don't really
need to worry about the type of the symbol until we get the real .o
file.

> * The command line arguments should explicitly be placed in the
>  transfer vector in the order in which they appear on the command
>  line.

OK.

> * Type names ending in "_t" are reserved by POSIX.  We shouldn't use
>  them (I'm looking at ld_plugin_status_t).

Oops, forgot about that.

> * GOLD_VERSION should perhaps say something about the format of the
>  string.

OK. What would be reasonable to say here? Just a string of the form
"n.m"? Is it reasonable to require that later versions are lexically
greater than earlier versions (e.g., can't have "1.9" then "1.10"), or
is it OK to require parsing the string to do comparisons?
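
(The reason it matters: plain string comparison gets multi-digit
components wrong. A two-line illustration:)

#include <cstdio>
#include <cstring>

int main()
{
  // Lexically, "1.10" sorts *before* "1.9", even though as a version
  // it should come after -- so requiring lexical ordering of version
  // strings would effectively forbid a "1.10" release.
  printf("%d\n", strcmp("1.10", "1.9") < 0);   // prints 1
}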

> * I guess that having the message hook take a va_list is most
>  flexible, but it is inconvenient for typical uses.  Taking a
>  variable number of arguments would be more convenient.  Or it might
>  be reasonable to just take a string, and push formatting to the
>  plugin.

Yeah, I almost put "..." there instead. Probably better than va_list.

-cary


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-07-03 Thread Cary Coutant
> We've started working on the driver and WPA components for whopr.
> These are some of our initial thoughts and implementation strategy.  I
> have linked these to the WHOPR page as well.  I'm hoping we can
> discuss these at the Summit BoF, so I'm posting them now to start the
> discussion.

I've updated the WHOPR Driver wiki page with our latest thoughts on
the plug-in interface:

 http://gcc.gnu.org/wiki/whopr/driver

-cary


Re: How to reserve an Elf e_machine value

2008-06-06 Thread Cary Coutant
Let me second H.J.'s suggestion to post your request at

http://groups.google.com/group/generic-abi

In the absence of any SCO presence, that group now serves as the
closest thing we have to a standards forum for ELF and the gABI.

-cary


Re: ln -r and cherry picking.

2008-06-06 Thread Cary Coutant
> I think that one of the goals here is to not make that too dependent on
> ELF.  For instance, we are in the process of getting rid of all of the
> DWARF.  After maddox does that, our only dependence on ELF will be as a
> container to hold all of the sections.
> Given that gcc is not always an ELF compiler, it really is a lot easier
> to invent your own wheel for something like this rather than using
> ELF's wheel for the first target and then having to figure out how to
> make someone else's wheel fit for the rest of the targets.

This is basic functionality that *every* object file format supports.
I don't think using a symbol and a relocation is going to tie you down
to ELF -- no more so than the idea of using sections to store your
data in.

I also think it's simpler and more deterministic than using a hash.

-cary


Re: ln -r and cherry picking.

2008-06-06 Thread Cary Coutant
>  2) LTO sections need to be able to find "their index" of decls and
>  types.  By "their index" I mean the index that each section used to
>  reference the decls and types when the section was generated.

Can't you just put an ELF symbol (can be an unnamed local -- could
even just be a section symbol) on the index section, then add a
pointer in the IR section with a relocation to that symbol? This is
basically how DWARF .debug_info sections point to the abbrev table in
the .debug_abbrev sections.

-cary