Re: [rust-dev] Rethinking Linking in Rust

Brian Anderson Mon, 18 Nov 2013 17:14:29 -0800

On 11/15/2013 12:09 AM, Alex Crichton wrote:

I've been thinking about static linking recently, along with a little bit of
linking in general, and I wanted to see what others thought.


# The Goal

Primarily, I believe that if desired, rustc should be able to generate an
executable or dynamic library with no dependence on any rust libraries. This
includes things like librustrt and libextra. Rust shouldn't be striving to lift
dependence on system libraries; that can come later if need be.

Additionally, rustc should be able to generate libfoo.a where libfoo.a has no
dependence on any rust libraries. This library can then be statically linked to
another application.

# Intermediate static libraries

I personally know of no way to create a static library from a dynamic one, so to
achieve this we would need to distribute libstd and libextra in some form that
is not a shared library. This problem not only applies to libstd, but also to
any rust library which wants to be statically linked.

The first natural conclusion for an intermediate format would be a .a file
itself. Why not distribute libstd.a along with libstd.so? After all, a .a is
only an archive which in our case would contain one .o file. In thinking about
this though, I don't think that this is the best format. The main idea of
providing intermediate .a files is to allow linkage to them via the normal
system linker. To be usable, this would mean that all .a files rust generates
would have to have their own statically linked version of libstd or otherwise
everyone will have to find where libstd is and guess the name and hash attached to
it. This is infeasible for arbitrary libraries which could have arbitrarily many
dependencies.

# Native Libraries

One part of linking which rust cannot forget is native libraries. Right now,
native libraries are always linked against when compiling a local crate, but no
native library dependencies are propagated among crates.

Due to the nature of a static library and what I assume is the file format
itself, a static rust library cannot link to its dependent dynamic libraries. We
can, however, resolve all native static dependencies at compile time.

# A Scheme for Linking

With the above knowledge, I would propose the following linkage model for rust.

There are four types of files that the rust compiler will generate:

1. An executable
2. A dynamic library (.so, .dylib, .dll)
3. A "rust" static library (.rlib)
4. A regular static library (.a, .lib)

The "rust language" would ship with dynamic library files as well as .rlib
files. There would be no .a files in the distribution.

A rust static library would be a format defined by rust that is not available
for or intended for external use. It is meant to be beneficial to the rust
compiler and that's it. It just so happens that their first incarnation would be
created similarly to `cp foo.o foo.rlib`.

In addition to these changes, the linkage attributes would change to be as
follows:

* #[link_args] becomes gated behind a feature flag. I believe that this is still
   a very useful ability to pass arbitrary flags to the linker, but this is *not*
   a sanctioned way of doing so at all because of how platform specific it is

* #[link(...)] becomes the new method of specifying linkage semantics on extern
   blocks, and it may be used similarly to link_args today

I'd kind of like for this to be available at the crate level too since most
libraries don't use OS X two-level namespaces and it's more convenient to me to
just put all the linkage at the top of the crate. Of course this conflicts with
the `link` attribute of crates, which I think is poorly named anyway.


   * #[link(name = "foo")] specifies that this crate links to native library
     `foo`
   * #[link(once)] implies that the native library is a static library, hence it
     *must* be linked against in the current compilation, regardless of the
     output format

   Omission of `link(once)` assumes that the library is available at all
   destinations, and it may not be linked against in the current compilation
   unit.

I don't really understand what 'once' implies in `link(once)` and how it
relates to statics. If a static library *must* be linked, then dynamic
libraries may not be linked? Why is that? If 'once' implies 'static', can we
just say 'link(static)'? I assume some argument propagation is going to come
into play here ...

Will also need to accommodate some other common features like, e.g.
`link(framework = "foo")` or something for OS X frameworks.


## The Linkage Step

To see how this affects how artifacts are created, I'd like to go into detail
about how each of the four output artifacts all interact with one another by
describing the linkage phase of each output. For each of these, remember that
the compiler's output is one .o file for each crate. Also remember that all rust
libraries will always link to all upstream rust libraries.

### Linking Executables and Dynamic Libraries

These two cases are very similar because they are creating the actual "result
artifact" in terms of a file which will have no more linkage performed on it.
The following components must be linked in to produce the artifact:

* The local .o file
* All local native dependencies
* All upstream rust libraries (dynamic and static)
* All non-once (dynamic) native libraries of upstream static crates. More on
   this later

The result artifact needs to be a fully resolved destination artifact. The point
of this is to have a dynamic dependency on all upstream dynamic libraries, and
all upstream static libraries will have been sucked in to create the target.
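
The four bullet points above can be written down as a small function. This is only a sketch of the proposal, not rustc's code: the types `CrateKind`, `NativeLib`, `UpstreamCrate` and `LinkInput`, and the function `final_link_inputs`, are hypothetical names invented for illustration.

```rust
// Hypothetical model of the proposed final link step; none of these
// names come from rustc.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CrateKind {
    Rlib,  // rust static library (.rlib)
    Dylib, // dynamic library (.so, .dylib, .dll)
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct NativeLib {
    name: String,
    once: bool, // true when declared with the proposed #[link(once)]
}

struct UpstreamCrate {
    name: String,
    kind: CrateKind,
    native: Vec<NativeLib>,
}

#[derive(Debug, PartialEq, Eq)]
enum LinkInput {
    Object(String),
    RustLib(String),
    Native(String),
}

/// Collect everything handed to the linker for an executable or dylib.
fn final_link_inputs(
    local_object: &str,
    local_native: &[NativeLib],
    upstream: &[UpstreamCrate],
) -> Vec<LinkInput> {
    // 1. The local .o file.
    let mut inputs = vec![LinkInput::Object(local_object.to_string())];
    // 2. All local native dependencies.
    for lib in local_native {
        inputs.push(LinkInput::Native(lib.name.clone()));
    }
    // 3. All upstream rust libraries, dynamic and static.
    for krate in upstream {
        inputs.push(LinkInput::RustLib(krate.name.clone()));
    }
    // 4. Non-once native libraries of upstream *static* crates only.
    for krate in upstream.iter().filter(|k| k.kind == CrateKind::Rlib) {
        for lib in krate.native.iter().filter(|l| !l.once) {
            inputs.push(LinkInput::Native(lib.name.clone()));
        }
    }
    inputs
}
```

In this model the `once` libraries of an upstream .rlib are left out, since they would already have been resolved when that .rlib was created.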

### Creating rust static libraries (.rlib files)

As mentioned above, these files are similar to the compiler's .o output. The
only other component which can be considered for inclusion in this .o file is
all native static library dependencies. These are encoded as #[link(once)] in
linkage attributes. The reason for doing this is that it's likely to be common
to have a local static library which is not available in distribution, but is
always available for the build process. Examples for the compiler include
libsundown, libuv, libuv_support, and maybe librustrt.


.rlib files also need the crate metadata.


The .rlib file will be created by using ld's -r flag. This output will then have
all native static dependencies resolved, but remember that no rust dependencies
were part of this linkage process. Whenever this .rlib file is used, all of its
dependencies are encoded in the metadata and they're all sucked in at the end as
well.

What happens when two upstream crates link to the same native static library?
In the final link step they are both going to be linked in, and I presume
there's some kind of conflict?


The goal of not pulling in all rust dependencies is to avoid finding a static
copy of libstd in all .rlib files everywhere.

### Creating a system static library (.a or .lib)

The whole point of being able to do this is so that a rust component can be
statically linked into another application. The idea behind this mode of
compilation is to be just as much of a destination artifact as an executable or
dynamic library. The rust compiler will never attempt to link against
rust-generated .a files (it has .rlib files to look for). The .a files are
purely meant for external usage.

Again though, due to the nature of the .a format, we cannot be as comprehensive
in our dependency resolution as we were in the above cases. The first thing to
consider is all inputs to this file:

* The compiler's output .o file
* All local native static libraries
* All upstream rust .rlib files

Note how there is no mention of upstream dynamic library dependencies. Sadly, I
know of no way of encoding those dependencies in this .a output format. I would propose
the compiler printing a warning when this is performed such that when undefined
references are found you at least have a suggestion of what dynamic libraries
you need to link against.
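
A minimal sketch of what such a warning could look like. The function name `staticlib_warning` and the message wording are assumptions made for illustration; the proposal does not specify either.

```rust
/// Build the warning printed when a .a is produced from a crate graph
/// that still depends on dynamic libraries. Returns None when there is
/// nothing to warn about. The wording is hypothetical.
fn staticlib_warning(dynamic_deps: &[&str]) -> Option<String> {
    if dynamic_deps.is_empty() {
        return None;
    }
    Some(format!(
        "warning: unresolved dynamic dependencies: {}",
        dynamic_deps.join(", ")
    ))
}
```

A caller producing a .a would print the returned string, if any, so that a user hitting undefined references knows which libraries to add to the final link.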

## Static vs Dynamic

This scheme outlines the ability to manage static and dynamic native libraries,
but it would mean that we're going to start introducing static and dynamic rust
libraries in the same location. I would propose that the compiler automatically
favors static linkage over dynamic linkage due to various rust ABI issues. This
default could be switched in the future, but it simply means that if the
compiler finds a .so and a .rlib, it will suck in the .rlib before sucking in
the .so.
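
The preference described above amounts to a simple ordering over the candidates found for a crate. A sketch, with the hypothetical names `LibKind` and `choose_lib` standing in for whatever the compiler's library lookup would actually use:

```rust
// Hypothetical model of the proposed lookup preference; not rustc's code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LibKind {
    Rlib,  // rust static library (.rlib)
    Dylib, // dynamic library (.so, .dylib, .dll)
}

/// Pick which format to link when several formats of the same crate are
/// found in the search path. Static wins over dynamic.
fn choose_lib(found: &[LibKind]) -> Option<LibKind> {
    if found.contains(&LibKind::Rlib) {
        Some(LibKind::Rlib)
    } else if found.contains(&LibKind::Dylib) {
        Some(LibKind::Dylib)
    } else {
        None
    }
}
```

Switching the default in the future would only mean swapping the order of the two branches.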

## Compiler UI

If we have a scheme like this, we certainly need a method of managing it from
the command line. I would propose dropping all linkage related flags we have
today, and starting over with the following:

* --rlib, --dylib, --staticlib. These three options are stackable, and control
   the output format of the compiler. If nothing is specified, then an executable
   is assumed. The reason that these are stackable is because it is possible
   to create multiple artifacts from one compilation instead of having to
   recompile

* -Z print-link-args. This is the same as it is today
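
To make "stackable" concrete, here is a sketch of how the three output flags might be interpreted. The `Output` enum and `requested_outputs` function are hypothetical; only the flag names come from the proposal, and ignoring repeated flags is an assumption.

```rust
// Hypothetical interpretation of the proposed output flags.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Output {
    Executable,
    Rlib,
    Dylib,
    Staticlib,
}

/// Collect every requested output format from the command line.
/// With none of the three flags present, an executable is assumed.
fn requested_outputs(args: &[&str]) -> Vec<Output> {
    let mut outputs = Vec::new();
    for arg in args {
        let kind = match *arg {
            "--rlib" => Output::Rlib,
            "--dylib" => Output::Dylib,
            "--staticlib" => Output::Staticlib,
            _ => continue, // not an output flag
        };
        // Assumption: repeating a flag has no extra effect.
        if !outputs.contains(&kind) {
            outputs.push(kind);
        }
    }
    if outputs.is_empty() {
        outputs.push(Output::Executable);
    }
    outputs
}
```

Under this reading, passing `--rlib --dylib` asks for both artifacts from a single compilation.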

How does one opt into linking to dynamic libraries? Without some further
mechanism everybody will be linking to the static libstd.


# Conclusion

I originally thought that this would be a proposal for adding static linking,
but this has kinda become more of a makeover of rust's current linkage model. I
believe that this scheme will solve the "static library" problem as well as
still accommodating the dynamic library approach that we have today. I wanted to
get this all down in writing, and I feel like this is certainly concrete enough
to act upon, but before doing so this should definitely be discussed.

What are others' thoughts on this? Is this too complex of a system? Is there a
glaring omission of use cases?


It sounds promising to me.


Hopefully soon we can generate a rust library with no dynamic rust dependencies!

---

As a side note, after writing all this up, I remembered LTO as an
option for generating libraries. I don't think I know enough about LTO
to be able to say whether it would fit in this system or not, but my
basic understanding is that an LTO library is just "IR in a box". We
could add a --lto output option which has pretty much the same
semantics as the --rlib option, but with a different format. Again
though, I haven't thought how native libraries would fit into that
scenario, but I believe that we could fairly easily accommodate LTO in
a system like this.
_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev

