Hopefully this is better. I added a new line between each paragraph

On Tue, Nov 26, 2019 at 10:58:41AM +0100, Pierre Neidhardt wrote:
> I think the attachment broke the formatting of the file (there is no
> paragraph break).  Could you resend it?
> 

-- 
Efraim Flashner   <efr...@flashner.co.il>   אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
It's easy to think of Rust as a new programming language but it has already 
been around for five years.  Rust has made it past its 1.0 release and the 
compiler is written in Rust. We even have mrustc to act as a secondary method 
to bootstrap new Rust releases without falling back to downloading precompiled 
tarballs. So how is the state of Rust in Guix today?

Truthfully, Rust in Guix could be better. The developer story for Rust is 
pretty straightforward: write your program, declare your dependencies in a 
Cargo.toml file, and ```cargo foo``` will figure out your dependency chain. 
```cargo build``` will download any missing dependencies, even using a cache 
directory to reduce downloads, and compile the bits of the dependencies that 
are needed.

But what about for distro maintainers?

Obviously we can't download dependencies at build time; they need to be
packaged ahead of time. So we package those dependencies. But wait, those
dependencies have dependencies of their own, and those do too. It's
dependencies all the way down, hidden in 5 years of iterative development that
we're late to the party for, trying to capture snapshots in time where specific
versions of libraries were built using previous generations. All this all the
way back to the beginning, whenever that is.

Humans are prone to errors, so to help with packaging Rust crates Guix
effectively has two importers: one that imports a specific version and lists
its dependencies, and one that takes a crate and recursively imports all the
packages that it depends on. Currently some work is needed to allow the
recursive importer to interpret version numbers, but for now it works quite
well.
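
The idea behind a recursive import can be sketched in a few lines. This is
only an illustration of the traversal, not Guix's importer: the dependency
table, the crate names foo, bar and baz, and the function names are all
hypothetical, and version numbers are ignored entirely.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical dependency table: crate name -> its direct dependencies.
fn example_graph() -> BTreeMap<&'static str, Vec<&'static str>> {
    BTreeMap::from([
        ("foo", vec!["bar"]),
        ("bar", vec!["baz"]),
        ("baz", vec![]),
    ])
}

/// Collect every crate reachable from `root` by following dependencies.
/// Crates already seen are skipped, so shared or cyclic dependencies
/// are visited only once.
fn transitive_deps<'a>(
    graph: &BTreeMap<&'a str, Vec<&'a str>>,
    root: &'a str,
) -> BTreeSet<&'a str> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![root];
    while let Some(name) = stack.pop() {
        if let Some(deps) = graph.get(name) {
            for &dep in deps {
                if seen.insert(dep) {
                    stack.push(dep);
                }
            }
        }
    }
    seen
}
```

Calling ```transitive_deps(&example_graph(), "foo")``` yields bar and baz:
one declared dependency turns into two packages, and real crates fan out
much faster than this toy chain.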

Taking a break from Rust for a moment, let's look at some of the other 
languages that are packaged. Packages written in C/C++, processed with 
autotools or cmake or meson, are the easiest. Dependencies are declared, source 
code is provided, and there's a clear distinction between source code and 
compiled binary; source code is for hacking on, binaries are for executing. The 
closest to a middle ground are libraries which allow programs to use features 
from other programs. In order to use a package, all of its dependencies must be 
packaged and the libraries linked.

Taking a look at the other end we have JavaScript. JavaScript is source code;
it's ready to be read and hacked on. JavaScript is already ready to be run,
therefore it must be a binary. It's... both? JavaScript libraries leave distro
maintainers in a difficult position. Building JavaScript ends up in the same
problem as we saw with Rust: recursive dependencies all the way down, iterative
versions depending on previous ones, and a misty past from whence everything
sprang forth, which must be recreated in order to bring us back to the present
day. But there's a further difficulty: often, even after a 'build' phase has
been run and tests have been run on JavaScript, we're left with unchanged code.
Except now it's no longer source, it's a binary... or something. So just what
did we build and test?

We can worry about JavaScript another time; Rust has a clear boundary between
source code and binaries.

So how about Python? Python is a scripting language and can be run without 
being compiled, but it also can be compiled (pre-interpreted?) to bytecode and 
installed either locally or globally. That leaves us with source code which can 
double as a binary, and a bytecode which is clearly a binary. Given these two 
states, we declare the uncompiled version as source code, ignore that it can be 
run as a script except when testing the code, and we never return to 
second-guess ourselves.

How about Go? Go is another language that defies packaging efforts, primarily 
because build instructions often make use of the HEAD of other git branches, 
not tagged and released versions. That the names of the libraries are long and 
cumbersome is mostly a secondary issue. On the developer side a binary is a 
```go build``` away. Go will download missing source and compile libraries as 
needed. On the packager side the libraries are carefully gathered one by one, 
precompiled, and placed carefully in a directory hierarchy for use in future 
builds. What could be a long build of a program is replaced by an intermediate 
series of packages where libraries are pre-compiled, and at each stage only the 
new code has to be compiled.

For all except the distro maintainer, the similarities are strong between Rust 
and Go. In both cases dependencies are downloaded as part of the build process, 
there's a cache for the downloaded sources and the compiled libraries, and 
build artifacts can be reused between different programs with overlapping 
dependencies. For the distro maintainer many of these similarities are thrown
out. Dependencies are packaged ahead of time, and the set of previously
packaged libraries is literally a cache. Compiled libraries can be reused for
other packages, yes, but for Rust they're not.

Why not? If they're already compiled why not reuse them?

Previously we've discussed source code and compiled binaries (or libraries), 
but in Rust there are two types of libraries. There are dynamic libraries, 
packaged as ```libfoo.so```, and there are Rust libraries, packaged as 
```libfoo.rlib``` or ```libfoo-MAGICHASH.rlib```. When a Rust package declares 
a dependency on a Rust library, it doesn't declare a dependency on the whole 
library but rather just on the parts that it needs. This means that we can get 
away with packaging only a portion of the dependent library, or the library 
with only some of its features or its own dependencies. When compiling a final 
binary, a Rust binary doesn't link to an rlib, it takes just the part that it 
needs and incorporates it into the binary. As far as package maintainers are
concerned, this isn't ideal but it is something we can live with; we already
have this case with static libraries from other languages. If we were to
compile the binary manually the command would be along the lines of
```rustc --crate-type bin foo.rs --extern bar=/path/to/libbar.rlib``` and we'd
continue on. However, when bar depends on baz, the similar command,
```rustc --crate-type rlib bar.rs --extern baz=/path/to/libbaz.rlib```,
_doesn't_ link libbaz to libbar. This leaves us in a pickle; we know which
libraries we need but we're unable to compile them individually and build them
up iteratively until we reach the binary end goal.

One of our packaged Rust programs, rust-cbindgen, is used by Icecat. 
Rust-cbindgen declares 8 (TODO: check this number) dependencies. When run 
outside of the build environment ```cargo build``` downloads a total of 58 
(TODO: check this number) packages, compiles them and produces a binary. Our 
recursive importer created more than 300 new packages before it was told to 
stop. Returning to our build process for Rust libraries, since we couldn't link 
one rlib to another rlib, we opted to compile one rlib and then place its 
source in the build directory of the next one where it was recompiled. Baz 
would be built, then baz's source would be put in bar's vendor directory where 
baz and bar would be built. After this baz's and bar's sources would be put in 
foo's vendor directory, where all three would be compiled. This sounds like Go, 
except that we're throwing away all the results of our builds each time we 
start a new package.
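
The cost of that scheme can be made concrete with a small simulation. This is
a sketch under stated assumptions, not the Guix build system: the function
names are hypothetical, packages are assumed to be listed dependencies-first,
and every package is assumed to recompile everything in its vendor directory.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// For each package, the crate sources that end up in its vendor directory:
/// every direct dependency plus whatever that dependency had vendored.
/// `build_order` must list dependencies before the packages that use them.
fn vendor_dirs<'a>(
    build_order: &[(&'a str, Vec<&'a str>)],
) -> BTreeMap<&'a str, BTreeSet<&'a str>> {
    let mut vendored: BTreeMap<&'a str, BTreeSet<&'a str>> = BTreeMap::new();
    for (pkg, deps) in build_order {
        let mut dir = BTreeSet::new();
        for dep in deps {
            dir.insert(*dep);
            if let Some(inherited) = vendored.get(dep) {
                dir.extend(inherited.iter().copied());
            }
        }
        vendored.insert(*pkg, dir);
    }
    vendored
}

/// How often each crate is compiled across all package builds: once as
/// its own package, and once more in every package that vendors it.
fn compile_counts<'a>(
    vendored: &BTreeMap<&'a str, BTreeSet<&'a str>>,
) -> BTreeMap<&'a str, usize> {
    let mut counts = BTreeMap::new();
    for (pkg, dir) in vendored {
        *counts.entry(*pkg).or_insert(0) += 1;
        for dep in dir {
            *counts.entry(*dep).or_insert(0) += 1;
        }
    }
    counts
}
```

For the chain baz, bar (depends on baz), foo (depends on bar), this gives a
vendor directory of bar and baz for foo, and compile counts of 3 for baz, 2
for bar and 1 for foo. None of the earlier compilations is reused.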

Since we were just copying the sources from package to package, the simplest
solution was to consider the Rust dependencies as shared sources and not as
shared libraries. Yes, the same source would be used between multiple programs,
but each package already took only the small portion of the shared source
that it needed, so there was no benefit to compiling the entire package ahead
of time, especially with the mounting recursive dependencies, whose compiled
libraries were being thrown away anyway.

Rust-cbindgen ships with a Cargo.toml listing 8 dependencies. It also ships
with a Cargo.lock, detailing the 8 dependencies and the bits of other libraries
that are needed. By packaging the sources of the 58 enumerated libraries and
placing them in the vendor directory, where the necessary parts could be
compiled, we ended up at the same place we were headed anyway: only the sources
were propagated from package build to package build, only the source was the
relevant part, only the source is shared.
