Hi,
We had a meeting today that was mostly about cycle time. Integrating a
change on the rust repo currently takes about 2 hours, which is still
far too long. We've tried approaching this from a variety of
perspectives in the past and I feel like possibly I've not conveyed the
cost and work breakdown, or discussed systematic approaches to the
problem yet. I'd like to do so now, as I believe the issues are all
quite solvable and I'd very much like people to shift their energy to
focus more on these matters:
- Top level measurements of cycle time are here:
http://huonw.github.io/isrustfastyet/buildbot/all
- Our worst offenders are the -all and -vg builders. The former does
a full cross-compile bootstrap, thus winds up building 9 copies of
the compiler and libraries. The latter runs the testsuite under
valgrind. I've (temporarily) turned these off, in order to drain the
queue some, but I'd like to turn them back on asap. For ideas
relating to organizing the bots to have a lower total cycle time
(not related to compiler perf), see
https://github.com/mozilla/rust/issues/8456
- The remaining time on other builders (non-all and non-vg) is closer
to 1 hour and breaks down roughly 50/50: 30 min compiling, 30 min
testing. One hour is still way too long.
- Of the testsuite time, there are two major foreground issues:
subprocess efficiency and metadata reading.
- On valgrind and non-valgrind alike, we're running subprocesses
inefficiently. This results in both too many threads (which
hurts valgrind especially badly) and too many blocking calls
(which hurts all platforms). I believe rewriting std::run to
use libuv will help significantly here. Alex is on this
but I imagine he can use help. See
https://github.com/mozilla/rust/pull/8645
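To make the cost concrete, here's a sketch of the thread-per-pipe
pattern at issue, written against modern std::process as a stand-in
for the old std::run (the names and API here are illustrative, not
the actual 2013 code): each captured pipe gets its own blocking
reader thread, and under valgrind every extra thread hurts. A
libuv-based rewrite would instead multiplex all the pipes on one
event loop.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::thread;

// Sketch of the current thread-per-pipe pattern: one blocking reader
// thread per captured fd. This is the overhead the libuv rewrite in
// pull #8645 is meant to avoid by multiplexing pipes on one event loop.
fn run_capture(cmd: &str, args: &[&str]) -> (String, String) {
    let mut child = Command::new(cmd)
        .args(args)
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .expect("spawn failed");

    let mut out_pipe = child.stdout.take().unwrap();
    let mut err_pipe = child.stderr.take().unwrap();

    // Extra thread just to drain stdout -- exactly the cost at issue.
    let out_thread = thread::spawn(move || {
        let mut s = String::new();
        out_pipe.read_to_string(&mut s).unwrap();
        s
    });

    // stderr is drained on the calling thread, which blocks it.
    let mut err = String::new();
    err_pipe.read_to_string(&mut err).unwrap();

    let out = out_thread.join().unwrap();
    child.wait().unwrap();
    (out, err)
}

fn main() {
    let (out, _err) = run_capture("echo", &["hello"]);
    assert_eq!(out.trim(), "hello");
}
```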
- Metadata reading takes the majority of non-wait / non-system
CPU time of the testsuite. A 'perf top' of a run looks like this:
14.23% 0xffffffff810940d0
7.43% tinfl_decompress
3.32% ebml::reader::vuint_at
2.27% bfd_link_hash_traverse
2.22% io::u64_from_be_bytes
1.99% __memcpy_ssse3_back
1.94% metadata::decoder::lookup_hash
1.49% ebml::reader::tagged_docs
1.46% hash::__extensions__::meth_23277::write
1.26% __memmove_ssse3_back
1.26% ebml::reader::maybe_get_doc
1.26% malloc
1.17% 0x000000000006b4a0
1.15% bfd_hash_lookup
1.02% free
The amount of metadata itself is partly at issue, but the more
serious problem is that we read (and parse) all the metadata
in any crate used, rather than just the part pertaining to
the use being made.
So, two subproblems: we parse all of it (due to algorithms
that read every item's metadata recursively, mostly in
resolve), and because we parse all of it, we decompress
all of it. pcwalton is working on the parse-all-of-it
subproblem right now. I'll try to help out as I can in terms
of filing down any technical debt in metadata that I can.
The main bug for the "bad algorithm" issue is
https://github.com/mozilla/rust/issues/4572
but I've also opened a metabug on general metadata technical
debt so that we can possibly remove some of the horrors that
scare people away from anything metadata-related:
https://github.com/mozilla/rust/issues/8652
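To illustrate the eager-vs-lazy distinction, here's a hypothetical
miniature of the metadata layout (the struct and method names are
invented for the example, not our actual decoder): a byte buffer of
encoded items plus an index of offsets. The fix for #4572 amounts to
looking items up through such an index rather than recursively
decoding every item in the crate.

```rust
use std::collections::HashMap;

// Hypothetical miniature of a crate's metadata: encoded item bytes
// plus an index mapping item names to (offset, len) in the buffer.
struct Metadata {
    bytes: Vec<u8>,
    index: HashMap<String, (usize, usize)>, // name -> (offset, len)
}

impl Metadata {
    // Eager approach (what resolve effectively does today, in
    // miniature): decode every item whether or not anyone needs it.
    fn decode_all(&self) -> Vec<String> {
        self.index
            .values()
            .map(|&(off, len)| {
                String::from_utf8_lossy(&self.bytes[off..off + len]).into_owned()
            })
            .collect()
    }

    // Lazy approach: decode only the item being asked for.
    fn lookup(&self, name: &str) -> Option<String> {
        let &(off, len) = self.index.get(name)?;
        Some(String::from_utf8_lossy(&self.bytes[off..off + len]).into_owned())
    }
}

fn main() {
    let bytes = b"fn println()fn swap()".to_vec();
    let mut index = HashMap::new();
    index.insert("println".to_string(), (0, 12));
    index.insert("swap".to_string(), (12, 9));
    let md = Metadata { bytes, index };

    // Only one item is decoded, however large the crate is.
    assert_eq!(md.lookup("swap").as_deref(), Some("fn swap()"));
    assert_eq!(md.decode_all().len(), 2);
}
```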
- Concerning metadata compression:
We could compress using a different algorithm (or none at
all), and this might save us some time. But it's been tried
(see https://github.com/mozilla/rust/issues/6902) and wasn't
a huge win, and I think the larger win would be to change
algorithms to avoid reading the entire metadata segment.
Of course we can also try to compress each chunk
of metadata separately (say, each item or index worth) and
only decompress those of interest. We should not even be
_reading_ the entire metadata section from disk, period,
if we're only pulling in std::println or such.
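The per-chunk idea can be sketched like so. The compress/decompress
pair here is a trivial run-length placeholder standing in for the
real codec, and the function names are invented for the example; the
point is the layout: each item compressed separately behind an
(offset, len) table, so reading item k touches only chunk k.

```rust
// Trivial run-length "codec" as a stand-in for the real compressor;
// each pair of output bytes is (run length, byte value).
fn compress(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let b = data[i];
        let mut run = 1u8;
        while i + (run as usize) < data.len() && data[i + run as usize] == b && run < 255 {
            run += 1;
        }
        out.push(run);
        out.push(b);
        i += run as usize;
    }
    out
}

fn decompress(data: &[u8]) -> Vec<u8> {
    data.chunks(2)
        .flat_map(|c| std::iter::repeat(c[1]).take(c[0] as usize))
        .collect()
}

// Per-item segment layout: compress each item on its own and record
// (offset, len) for it, instead of deflating one monolithic blob.
fn build_segment(items: &[&[u8]]) -> (Vec<u8>, Vec<(usize, usize)>) {
    let mut blob = Vec::new();
    let mut table = Vec::new();
    for item in items {
        let c = compress(item);
        table.push((blob.len(), c.len()));
        blob.extend_from_slice(&c);
    }
    (blob, table)
}

// Reading item k decompresses only chunk k, not the whole segment.
fn read_item(blob: &[u8], table: &[(usize, usize)], k: usize) -> Vec<u8> {
    let (off, len) = table[k];
    decompress(&blob[off..off + len])
}

fn main() {
    let items: [&[u8]; 2] = [b"aaaabbb", b"cccc"];
    let (blob, table) = build_segment(&items);
    assert_eq!(read_item(&blob, &table, 1), b"cccc");
    assert_eq!(read_item(&blob, &table, 0), b"aaaabbb");
}
```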
- In terms of straight-line compile speed on libstd/extra/rustc,
there are a number of lingering codegen problems. No single
one accounts for all the overhead, I'm collecting together
those I'm aware of in https://github.com/mozilla/rust/issues/6819
but I'd like to draw attention to:
- By far the most obvious to me is "generating too many copies
of functions". Take a look at:
http://people.mozilla.org/~graydon/symbols-by-name.txt
This is from today. 79 copies of hashtable functions.
200 copies of iterator::next. 71 copies of Option::is_none.
70 copies of ptr::mut_offset. 123 copies of util::swap.
188 copies of vec::capacity.
This might relate to bugs like:
https://github.com/mozilla/rust/issues/2529
https://github.com/mozilla/rust/issues/2537
https://github.com/mozilla/rust/issues/7349
It's honestly not clear to me exactly what's causing it;
it might just be a question of factoring out code to
have fewer dependencies on types. C++ libraries often
go out of their way to factor algorithms into
type-dependent and type-independent parts. I've opened
https://github.com/mozilla/rust/issues/8651 to rewrite
type use altogether. Nobody's on this presently.
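The C++-style factoring mentioned above looks like this in
miniature (written in modern Rust for illustration; the function
names are invented): a thin generic shim does only the
type-dependent conversion and forwards to a single non-generic inner
function, so only the shim gets duplicated per type while the body
is emitted once.

```rust
// All the real work lives here, in a non-generic function that is
// compiled exactly once regardless of how many types call into it.
fn describe_len(s: &str) -> String {
    format!("{} bytes", s.len())
}

// Thin generic shim: the only per-type code is the conversion, so
// monomorphization duplicates a few instructions, not a whole body.
fn describe<S: AsRef<str>>(s: S) -> String {
    describe_len(s.as_ref())
}

fn main() {
    // Two instantiations of the shim, one copy of the body.
    assert_eq!(describe("abc"), "3 bytes");
    assert_eq!(describe(String::from("hello")), "5 bytes");
}
```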
- It looks like there might be a particularly egregious
interaction with default methods? More copies than feels healthy.
For example, of the 20,000 functions compiled for rustc,
1,900 of them are copies of functions from visit::, which
... seems like a lot, to me. I've opened
https://github.com/mozilla/rust/issues/8650
for investigating this. Nobody's on this presently.
- If you look in that set of symbols you'll also see far
too much glue being generated, especially visitor glue
but also a lot of seemingly redundant (same-size) drop
and take glue. Patrick has an old change that never
landed to help with visit glue
https://github.com/mozilla/rust/pull/6744
and I suspect there's a lot more analysis of type
dependency that we could do to cut down on the other
glues. Nobody's on this presently.
- Removing as much of the redundant drop machinery as we
can, subject to tightened move semantics. There's a bug
for this https://github.com/mozilla/rust/issues/5016 and
Niko is looking at it. Possibly he could use some help.
- Finally, for up-close testing of codegen, I strongly
suggest adding more cases to the codegen tests in
test/codegen. They're very easy to add and make for
helpful "are we generating anything remotely as good
as clang/C++ here?" checks. On some of those tests we do
as well as clang; on others we do 2x worse.
If you have spare time or are otherwise intimidated
by optimization tasks, this is a great area to pitch
in. You just have to be able to express some idiom
in both rust and C++.
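For a sense of what such a test starts from, here's a sketch of an
idiom pair (my own example, not one of the existing tests): the same
loop in Rust and C++, where the interesting part is comparing the IR
each compiler emits for it. The C++ side is in the comment; the Rust
side is all the test file needs.

```rust
// Idiom: sum a slice of ints. The C++ equivalent one would compare
// codegen against:
//
//   int sum(const int* p, size_t n) {
//       int s = 0;
//       for (size_t i = 0; i < n; i++) s += p[i];
//       return s;
//   }
fn sum(xs: &[i32]) -> i32 {
    let mut s = 0;
    for &x in xs {
        s += x;
    }
    s
}

fn main() {
    assert_eq!(sum(&[1, 2, 3, 4]), 10);
}
```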
I've also been trying to improve overall visibility of performance
issues over recent months. To this end, we've been recording and
archiving all #[bench] benchmarks and similar metrics, and as I
mentioned in previous emails there's a simple mechanism for ratcheting
metrics now in the test runner. I've also asked Corey to take a look at
landing a consolidated framework for metrics-recording in
https://github.com/mozilla/rust/issues/6810 (he has an initial sketch in
https://github.com/mozilla/rust/pull/8646). I'm hoping this plays two roles:
- Making it sufficiently easy to record "new metrics" that we
discover previously-unknown dark matter performance problems.
- Finding metrics that we are comfortable ratcheting on, so that
we can continue to develop the compiler without regressing on
aspects relevant to performance.
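The ratcheting idea can be sketched as follows (a hypothetical
miniature, not the actual test-runner code; the function and field
names are invented): a saved metrics table holds the best value seen
so far, a new run fails if it regresses beyond a noise margin, and
an improvement tightens the ratchet.

```rust
use std::collections::HashMap;

// Sketch of a metrics ratchet: `saved` maps metric names to the best
// (lowest) value recorded so far. A new value passes if it is within
// `noise_pct` percent of the saved value, and replaces the saved
// value only when it improves on it.
fn ratchet(
    saved: &mut HashMap<String, f64>,
    name: &str,
    value: f64,
    noise_pct: f64,
) -> Result<(), String> {
    match saved.get(name) {
        Some(&old) if value > old * (1.0 + noise_pct / 100.0) => {
            Err(format!("regression on {}: {} -> {}", name, old, value))
        }
        Some(&old) if value < old => {
            // Improvement: tighten the ratchet.
            saved.insert(name.to_string(), value);
            Ok(())
        }
        Some(_) => Ok(()), // within the noise margin, keep old value
        None => {
            // First observation seeds the ratchet.
            saved.insert(name.to_string(), value);
            Ok(())
        }
    }
}

fn main() {
    let mut saved = HashMap::new();
    assert!(ratchet(&mut saved, "compile-secs", 1800.0, 5.0).is_ok());
    // 3% slower: within the 5% noise margin, passes.
    assert!(ratchet(&mut saved, "compile-secs", 1854.0, 5.0).is_ok());
    // 10% slower: fails the ratchet.
    assert!(ratchet(&mut saved, "compile-secs", 1980.0, 5.0).is_err());
    // Faster: passes and tightens the saved value.
    assert!(ratchet(&mut saved, "compile-secs", 1700.0, 5.0).is_ok());
    assert_eq!(saved["compile-secs"], 1700.0);
}
```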
If you have other ideas and areas of concern, or disagree or are unclear
on things I've mentioned in this email, please follow up.
Thanks,
-Graydon
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev