Hi,
We had a meeting today that was mostly about cycle time. Integrating a
change on the rust repo currently takes about 2 hours, which is still
far too long. We've tried approaching this from a variety of
perspectives in the past and I feel like possibly I've not conveyed the
cost and work breakdown, or discussed systematic approaches to the
problem yet. I'd like to do so now, as I believe the issues are all
quite solvable and I'd very much like people to shift their energy to
focus more on these matters:
- Top level measurements of cycle time are here:
http://huonw.github.io/isrustfastyet/buildbot/all
- Our worst offenders are the -all and -vg builders. The former does
a full cross-compile bootstrap, thus winds up building 9 copies of
the compiler and libraries. The latter runs the testsuite under
valgrind. I've (temporarily) turned these off, in order to drain the
queue some, but I'd like to turn them back on asap. For ideas
relating to organizing the bots to have a lower total cycle time
(not related to compiler perf), see
https://github.com/mozilla/rust/issues/8456
- The remaining time on other builders (non-all and non-vg) is closer
to 1 hour and breaks down roughly 50/50: 30 min compiling, 30 min
testing. One hour is still way too long.
- Of the testsuite time, there are two major foreground issues:
subprocess efficiency and metadata reading.
- On valgrind and non-valgrind alike, we're running subprocesses
inefficiently. This results in both too many threads (which
hurts valgrind especially badly) and too many blocking calls
(which hurts all platforms). I believe rewriting std::run to
use libuv will help significantly here. Alex is on this
but I imagine he can use help. See
https://github.com/mozilla/rust/pull/8645
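To make the cost concrete, here's a sketch of the thread-per-pipe
pattern at issue, written against modern std::process as a stand-in
for the old std::run (the names and API here are illustrative, not
the actual 2013 code): each captured pipe gets its own blocking
reader thread, and under valgrind every extra thread hurts. A
libuv-based rewrite would instead multiplex all the pipes on one
event loop.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::thread;

// Sketch of the current thread-per-pipe pattern: one blocking reader
// thread per captured fd. This is the overhead the libuv rewrite in
// pull #8645 is meant to avoid by multiplexing pipes on one event loop.
fn run_capture(cmd: &str, args: &[&str]) -> (String, String) {
    let mut child = Command::new(cmd)
        .args(args)
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .expect("spawn failed");

    let mut out_pipe = child.stdout.take().unwrap();
    let mut err_pipe = child.stderr.take().unwrap();

    // Extra thread just to drain stdout -- exactly the cost at issue.
    let out_thread = thread::spawn(move || {
        let mut s = String::new();
        out_pipe.read_to_string(&mut s).unwrap();
        s
    });

    // stderr is drained on the calling thread, which blocks it.
    let mut err = String::new();
    err_pipe.read_to_string(&mut err).unwrap();

    let out = out_thread.join().unwrap();
    child.wait().unwrap();
    (out, err)
}

fn main() {
    let (out, _err) = run_capture("echo", &["hello"]);
    assert_eq!(out.trim(), "hello");
}
```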
- Metadata reading takes the majority of non-wait / non-system
CPU time of the testsuite. A 'perf top' of a run looks like this:
14.23% 0xffffffff810940d0
7.43% tinfl_decompress
3.32% ebml::reader::vuint_at
2.27% bfd_link_hash_traverse
2.22% io::u64_from_be_bytes
1.99% __memcpy_ssse3_back
1.94% metadata::decoder::lookup_hash
1.49% ebml::reader::tagged_docs
1.46% hash::__extensions__::meth_23277::write
1.26% __memmove_ssse3_back
1.26% ebml::reader::maybe_get_doc
1.26% malloc
1.17% 0x000000000006b4a0
1.15% bfd_hash_lookup
1.02% free
The amount of metadata itself is partly at issue, but the more
serious problem is that we read (and parse) all the metadata
in any crate used, rather than just the part pertaining to
the use being made.
So, two subproblems: we parse all of it (due to algorithms
that read every item's metadata recursively, mostly in
resolve), and because we parse all of it, we decompress
all of it. pcwalton is working on the parse-all-of-it
subproblem right now. I'll try to help out as I can in terms
of filing down any technical debt in metadata that I can.
The main bug for the "bad algorithm" issue is
https://github.com/mozilla/rust/issues/4572
but I've also opened a metabug on general metadata technical
debt so that we can possibly remove some of the horrors that
scare people away from anything metadata-related:
https://github.com/mozilla/rust/issues/8652
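To illustrate the eager-vs-lazy distinction, here's a hypothetical
miniature of the metadata layout (the struct and method names are
invented for the example, not our actual decoder): a byte buffer of
encoded items plus an index of offsets. The fix for #4572 amounts to
looking items up through such an index rather than recursively
decoding every item in the crate.

```rust
use std::collections::HashMap;

// Hypothetical miniature of a crate's metadata: encoded item bytes
// plus an index mapping item names to (offset, len) in the buffer.
struct Metadata {
    bytes: Vec<u8>,
    index: HashMap<String, (usize, usize)>, // name -> (offset, len)
}

impl Metadata {
    // Eager approach (what resolve effectively does today, in
    // miniature): decode every item whether or not anyone needs it.
    fn decode_all(&self) -> Vec<String> {
        self.index
            .values()
            .map(|&(off, len)| {
                String::from_utf8_lossy(&self.bytes[off..off + len]).into_owned()
            })
            .collect()
    }

    // Lazy approach: decode only the item being asked for.
    fn lookup(&self, name: &str) -> Option<String> {
        let &(off, len) = self.index.get(name)?;
        Some(String::from_utf8_lossy(&self.bytes[off..off + len]).into_owned())
    }
}

fn main() {
    let bytes = b"fn println()fn swap()".to_vec();
    let mut index = HashMap::new();
    index.insert("println".to_string(), (0, 12));
    index.insert("swap".to_string(), (12, 9));
    let md = Metadata { bytes, index };

    // Only one item is decoded, however large the crate is.
    assert_eq!(md.lookup("swap").as_deref(), Some("fn swap()"));
    assert_eq!(md.decode_all().len(), 2);
}
```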
- Concerning metadata compression:
We could compress using a different algorithm (or none at
all), and this might save us some time. But it's been tried
(see https://github.com/mozilla/rust/issues/6902) and wasn't
a huge win, and I think the larger win would be to change
algorithms to avoid reading the entire metadata segment.
Of course we can also try to compress each chunk
of metadata separately (say, each item or index worth) and
only decompress those of interest. We should not even be
_reading_ the entire metadata section from disk, period,
if we're only pulling in std::println or such.
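The per-chunk idea can be sketched like so. The compress/decompress
pair here is a trivial run-length placeholder standing in for the
real codec, and the function names are invented for the example; the
point is the layout: each item compressed separately behind an
(offset, len) table, so reading item k touches only chunk k.

```rust
// Trivial run-length "codec" as a stand-in for the real compressor;
// each pair of output bytes is (run length, byte value).
fn compress(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let b = data[i];
        let mut run = 1u8;
        while i + (run as usize) < data.len() && data[i + run as usize] == b && run < 255 {
            run += 1;
        }
        out.push(run);
        out.push(b);
        i += run as usize;
    }
    out
}

fn decompress(data: &[u8]) -> Vec<u8> {
    data.chunks(2)
        .flat_map(|c| std::iter::repeat(c[1]).take(c[0] as usize))
        .collect()
}

// Per-item segment layout: compress each item on its own and record
// (offset, len) for it, instead of deflating one monolithic blob.
fn build_segment(items: &[&[u8]]) -> (Vec<u8>, Vec<(usize, usize)>) {
    let mut blob = Vec::new();
    let mut table = Vec::new();
    for item in items {
        let c = compress(item);
        table.push((blob.len(), c.len()));
        blob.extend_from_slice(&c);
    }
    (blob, table)
}

// Reading item k decompresses only chunk k, not the whole segment.
fn read_item(blob: &[u8], table: &[(usize, usize)], k: usize) -> Vec<u8> {
    let (off, len) = table[k];
    decompress(&blob[off..off + len])
}

fn main() {
    let items: [&[u8]; 2] = [b"aaaabbb", b"cccc"];
    let (blob, table) = build_segment(&items);
    assert_eq!(read_item(&blob, &table, 1), b"cccc");
    assert_eq!(read_item(&blob, &table, 0), b"aaaabbb");
}
```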
- In terms of straight-line compile speed on libstd/extra/rustc,
there are a number of lingering codegen problems. No single
one accounts for all the overhead, I'm collecting together
those I'm aware of in https://github.com/mozilla/rust/issues/6819
but I'd like to draw attention to:
- By far the most obvious to me is "generating too many copies
of functions". Take a look at:
http://people.mozilla.org/~graydon/symbols-by-name.txt
This is from today. 79 copies of hashtable functions.
200 copies of iterator::next. 71 copies of Option::is_none.
70 copies of ptr::mut_offset. 123 copies of util::swap.
188 copies of vec::capacity.
This might relate to bugs like:
https://github.com/mozilla/rust/issues/2529
https://github.com/mozilla/rust/issues/2537
https://github.com/mozilla/rust/issues/7349
It's honestly not clear to me exactly what's causing it;
it might just be a question of factoring out code to
have fewer dependencies on types. C++ libraries often
go out of their way to factor algorithms into
type-dependent and type-independent parts. I've opened
https://github.com/mozilla/rust/issues/8651 to rewrite
type use altogether. Nobody's on this presently.
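The C++-style factoring mentioned above looks like this in
miniature (written in modern Rust for illustration; the function
names are invented): a thin generic shim does only the
type-dependent conversion and forwards to a single non-generic inner
function, so only the shim gets duplicated per type while the body
is emitted once.

```rust
// All the real work lives here, in a non-generic function that is
// compiled exactly once regardless of how many types call into it.
fn describe_len(s: &str) -> String {
    format!("{} bytes", s.len())
}

// Thin generic shim: the only per-type code is the conversion, so
// monomorphization duplicates a few instructions, not a whole body.
fn describe<S: AsRef<str>>(s: S) -> String {
    describe_len(s.as_ref())
}

fn main() {
    // Two instantiations of the shim, one copy of the body.
    assert_eq!(describe("abc"), "3 bytes");
    assert_eq!(describe(String::from("hello")), "5 bytes");
}
```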
- It looks like there might be a particularly egregious
interaction with default methods? More copies than feels healthy.
For example, of the 20,000 functions compiled for rustc,
1,900 of them are copies of functions from visit::, which
... seems like a lot, to me. I've opened
https://github.com/mozilla/rust/issues/8650
for investigating this. Nobody's on this presently.
- If you look in that set of symbols you'll also see far
too much glue being generated, especially visitor glue
but also a lot of seemingly redundant (same-size) drop
and take glue. Patrick has an old change that never
landed to help with visit glue
https://github.com/mozilla/rust/pull/6744
and I suspect there's a lot more analysis of type
dependency that we could do to cut down on the other
glues. Nobody's on this presently.
- Removing as much of the redundant drop machinery as we
can, subject to tightened move semantics. There's a bug
for this https://github.com/mozilla/rust/issues/5016 and
Niko is looking at it. Possibly he could use some help.
- Finally, for up-close testing of codegen, I strongly
suggest adding more cases to the codegen tests in
test/codegen. They're very easy to add and make for
helpful "are we generating anything remotely as good
as clang/C++ here?" checks. On some of those tests we do
as well as clang; on others we do 2x worse.
If you have spare time or are otherwise intimidated
by optimization tasks, this is a great area to pitch
in. You just have to be able to express some idiom
in both rust and C++.
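For a sense of what such a test starts from, here's a sketch of an
idiom pair (my own example, not one of the existing tests): the same
loop in Rust and C++, where the interesting part is comparing the IR
each compiler emits for it. The C++ side is in the comment; the Rust
side is all the test file needs.

```rust
// Idiom: sum a slice of ints. The C++ equivalent one would compare
// codegen against:
//
//   int sum(const int* p, size_t n) {
//       int s = 0;
//       for (size_t i = 0; i < n; i++) s += p[i];
//       return s;
//   }
fn sum(xs: &[i32]) -> i32 {
    let mut s = 0;
    for &x in xs {
        s += x;
    }
    s
}

fn main() {
    assert_eq!(sum(&[1, 2, 3, 4]), 10);
}
```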
I've also been trying to improve overall visibility of performance
issues over recent months. To this end, we've been recording and
archiving all #[bench] benchmarks and similar metrics, and as I
mentioned in previous emails there's a simple mechanism for ratcheting
metrics now in the test runner. I've also asked Corey to take a look at
landing a consolidated framework for metrics-recording in
https://github.com/mozilla/rust/issues/6810 (he has an initial sketch in
https://github.com/mozilla/rust/pull/8646). I'm hoping this plays two roles:
- Making it sufficiently easy to record "new metrics" that we
discover previously-unknown dark matter performance problems.
- Finding metrics that we are comfortable ratcheting on, so that
we can continue to develop the compiler without regressing on
aspects relevant to performance.
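The ratcheting idea can be sketched as follows (a hypothetical
miniature, not the actual test-runner code; the function and field
names are invented): a saved metrics table holds the best value seen
so far, a new run fails if it regresses beyond a noise margin, and
an improvement tightens the ratchet.

```rust
use std::collections::HashMap;

// Sketch of a metrics ratchet: `saved` maps metric names to the best
// (lowest) value recorded so far. A new value passes if it is within
// `noise_pct` percent of the saved value, and replaces the saved
// value only when it improves on it.
fn ratchet(
    saved: &mut HashMap<String, f64>,
    name: &str,
    value: f64,
    noise_pct: f64,
) -> Result<(), String> {
    match saved.get(name) {
        Some(&old) if value > old * (1.0 + noise_pct / 100.0) => {
            Err(format!("regression on {}: {} -> {}", name, old, value))
        }
        Some(&old) if value < old => {
            // Improvement: tighten the ratchet.
            saved.insert(name.to_string(), value);
            Ok(())
        }
        Some(_) => Ok(()), // within the noise margin, keep old value
        None => {
            // First observation seeds the ratchet.
            saved.insert(name.to_string(), value);
            Ok(())
        }
    }
}

fn main() {
    let mut saved = HashMap::new();
    assert!(ratchet(&mut saved, "compile-secs", 1800.0, 5.0).is_ok());
    // 3% slower: within the 5% noise margin, passes.
    assert!(ratchet(&mut saved, "compile-secs", 1854.0, 5.0).is_ok());
    // 10% slower: fails the ratchet.
    assert!(ratchet(&mut saved, "compile-secs", 1980.0, 5.0).is_err());
    // Faster: passes and tightens the saved value.
    assert!(ratchet(&mut saved, "compile-secs", 1700.0, 5.0).is_ok());
    assert_eq!(saved["compile-secs"], 1700.0);
}
```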
If you have other ideas and areas of concern, or disagree or are unclear
on things I've mentioned in this email, please follow up.
Thanks,
-Graydon
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev