Re: [rust-dev] Plans for improving compiler performance
On 05/01/2013 1:37 PM, Patrick Walton wrote:
> First off, I do not see any way we can improve *general* compiler performance by an order of magnitude.

I don't think this sort of quantitative statement can be made from where we are now. When I say I expect we should be able to get it an order of magnitude faster, I'm speaking comparatively: against codebases of similar size and complexity in languages at a similar abstraction level, it builds too slowly and produces artifacts that are too big. The landscape of possible changes is, in my opinion, too complex to judge the particulars in quantitative terms yet. And I'm not going to address the abstract arguments about upper bounds on speed; in particular, "we can't typecheck the way $LANGUAGE does, in a big-O sense" is just not the level at which I want to be looking at the problem.

I'm mostly concerned about residual systemic taxes / technical debt:

- Non-use of arenas, pervasively
- Allocating and freeing effectively constant data we don't properly constify. Note: _no_ expression-level constant folding happens presently outside of const item initializers.
- Tag discriminant values loaded from memory (!)
- Implicit copying
- Inefficient representations: as you point out, our memory manager probably uses 2x-3x more memory than it needs to, especially for small allocations
- Allocator locking, fragmentation, general non-optimality
- Refcounting traffic
- Stack growth and task lifecycle work that may be optional or obsolete
- Landing pads
- Cleanup duplication in _any_ scope-exit scenario
- Redundant basic blocks
- The match code translation, which (by our own admission) nobody is comfortable enough with to hack on presently
- Glue code (not just visitor glue; we still generate drop and free glue)
- Falling off FastISel
- Not annotating functions correctly (noalias/nocapture/nounwind/readonly/readnone/optsize)
- Metadata encoding, organization, and I/O paths
- Use of dynamic closures rather than traits and/or vtables/objects
- Wrong inline settings in libs (we don't capture or measure this)
- Wrong monomorphization settings (likewise; count the copies of vec::foo in any resulting binary)
- Non-optimality / taxes of:
  - fmt!
  - assert and fail
  - bounds checks
  - / and % checks, idiv

That's just off the top of my head. My general rule with perf work is that you don't even discover the worst of it until you're knee-deep in solving curious unexplained parts of such problems. Given how many of the items on that list have direct consequences for how much code we throw at LLVM, I'm not confident saying anything is optimal or cannot be improved yet. All I see are lots of areas of performance-related technical debt, juxtaposed with too-big binaries and too-slow compiles. I'm not doing any arithmetic beyond that.

-Graydon

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Plans for improving compiler performance
On 06/01/2013 2:57 PM, Tim Chevalier wrote:
>> Personally, I think our time would be better spent making it easy to break large projects up into multiple finer-grained crates. We should be able to tell cargo and/or the work-in-progress `fbuild` workalike to compile a crate and rebuild all of its dependent crates (if they were modified) in one command. This strikes me as more reliable than incremental compilation, because crate structure enforces a DAG. What worries me with incremental compilation is that we'll do a lot of work to make it work, then we'll discover that in practice intra-crate dependencies are so intertwined that most changes result in a full rebuild anyway.
>
> I think that it would be good to do an experiment to see whether or not that worry is justified, which is to say, printing out what the dependency graph is and seeing how modular it is for rustc and perhaps other Rust crates.

In terms of immediate project scheduling, getting the build / package system under control is a higher priority, as it's the main gate on community involvement, growing the library ecosystem, and scaling up servo work. It'll also make it easier to develop patches to (say) libstd without always cycling through a triple-rustc bootstrap, and easier to get started by just downloading a prebuilt librustllvm.

In the longer term, for parallelizing rustc as much as possible, you're right that formalizing intra-crate item dependency is an essential first step. It's just a matter of priority-making.

-Graydon
Re: [rust-dev] Plans for improving compiler performance
Could you use multiple threads to type check and code gen in parallel? Could you retain information from a previous run of the compiler and reuse it (especially for code generation)?
Re: [rust-dev] Plans for improving compiler performance
What stands in the way of doing incremental compilation?
Re: [rust-dev] Plans for improving compiler performance
On 1/6/13 9:50 AM, Lindsey Kuper wrote:
> What stands in the way of doing incremental compilation?

Personally, I think our time would be better spent making it easy to break large projects up into multiple finer-grained crates. We should be able to tell cargo and/or the work-in-progress `fbuild` workalike to compile a crate and rebuild all of its dependent crates (if they were modified) in one command. This strikes me as more reliable than incremental compilation, because crate structure enforces a DAG. What worries me with incremental compilation is that we'll do a lot of work to make it work, then we'll discover that in practice intra-crate dependencies are so intertwined that most changes result in a full rebuild anyway.

Patrick
Re: [rust-dev] Plans for improving compiler performance
Niko Matsakis wrote:
> Here are some relatively simple things we could do to improve performance in type-checking, off the top of my head:

I should add that the overall impact of these (and other) changes might easily be small. I know you have profiles showing malloc to be a relatively minor contributor to overall performance, for example. My feeling is that it's hard to estimate the impact in advance: past experience suggests that sometimes these sorts of changes have an outsized impact relative to the profile, and sometimes none at all. Long term, I think we should try to tighten up performance and, if we do enough of that, things *will* get faster.

Have we tried to profile memory consumption at all? I'd be curious to know, e.g., what portion of our memory is used in the AST representation vs other things. It should be easy enough to use dtrace and get an idea of how much is allocated in each pass.

Niko
Re: [rust-dev] Plans for improving compiler performance
On Sun, Jan 6, 2013 at 10:29 AM, Patrick Walton pwal...@mozilla.com wrote:
> On 1/6/13 9:50 AM, Lindsey Kuper wrote:
>> What stands in the way of doing incremental compilation?
> Personally, I think our time would be better spent making it easy to break large projects up into multiple finer-grained crates. We should be able to tell cargo and/or the work-in-progress `fbuild` workalike to compile a crate and rebuild all of its dependent crates (if they were modified) in one command. This strikes me as more reliable than incremental compilation, because crate structure enforces a DAG. What worries me with incremental compilation is that we'll do a lot of work to make it work, then we'll discover that in practice intra-crate dependencies are so intertwined that most changes result in a full rebuild anyway.

I think that it would be good to do an experiment to see whether or not that worry is justified, which is to say, printing out what the dependency graph is and seeing how modular it is for rustc and perhaps other Rust crates. I put in some work towards doing this, but got derailed at some point. It's not particularly difficult, though, and then we could make an informed decision about whether or not to go down this path.

A lot of the work for incremental compilation will likely also be useful for parallelizing the compiler. So I don't see it as a waste of time.

Cheers,
Tim

--
Tim Chevalier * http://catamorphism.org/ * Often in error, never in doubt
"Too much to carry, too much to let go / Time goes fast, learning goes slow." -- Bruce Cockburn
[rust-dev] Plans for improving compiler performance
I thought I'd begin a discussion as to how to improve the performance of the rustc compiler.

First off, I do not see any way we can improve *general* compiler performance by an order of magnitude. The language is simply designed to favor safety and expressivity over compile-time performance. Other than code generation, typechecking is by far the longest pass for large Rust programs. But there is an upper bound on how fast we can make typechecking, because the language requires subtyping, generics, and a variant of Hindley-Milner type inference. This means that these common tricks cannot be used:

1. Fast C-like typechecking won't work, because we need to solve for type variables. For instance, the type of `let x = [];` or `let y = None;` is determined from use, unlike in, for example, C++, Java, C#, or Go.

2. Fast ML-like tricks, in which type equality can be determined with a pointer comparison, will not work, because we have subtyping and must recur on type structure to unify.

3. Nominal types in general cannot be represented as a simple integer class ID, as in early Java. They require a heap-allocated vector to represent the type parameter substitutions.

In general, the low-hanging fruit for general compiler performance is mostly picked at this point. I would put an upper bound on compiler performance improvements for all stages of a self-hosted build of the Rust compiler at 20% or so. The reasons for this are:

1. Typechecking and LLVM code generation are mostly optimal. When compiling `rustc`, the time spent in these two passes dwarfs all the others. Typechecking cannot be algorithmically improved, and LLVM code generation is about as straightforward as it can possibly be. The remaining performance issues in these two passes are generally due to allocating too much, but allocation and freeing in Rust is no more than 15% of the compile time. Thus even if we spent all our time on the allocator and got its cost down to a theoretical zero, we would only improve performance by 15% or so.

2. LLVM optimizations end up dominating compile time when they're turned on (75% of compile time). However, the Rust compiler, like most Rust (or C++) code, is dependent on LLVM optimizations for good performance. So if you turn off optimization, you have a slow compiler. But if you turn on optimization, the vast majority of your self-hosting time is spent in LLVM optimizations. The obvious way around this catch-22 would be to manually write into our compiler the optimizations that LLVM would have performed, in order to improve performance at -O0, but I don't think that's a particularly good use of our time, and it would hurt the compiler's maintainability.

There are, however, some more situational things we can do.

# General code generation performance

* We can make `~string` allocations (and some others, like `~[1, 2, 3, 4, 5]`) turn into calls to the moral equivalent of `strdup`. This improves some workloads, such as the PGP key in cargo (which should just be a constant in the first place). `rustc` still allocates a lot of strings like this, so this might improve the LLVM portion of `rustc`'s compilation speed.

* Visitor glue should be optional; you should have to opt into its generation, like Haskell's `Data.Typeable`. This would potentially remove 15% of our code size and improve our code generation performance by a similar amount, but, as Graydon points out, it is needed for precise-on-the-heap GC. Perhaps we could use conservative GC at -O0, and thus reduce the amount of visitor glue we need to generate for unoptimized builds.

# -O0 performance

For -O0 (which is the default), we get kicked off LLVM's fast instruction selector far too often. We need to stop generating the instructions that cause LLVM to bail out to the slow SelectionDAG path at -O0. This only affects -O0, but since that's the most common case that matters for compilation speed, that's fine. Note that these optimizations are severely limited in what they can do for self-hosting performance, for the reasons stated above.

* Invoke instructions cause FastISel bailouts. This means that we can't use the C++ exception infrastructure for task failure if we want fast builds. Graydon has work on an optional return-value-based unwinding mode which is nearing completion. I have a patch in review for a `disable_unwinding` flag, which disables unwinding on failure; this should be safe to turn on for libsyntax and librustc, since they have no need to handle failure gracefully, and doing so improves compile-time -O0 LLVM performance by 1.9x.

* Visitor glue used to cause FastISel bailouts, but this is fixed in incoming.

* Switch instructions cause FastISel bailouts. Pattern matching on enums (and sometimes on integers too) generates these, as does drop and take glue on enums. This shouldn't be too hard to fix.

* Integer division instructions result in FastISel bailouts on x86.
Re: [rust-dev] Plans for improving compiler performance
I realized I forgot some things.

# Garbage collection

* Not generating reference count manipulations (which would be possible if we switched to a tracing GC) improves code generation speed. I created a test that resulted in a 22.5% improvement in LLVM pass speed. Not an order of magnitude difference, but it's nice.

# FastISel

* There is another issue: `i1` parameters cause FastISel bailouts. These would be our `bool` type. Probably the solution here is to translate our `bool` as `i8` instead of `i1`.

Patrick