Re: [rust-dev] Plans for improving compiler performance

2013-01-07 Thread Graydon Hoare
On 05/01/2013 1:37 PM, Patrick Walton wrote:

> First off, I do not see any way we can improve *general* compiler
> performance by an order of magnitude.

I don't feel this sort of quantitative statement can be made from where
we are now. When I say I expect we should be able to get it an order of
magnitude faster, I'm speaking comparatively: against codebases of
similar size and complexity in languages at a similar abstraction level,
it builds too slowly and produces artifacts that are too big.

The landscape of possible changes is IMO too complex to judge the
particulars in quantitative terms yet. And I'm not going to address the
abstract arguments about upper bounds of speed; in particular, things
like "we can't typecheck the way $LANGUAGE does in a big-O sense" are
just not the level I want to be looking at the problem at. I'm mostly
concerned about residual systemic taxes / technical debt:

  - Non-use of arenas and &, pervasively
  - Allocating and freeing effectively constant data we don't properly
constify. Note: _no_ expression-level constant folding happens
presently outside of const item initializers.
  - Tag discriminant values loaded from memory (!)
  - Implicit copying
  - Inefficient representations: as you point out, our memory manager
probably uses 2x-3x more memory than it needs to, especially for
small allocations
  - Allocator locking, fragmentation, general non-optimality
  - Refcounting traffic
  - Stack growth and task lifecycle stuff that may be optional or
obsolete
  - Landing pads
  - Cleanup duplication in _any_ scope-exit scenario
  - Redundant basic blocks
  - The match code translation, which (in our own admission) nobody is
comfortable enough with to hack on presently
  - Glue code (not just visitor; we still generate drop and free)
  - Falling off FastISel
  - Not annotating functions correctly
(noalias/nocapture/nounwind/readonly/readnone/optsize)
  - Metadata encoding, organization, I/O paths
  - Use of dynamic closures rather than traits and/or vtables, Objects
  - Wrong inline settings in libs (we don't capture or measure this)
  - Wrong monomorphization settings (likewise; count copies of vec::foo
in any resulting binary)
  - Non-optimality / taxes of:
 - fmt!
 - assert and fail
 - bounds checks
 - / and % checks, idiv on []

That's just off the top of my head. My general rule with perf work is
that you don't even discover the worst of it until you're knee deep in
solving curious unexplained parts of such problems. Given how many of
the items on that list have direct consequences in terms of how much
code we throw at LLVM, I'm not confident saying anything is optimal or
not able to be improved yet.

All I see are lots of areas of performance-related technical debt
juxtaposed with too-big binaries and too-slow compiles. I'm not doing
any arithmetic beyond that.

-Graydon

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] Plans for improving compiler performance

2013-01-07 Thread Graydon Hoare
On 06/01/2013 2:57 PM, Tim Chevalier wrote:

>> Personally, I think our time would be better spent making it easy to break
>> large projects up into multiple finer-grained crates. We should be able to
>> tell cargo and/or the work-in-progress `fbuild` workalike to compile a crate
>> and rebuild all of its dependent crates (if they were modified) in one
>> command.
>>
>> This strikes me as more reliable than incremental compilation, because crate
>> structure enforces a DAG. What worries me with incremental compilation is
>> that we'll do a lot of work to make it work, then we'll discover that in
>> practice intra-crate dependencies are so intertwined that most changes
>> result in a full rebuild anyway.
>
> I think that it would be good to do an experiment to see whether or
> not that worry is justified, which is to say, printing out what the
> dependency graph is and seeing how modular it is for rustc and perhaps
> other Rust crates.

In terms of immediate project scheduling, getting the build / package
system under control is higher priority as it's the main gate on
community involvement / growing the library ecosystem / scaling up servo
work. It'll also make it easier to develop patches to (say) libstd
without always cycling through a triple-rustc-bootstrap, and easier to
get started by just downloading a prebuilt librustllvm, etc.

In the longer term of parallelizing rustc as much as possible, you're
right that formalizing intra-crate item dependency is an essential first
step. It's just a matter of priority-making.

-Graydon



Re: [rust-dev] Plans for improving compiler performance

2013-01-06 Thread james

Could you use multiple threads to type check and code gen in parallel?

Could you retain information from a previous run of the compiler and 
reuse it (especially for code generation)?



Re: [rust-dev] Plans for improving compiler performance

2013-01-06 Thread Lindsey Kuper
What stands in the way of doing incremental compilation?


Re: [rust-dev] Plans for improving compiler performance

2013-01-06 Thread Patrick Walton

On 1/6/13 9:50 AM, Lindsey Kuper wrote:

> What stands in the way of doing incremental compilation?


Personally, I think our time would be better spent making it easy to 
break large projects up into multiple finer-grained crates. We should be 
able to tell cargo and/or the work-in-progress `fbuild` workalike to 
compile a crate and rebuild all of its dependent crates (if they were 
modified) in one command.


This strikes me as more reliable than incremental compilation, because 
crate structure enforces a DAG. What worries me with incremental 
compilation is that we'll do a lot of work to make it work, then we'll 
discover that in practice intra-crate dependencies are so intertwined 
that most changes result in a full rebuild anyway.


Patrick



Re: [rust-dev] Plans for improving compiler performance

2013-01-06 Thread Niko Matsakis



Niko Matsakis wrote:

> Here are some relatively simple things we could do to improve
> performance in type-checking, off the top of my head:


I should add that the overall impact of these (and other) changes might 
easily be small.  I know you have profiles showing malloc to be a 
relatively minor contribution to overall performance, for example.  My 
feeling is that it's hard to estimate the impact in advance---past 
experience suggests that sometimes these sorts of changes have an 
oversized impact relative to the profile and sometimes none at all.  
Long term I think we should try to tighten up performance and, if we do 
enough of that, things *will* get faster.


Have we tried to profile memory consumption at all?  I'd be curious to 
know e.g. what portion of our memory is used in the AST representation 
vs other things.  It should be easy enough to use dtrace and get an idea 
how much is allocated in each pass.



Niko


Re: [rust-dev] Plans for improving compiler performance

2013-01-06 Thread Tim Chevalier
On Sun, Jan 6, 2013 at 10:29 AM, Patrick Walton pwal...@mozilla.com wrote:
> On 1/6/13 9:50 AM, Lindsey Kuper wrote:
>
>> What stands in the way of doing incremental compilation?
>
> Personally, I think our time would be better spent making it easy to break
> large projects up into multiple finer-grained crates. We should be able to
> tell cargo and/or the work-in-progress `fbuild` workalike to compile a crate
> and rebuild all of its dependent crates (if they were modified) in one
> command.
>
> This strikes me as more reliable than incremental compilation, because crate
> structure enforces a DAG. What worries me with incremental compilation is
> that we'll do a lot of work to make it work, then we'll discover that in
> practice intra-crate dependencies are so intertwined that most changes
> result in a full rebuild anyway.

I think that it would be good to do an experiment to see whether or
not that worry is justified, which is to say, printing out what the
dependency graph is and seeing how modular it is for rustc and perhaps
other Rust crates.

I put in some work towards doing this, but got derailed at some point.
It's not particularly difficult, though, and then we could make an
informed decision about whether or not to go down this path. A lot of
the work for incremental compilation will likely also be useful for
parallelizing the compiler. So I don't see it as a waste of time.

Cheers,
Tim

-- 
Tim Chevalier * http://catamorphism.org/ * Often in error, never in doubt
Too much to carry, too much to let go
Time goes fast, learning goes slow. -- Bruce Cockburn


[rust-dev] Plans for improving compiler performance

2013-01-05 Thread Patrick Walton
I thought I'd begin a discussion as to how to improve performance of the 
rustc compiler.


First off, I do not see any way we can improve *general* compiler 
performance by an order of magnitude. The language is simply designed to 
favor safety and expressivity over compile-time performance. Other than 
code generation, typechecking is by far the longest pass for large Rust programs.
But there is an upper bound on how fast we can make typechecking, 
because the language requires subtyping, generics, and a variant of 
Hindley-Milner type inference. This means that these common tricks 
cannot be used:


1. Fast C-like typechecking won't work because we need to solve for type 
variables. For instance, the type of `let x = [];` or `let y = None;` is 
determined from use, unlike for example C++, Java, C#, or Go.


2. Fast ML-like "type equality can be determined with a pointer
comparison" tricks will not work, because we have subtyping and must
recur on type structure to unify.


3. Nominal types in general cannot be represented as a simple integer 
class ID, as in early Java. They require a heap-allocated vector to 
represent the type parameter substitutions.


In general, the low-hanging fruit for general compiler performance is 
mostly picked at this point. I would put an upper bound of compiler 
performance improvements for all stages of a self-hosted build of the 
Rust compiler at 20% or so. The reasons for this are:


1. Typechecking and LLVM code generation are mostly optimal. When 
compiling `rustc`, the time spent in these two passes dwarfs all the 
others. Typechecking cannot be algorithmically improved, and LLVM code 
generation is about as straightforward as it can possibly be. The 
remaining performance issues in these two passes are generally due to 
allocating too much, but allocation and freeing in Rust is no more than 
15% of the compile time. Thus even if we spent all our time on the 
allocator and got its cost down to a theoretical zero, we would only 
improve performance by 15% or so.
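The arithmetic behind that bound is just Amdahl's law: if allocation is a 15% slice of compile time and everything else is untouched, then

```latex
\text{speedup} \le \frac{1}{1 - 0.15} \approx 1.18
```

i.e. even a theoretically free allocator removes at most that 15% slice of wall-clock time.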


2. LLVM optimizations end up dominating compile time when they're turned 
on (75% of compile time). However, the Rust compiler, like most Rust (or 
C++) code, is dependent on LLVM optimizations for good performance. So 
if you turn off optimization, you have a slow compiler. But if you turn 
on optimization, the vast majority of your self-hosting time is spent in 
LLVM optimizations. The obvious way around this catch-22 is to spend a 
lot of time manually writing the optimizations that LLVM would have 
performed into our compiler in order to improve performance at -O0, but 
I don't think that's a particularly good use of our time, and it would 
hurt the compiler's maintainability.


There are, however, some more situational things we can do.

# General code generation performance

* We can make `~string` allocations (and some others, like ~[ 1, 2, 3, 
4, 5 ]) turn into calls to the moral equivalent of `strdup`. This 
improves some workloads, such as the PGP key in cargo (which should just 
be a constant in the first place). `rustc` still allocates a lot of 
strings like this, so this might improve the LLVM portion of `rustc`'s 
compilation speed.


* Visitor glue should be optional; you should have to opt into its 
generation, like Haskell's `Data.Typeable`. This would potentially 
remove 15% of our code size and improve our code generation performance 
by a similar amount, but, as Graydon points out, it is needed for 
precise-on-the-heap GC. Perhaps we could use conservative GC at -O0, and 
thus reduce the amount of visitor glue we need to generate for 
unoptimized builds.


# -O0 performance

For -O0 (which is the default), we get kicked off LLVM's fast 
instruction selector far too often. We need to stop generating the 
instructions that cause LLVM to bail out to the slow SelectionDAG path 
at -O0.


This only affects -O0, but since that's the most common case that 
matters for compilation speed, that's fine. Note that these 
optimizations are severely limited in what they can do for self-hosting 
performance, for the reasons stated above.


* Invoke instructions cause FastISel bailouts. This means that we can't 
use the C++ exception infrastructure for task failure if we want fast 
builds. Graydon has work on an optional return-value-based unwinding 
mode which is nearing completion. I have a patch in review for a 
disable_unwinding flag, which disables unwinding for failure; this 
should be safe to turn on for libsyntax and librustc, since they have no 
need to handle failure gracefully, and doing so improves compile-time 
-O0 LLVM performance by 1.9x.


* Visitor glue used to cause FastISel bailouts, but this is fixed in 
incoming.


* Switch instructions cause FastISel bailouts. Pattern matching on enums 
(and sometimes on integers too) generates these. Drop and take glue on 
enums generates these too. This shouldn't be too hard to fix.


* Integer division instructions result in FastISel bailouts on x86. 

Re: [rust-dev] Plans for improving compiler performance

2013-01-05 Thread Patrick Walton

I realized I forgot some things.

# Garbage collection

* Not generating reference count manipulations (which would be possible 
if we switched to tracing GC) improves code generation speed. I created 
a test that resulted in a 22.5% improvement in LLVM pass speed. Not an 
order of magnitude difference, but it's nice.


# FastISel

* There is another issue: `i1` parameters cause FastISel bailouts. These 
would be our `bool` type. Probably the solution here is to translate our 
`bool` as `i8` instead of `i1`.


Patrick