On Wed, Aug 21, 2013 at 8:53 PM, Bill Myers <[email protected]> wrote:
> Have you considered the following "non-specific" quick fixes?
>
> 1. Build on a ramfs/ramdisk
>
IO, and especially disk IO, accounts for almost none of the compilation time. All files in a crate are read at once, then compilation happens.

> 2. Distribute compilations and tests across a cluster of machines (like
> distcc)
>

Compilation is 99% serial (the only things that happen in parallel are rustpkg and rustdoc etc. at the end, and they are almost nothing), though tests could be distributed (and Graydon is working on doing that, afaik).

> 3. If non-parallelizable code is still the bottleneck, use the fastest CPU
> possible (i.e. an overclocked Core i7 4770K, overclocked Core i7
> 4960X/3960X/4930K/3930K, dual Xeon E5 2687W or quad Xeon E5 4650 depending
> on whether you need 4, 6, 16 or 32 cores)
>
> 4. Read metadata only once, in one of these ways:
> 4a. Pass all files to a single compiler invocation (per machine or core)

This already happens: crates are compiled all at once, unlike C/C++'s per-file-then-link compilation model.

> 4b. Have a long-lived rustc "daemon" (per machine or core) that keeps crate
> metadata in memory and gets passed files to compile by fd

This wouldn't be much of a quick fix: doing it would need better-structured metadata that doesn't suffer from the same problems the current metadata does (though this could still be an optimization later).

> 4c. Use CRIU suspend/restore (or the unexec from Emacs or whatever) to
> suspend a rustc process after metadata is read and restore that image for
> each file instead of spawning a new one

This is an interesting idea; pursuing it might be warranted.

> 4d. Allocate metadata using a special allocator that allocates it from a
> block at a fixed memory address, then just dump the block into a file, and
> read metadata with a single mmap system call at that same fixed address
> (this is a security hole in general, so it needs to be optional and off by
> default)

Also an interesting idea, though a fair bit of work. Brian is working on not compressing metadata, which would be a win.

Here is my current understanding of the problems with metadata (from scouring some profiles and the code ~3 months ago):

- Metadata is large. It is multiple megabytes uncompressed (749K compressed, as of now) for libstd. I'm not sure whether we are encoding too much data or exactly what we need, but this is a very large constant that every inefficiency gets scaled by.

- Metadata is stored as EBML, always. I'm sure EBML is fine most of the time, but it really hurts us here. Part of the problem is that it is a big-endian format: the byte swapping shows up very high (top 5) in a profile. Additionally, *every time we query the stored metadata for something, we decode the EBML*. This makes EBML a multiplicative slowdown, rather than the additive one it would be if we decoded the entire structure up front and used the native Rust representation (see the sketch at the end of this mail).

- Metadata is stored in a single blob. To access any part of it, we need to load all of it, and there is no index. If we could load only the metadata we need, we'd reduce memory usage and do less wasted work.

- Metadata is scary. It's a hairy part of the codebase that I sure don't understand. I know its purpose, more or less, but not the specifics of how or why things are encoded. Michael Sullivan could speak more to this; he is the last one to have touched it. The compiler can't help you when you make a mistake, either.

I think the solution requires a much more systemic change than your proposed quick fixes.
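To make the multiplicative-vs-additive point concrete, here is a minimal sketch. The record format and names (encode, query_by_redecoding, decode_once) are made-up stand-ins for illustration, not the actual rustc metadata code, and it's written in present-day Rust:

    use std::collections::HashMap;

    // Pretend "metadata": a sequence of (4-byte big-endian id,
    // 4-byte big-endian value) records, standing in for an
    // EBML-ish blob.
    fn encode(items: &[(u32, u32)]) -> Vec<u8> {
        let mut blob = Vec::new();
        for &(id, val) in items {
            blob.extend_from_slice(&id.to_be_bytes());
            blob.extend_from_slice(&val.to_be_bytes());
        }
        blob
    }

    // The multiplicative pattern: re-walk (and byte-swap) the blob
    // on *every* query, so decoding cost is paid once per lookup.
    fn query_by_redecoding(blob: &[u8], wanted: u32) -> Option<u32> {
        for rec in blob.chunks_exact(8) {
            let id = u32::from_be_bytes(rec[0..4].try_into().unwrap());
            if id == wanted {
                return Some(u32::from_be_bytes(rec[4..8].try_into().unwrap()));
            }
        }
        None
    }

    // The additive alternative: decode the whole blob once into a
    // native representation, then answer queries from that.
    fn decode_once(blob: &[u8]) -> HashMap<u32, u32> {
        blob.chunks_exact(8)
            .map(|rec| {
                (
                    u32::from_be_bytes(rec[0..4].try_into().unwrap()),
                    u32::from_be_bytes(rec[4..8].try_into().unwrap()),
                )
            })
            .collect()
    }

    fn main() {
        let blob = encode(&[(1, 10), (2, 20), (3, 30)]);

        // n queries => n full decodes: O(n * blob size).
        assert_eq!(query_by_redecoding(&blob, 2), Some(20));

        // One decode, then cheap lookups: O(blob size + n).
        let table = decode_once(&blob);
        assert_eq!(table.get(&2), Some(&20));
    }

An index (say, an offset table at the front of the blob) would go further still, letting us decode only the records a query actually touches, which would address the single-blob problem as well.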
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
