On Wed, Aug 21, 2013 at 11:03 PM, Patrick Walton <[email protected]> wrote:
> On 8/21/13 7:47 PM, Corey Richardson wrote:
>>
>> IO and especially disk IO are almost 0 compilation time. All files in
>> a crate are read at once, then compilation happens.
>
>
> I don't believe this is true, as disk IO from metadata reading hurts.
>

I'm not convinced it's false. Our IO patterns are pretty predictable
and disk-cache friendly, assuming we're not blowing the cache out of
the water with our memory usage. Compiling the tests shouldn't take
too much memory (comparatively); I'd expect ~128MB at most. But I
don't have numbers; this is speculation.

>
>> - Metadata is large. It is multiple megabytes of data uncompressed
>> (749K compressed, as of now) for libstd. I'm not sure whether we
>> are encoding too much data or if it's exactly what we need, but this
>> is a very large constant that every inefficiency gets scaled by.
>
>
> We could probably do a bit better here, but we do have to serialize ASTs for
> generics. libstd is already 2.3MB of Rust code, so I would expect the
> serialized ASTs from the generics to be on that order.
>

Yeah, I don't expect any major wins in this area. One thing that has
been in the back of my mind (and I think I've seen Graydon mention it)
is splitting metadata out of the object file, similar to how you can
split out debuginfo. I'm hesitant about this feature, though; I'd hate
to require -dev or -devel packages, which is part of the pain of
headers.

>
>> - Metadata is stored as EBML, always. I'm sure EBML is fine most of
>> the time, but it really hurts us here. Part of the problem is that it
>> is a big endian format. The byte swapping shows up very high (top 5)
>> in a profile.
>
>
> If I had to do it all over again I'd use atom trees instead of EBML, but I
> doubt that getting rid of vuints will help that much. I rewrote that routine
> in optimized asm once and it didn't help. The reason vuint_at shows up so
> high in the profile is mostly because, algorithmically, we read metadata too
> much, and reading integers is most of what reading metadata is.
>

I would think it's the fact that the byte swapping is happening at
all, though I don't have numbers here either. Encoding the integers
as little-endian and skipping the swap should be easy to experiment
with.
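
To make concrete what I mean (a simplified sketch in current Rust, not
the actual vuint_at in rustc's ebml reader; the names and layout here
are made up for illustration):

    // EBML-style variable-width integer decode: the width is signalled
    // by the leading bits of the first byte, and the payload bytes are
    // accumulated big-endian, so every read shifts bytes into place.
    fn vuint_at(data: &[u8], start: usize) -> (u64, usize) {
        let first = data[start];
        let extra = first.leading_zeros() as usize; // extra bytes to read
        let mut value = (first as u64) & (0x7fu64 >> extra);
        for i in 1..=extra {
            value = (value << 8) | data[start + i] as u64;
        }
        (value, start + extra + 1)
    }

    // A fixed-width little-endian read, by contrast, is essentially a
    // memcpy on a little-endian host; nothing gets shifted or swapped.
    fn u32_le_at(data: &[u8], start: usize) -> u32 {
        let mut buf = [0u8; 4];
        buf.copy_from_slice(&data[start..start + 4]);
        u32::from_le_bytes(buf)
    }

Whether dropping the variable-width, big-endian encoding actually buys
anything is exactly the sort of thing I'd want to measure rather than
guess at.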

>
>> Additionally, *every time we query the stored metadata for
>> something, we decode the EBML*. This turns EBML into a multiplicative
>> slowdown, rather than just an additive one, which is what it would be
>> if we decoded the entire structure up front and used the native Rust
>> representation.
>
>
> If we decoded the entire structure up front and used native Rust we would
> lose the index and would suffer a slowdown. I believe Niko tried this and
> saw massive performance losses.
>

That's good to know.

>
>> - Metadata is stored in a single blob. To access any part of metadata,
>> we need to load it all, and there is no index. If we could only load
>> the metadata we need, we'd reduce memory usage and do less wasted
>> work.
>
>
> This is untrue; there is an index. It's just that not every part of the
> compiler uses it yet. I have a patch that I will try to land tomorrow that
> converts resolve to be lazy and consult the index and reduces its time on
> hello world by 10x. Method tables in coherence will require a bit more work
> to be lazy, but could also reduce its time.
>

Ah, I must have missed it (and any code that uses it) in my
admittedly brief look at the code.
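
In case it's useful to anyone else following the thread, the shape I
was picturing for lazy access is roughly this (a toy sketch with
made-up names, not the actual metadata layout or index format):

    use std::collections::HashMap;

    // The blob stays opaque; the index maps an item id to the byte range
    // holding that item, so a query only touches the bytes it needs.
    struct Metadata {
        blob: Vec<u8>,                       // serialized items, back to back
        index: HashMap<u32, (usize, usize)>, // item id -> (offset, length)
    }

    impl Metadata {
        // Fetch one item's bytes on demand instead of decoding everything.
        fn item_bytes(&self, id: u32) -> Option<&[u8]> {
            let &(offset, len) = self.index.get(&id)?;
            Some(&self.blob[offset..offset + len])
        }
    }

Eagerly decoding the whole blob into native structures would make each
individual access cheap, but it forces every item to be decoded whether
or not resolve ever asks for it, which fits with the slowdown Niko saw.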

(I think this also addresses the points Graydon brought up: thanks for
correcting my understanding!)