On 15/05/2011 2:37 PM, Patrick Walton wrote:
> Hi everyone,
> It occurred to me that our insistence on dynamic linking for the
> standard library is likely to cause performance problems.
It might. Measure twice, cut once.
> Consider uint::range(). This is the preferred way to do a C-style for
> loop in Rust. Up to this point we've been preferring to write C-style
> for loops as while loops, as near as I can tell, due to some early
> flakiness in rustboot, and for performance reasons.
I'm not sure there's a single preferred way. We have some special-casing
in the language for built-in loop types (elements of vectors for
example, avoiding bounds-checking on each iteration) and we could do
more; for example we could add a 3-clause C style for loop, or a "from
.. to .. by" range loop over scalar types. Or other options, see below.
> At the moment, *each iteration* of uint::range() requires not one, but
> two indirect function calls
Two? It should only be one. Range calls block, block runs, range does +=
1, range calls block again. It's also not exactly a surprising indirect
function; it keeps calling the same one, from the same place in the code
+ stack. IOW it'll *probably* be well predicted.
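To make the calling pattern described above concrete, here is a minimal sketch in modern Rust syntax (not 2011 Rust; the `range` function, its `dyn FnMut` parameter, and `main` are illustrative, not our actual library code). Taking the block as a trait object forces exactly one indirect call per iteration:

```rust
// Illustrative only: an iter that invokes its block through a
// trait-object pointer, i.e. one indirect call per iteration.
fn range(lo: usize, hi: usize, block: &mut dyn FnMut(usize)) {
    let mut i = lo;
    while i < hi {
        block(i); // range calls block, block runs
        i += 1;   // range does += 1, then calls block again
    }
}

fn main() {
    let mut sum = 0;
    range(0, 5, &mut |i| sum += i);
    println!("{}", sum); // 0 + 1 + 2 + 3 + 4 = 10
}
```

Note that the indirect call target never changes across iterations, which is why a branch predictor should handle it well.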
> that are opaque to
> LLVM (indeed, each boundary is a DLL boundary). This is likely not going
> to work; we're inhibiting all loop optimizations and forcing new stack
> frames to be created for every trip around the loop.
It is definitely going to inhibit some optimizations, it's a question of
asking which ones and how high the costs are. Measure?
If you find it is really bad, which is surely possible, then I'd note
that for scalar-range loops I think it's legit to ask for some language
help (possibly including pointer-range? maybe we can unify the fast
vec-element for loop, since it's just a pointer scalar-range).
For iters with substantially more structure, I don't think "having a
function call in the middle" is a guaranteed major performance issue.
It's worth measuring a few different "kinds" of iter before jumping to
the conclusion that it can't involve any calling.
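In that spirit, one cheap starting measurement might look like the following (a hypothetical harness in modern Rust; the names and the `std::time::Instant` timing are my own, not anything in our tree): the same summation done as a plain scalar loop and again routed through an out-of-line callback iter.

```rust
use std::time::Instant;

// Out-of-line iter: indirect call per iteration via a trait object,
// and #[inline(never)] to keep the optimizer from seeing through it.
#[inline(never)]
fn iter_range(lo: u64, hi: u64, block: &mut dyn FnMut(u64)) {
    let mut i = lo;
    while i < hi {
        block(i);
        i += 1;
    }
}

fn main() {
    const N: u64 = 10_000_000;

    // Plain scalar-range loop: the compiler sees the whole loop body.
    let t0 = Instant::now();
    let mut direct = 0u64;
    for i in 0..N {
        direct = direct.wrapping_add(i);
    }
    let t_direct = t0.elapsed();

    // Callback version: same work, but through an indirect call.
    let t1 = Instant::now();
    let mut via_iter = 0u64;
    iter_range(0, N, &mut |i| via_iter = via_iter.wrapping_add(i));
    let t_iter = t1.elapsed();

    assert_eq!(direct, via_iter); // same answer either way
    println!("direct: {:?}, via iter: {:?}", t_direct, t_iter);
}
```

Expect the direct loop to be optimized far more aggressively (possibly folded away entirely), which is exactly the loop-optimization effect under discussion; the interesting question is how much that gap matters for realistic iter bodies.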
> So range() can't be dynamically linked anymore. The simplest
> solution I can think of is to allow both static and dynamic library crates.
Certainly possible. You're jumping through a few inferences to a limited
set of conclusions but it's a possible approach.
> There's been a movement lately in some circles to forbid dynamic linking
> entirely; Go for example explicitly does not support it (in keeping with
> the Plan 9 tradition). I tend to disagree; I think that supporting
> dynamic linking is useful to enable memory savings and to make security
> problems in the wild easy to patch.
Yeah. I'm not really a "fan" of dynamic linking. There are also studies
showing the putative memory savings of dynamic linking are .. pretty
hypothetical. But then it really depends on the system load, system
design, what you're measuring, and who you ask.
I mostly wanted dynamic linking so that people who *are* convinced they
want it can have it, for whatever reason. It's a common sentiment and as
I said before, I'm a pluralist. I'm usually fine statically linking the
programs I write. But I want dynamic to work.
> But exclusively relying on it is not going to be tenable;
> some routines are so simple, yet so critical to performance, that static
> linking is probably going to be the only viable option.
I think you mean more than "static linking", since LLVM is not a linker;
you mean static linking where the .a file actually contains LLVM
bitcode, a la the LLVM-LTO approach. That's plausible. It's *different
again* from actual static linking. And it's different from pickled ASTs,
and different from macros. So there are like ... 4 points on the
spectrum here in addition to our existing compilation strategy:
- Pickling ASTs so that we can load in STL-like "declared externally
but mixed in to your compilation unit" functions. So that the
inlining or intertwingling can happen at the front-and-middle ends,
not just the LLVM layer.
- Powerful macros that cover the "needs to be efficient" cases as well
as the various other macro scenarios. Definitely front-end!
- Static linking, just to avoid having runtime dependencies and PIC
code and such. But still "calls a function on each iter".
- Static linking LLVM bitcode, to get LTO across crates and cross-iter
inlining.
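As a sketch of the macro point on that list: in today's `macro_rules!` syntax (which didn't exist in this form at the time; the macro name and shape are made up for illustration), an "efficient range loop" macro expands the loop inline at the call site, so there is no call per iteration at all:

```rust
// Hypothetical sketch: the macro expands to a plain while loop,
// so the block is pasted inline rather than called.
macro_rules! for_range {
    ($i:ident, $lo:expr, $hi:expr, $body:block) => {{
        let mut $i = $lo;
        while $i < $hi {
            $body
            $i += 1;
        }
    }};
}

fn main() {
    let mut sum = 0;
    for_range!(i, 0usize, 5, { sum += i; });
    println!("{}", sum); // 10
}
```

The expansion is indistinguishable from a hand-written while loop, so all the usual loop optimizations apply, at the cost of doing the work in the front end.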
On this list, I'd point out a couple things:
- We want a macro system anyways. So let's build that soon.
- You only have an inlining problem *when* you're dealing with
separate crates. Granted, uint::range is "standard library fodder",
but if you have some custom data structure X with iters on X
implemented in the same crate as the iter-callers, LLVM may well
inline the iter already. And uint::range is one of those cases
that will fall into "supported by a scalar-range special case loop",
so let's not over-generalize the problem too quickly.
- We wanted some kind of "cache mode" for saving and reloading .bc
files transparently inside the compiler, to avoid re-trans'ing
subsets of a crate that the compiler determines as not having
changed run-to-run. So that will probably help set up a framework
for loading and static linking external crates as LLVM bitcode.
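On the crate-internal point above, a sketch in modern terms (the generic-bound syntax and names here are illustrative, not 2011 Rust): when the iter takes its block as a generic parameter and lives in the same compilation unit as its callers, the call target is a concrete closure type, so the optimizer can see and inline it.

```rust
// Crate-internal iter: the closure is a concrete type here, so the
// call is direct and inlinable, unlike a cross-DLL indirect call.
#[inline]
fn each_even<F: FnMut(u32)>(hi: u32, mut block: F) {
    let mut i = 0;
    while i < hi {
        if i % 2 == 0 {
            block(i);
        }
        i += 1;
    }
}

fn main() {
    let mut evens = Vec::new();
    each_even(10, |i| evens.push(i));
    println!("{:?}", evens); // [0, 2, 4, 6, 8]
}
```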
I'd probably say go with the macros and the scalar-range loop *first*
while measuring the cost of DLL vs. crate-internal iter vs. scalar-range
loops. If you have a serious performance problem that definitely can't
be solved with scalar-range optimizations, then start digging into the
LLVM-LTO stuff.
The reason is simply this: if "standard wisdom" is to use LLVM-LTO
against the standard library to get "adequate performance", you get two
big follow-on problems:
  - The idea that "compilation may be fast" is shot; the standard
    approach will involve rebuilding libstd every time. We're back in
    C++ territory.
- You're leaving DLL-based users out in the cold *by design*, rather
than trying to find an approach that gets them decent performance
too.
So I'd like to approach cross-crate LLVM-LTO as a last resort, not a
first one.
-Graydon
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev