On Wed, Jul 31, 2013 at 9:41 AM, David Jeske <[email protected]> wrote:
> On Wed, Jul 31, 2013 at 8:44 AM, Jonathan S. Shapiro <[email protected]>wrote: > >> As to range checks, your point is well taken, but the effects are >> pernicious when the consequences for inlining are concerned. For >> CPU-intensive algorithms, it can easily make a 4x to 8x difference in >> algorithm performance. In particular, it means that loop unrolling isn't >> very helpful. >> > > For the naive... What are the main issues with range checks, inlining, and > unrolling? A value-type array can't change size, and can't disappear as > long as you have a pointer to it. > What you say is true. Unfortunately, it isn't a strong enough precondition to allow range checks to be hoisted safely. You are correct that the length of a vector cannot change after allocation. This is guaranteed by the operational definitions of the CLI opcodes. The problem is that the slot containing the *reference* to the vector is mutable. In the presence of concurrency, that slot can be overwritten by thread A while thread B is executing an inner loop. At that point the entire vector has changed, and of course the length has changed as a consequence. More immediately, the vector slot can also be overwritten by code called from the *same* thread of control. Marking the slot "readonly" doesn't help for two reasons: 1. The concept of a readonly slot isn't visible at the level of the CLI type system (or at least, I don't remember it appearing there). 2. There is a flaw in the behavioral specification of C# constructors that renders readonly slots advisory. The optimizer cannot rely on the slots actually being readonly. The JIT system has no way to know that a particular object instance is not referenced by multiple threads. In consequence, it must assume that *any* slot of an object can be overwritten. You might think that making a stack-allocated copy of the vector reference would help, because *that* really *is* a single-thread slot. Except that one thread can engage stack introspection on another, so actually it isn't clear whether that helps. And note that if it *does* help, that raises interesting optimization precondition quandries for termporary introduction during rewrites. You might think that a clever JIT system could notice in some cases that the vector slot is private, not mutated by any method other than the constructor. Still not good enough if some of the methods are virtual, but you can fix that by marking the class final.... and now you are looking at a use case so rare that the risk to the JIT engine (in bugs) to even consider it is higher than any possible benefit in reality. And of course once you start inlining things you'll discover that the *calling* procedure (which is now the one executing the loop) needs the same analysis and generally doesn't satisfy it. > If it's a structured increment loop (1 to 10 by n), it seems easy to hoist > the range check and inline/unroll. > I agree that it *seems* easy. Unfortunately, the operative word here is "seems". One of the hard parts of optimization is making sure that you've captured all of the preconditions for the optimization to be correct. In this case, a subtle set of non-local precondition requirements are getting in the way. Though thinking about it, I see a tricky problem in that even if you try to > hoist, if the range-check fails at the hoist, the semantics of typesafe > loop iteration mean we still have to run the loop until we hit the bounds, > and we have to throw a specific error for hitting the bounds, so the > hoisted check didn't really help us. > Unless the effects of the loop become unreachable on raise, which is often true, though in the presence of cross-thread introspection less so than you might have assumed. And of course, it's very easy to write loops in such a way that this requirement doesn't hold, because loops are statements rather than expressions. > Plus, in non-fixed increment loops (e.g. binary-search), or > array-index-dereferencing, it seems hard to eliminate the range-check. > That's also true, though loops without constant increments are so rare in practice that an optimizer pass to deal with them probably isn't worthwhile. > Did I miss anything? > Yes. :-) And with high probability, so did I. :-) Isn't optimization fun? Jonathan
_______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
