Re: [bitc-dev] What is a "Systems Programming Language"

Jonathan S. Shapiro Wed, 31 Jul 2013 10:45:06 -0700

On Wed, Jul 31, 2013 at 9:41 AM, David Jeske <[email protected]> wrote:

> On Wed, Jul 31, 2013 at 8:44 AM, Jonathan S. Shapiro <[email protected]>wrote:
>
>> As to range checks, your point is well taken, but the effects are
>> pernicious when the consequences for inlining are concerned. For
>> CPU-intensive algorithms, it can easily make a 4x to 8x difference in
>> algorithm performance. In particular, it means that loop unrolling isn't
>> very helpful.
>>
>
> For the naive... What are the main issues with range checks, inlining, and
> unrolling? A value-type array can't change size, and can't disappear as
> long as you have a pointer to it.
>

What you say is true. Unfortunately, it isn't a strong enough precondition
to allow range checks to be hoisted safely.

You are correct that the length of a vector cannot change after allocation.
This is guaranteed by the operational definitions of the CLI opcodes. The
problem is that the slot containing the *reference* to the vector is
mutable. In the presence of concurrency, that slot can be overwritten by
thread A while thread B is executing an inner loop. At that point the
entire vector has changed, and of course the length has changed as a
consequence. More immediately, the vector slot can also be overwritten by
code called from the *same* thread of control.

Marking the slot "readonly" doesn't help for two reasons:

1. The concept of a readonly slot isn't visible at the level of the CLI
type system (or at least, I don't remember it appearing there).
2. There is a flaw in the behavioral specification of C# constructors that
renders readonly slots advisory. The optimizer cannot rely on the slots
actually being readonly.

The JIT system has no way to know that a particular object instance is not
referenced by multiple threads. In consequence, it must assume that *any* slot
of an object can be overwritten.

You might think that making a stack-allocated copy of the vector reference
would help, because *that* really *is* a single-thread slot. Except that
one thread can engage stack introspection on another, so actually it isn't
clear whether that helps. And note that if it *does* help, that raises
interesting optimization precondition quandries for termporary introduction
during rewrites.

You might think that a clever JIT system could notice in some cases that
the vector slot is private, not mutated by any method other than the
constructor. Still not good enough if some of the methods are virtual, but
you can fix that by marking the class final.... and now you are looking at
a use case so rare that the risk to the JIT engine (in bugs) to even
consider it is higher than any possible benefit in reality. And of course
once you start inlining things you'll discover that the *calling* procedure
(which is now the one executing the loop) needs the same analysis and
generally doesn't satisfy it.

> If it's a structured increment loop (1 to 10 by n), it seems easy to hoist
> the range check and inline/unroll.
>

I agree that it *seems* easy. Unfortunately, the operative word here is
"seems". One of the hard parts of optimization is making sure that you've
captured all of the preconditions for the optimization to be correct. In
this case, a subtle set of non-local precondition requirements are getting
in the way.

Though thinking about it, I see a tricky problem in that even if you try to
> hoist, if the range-check fails at the hoist, the semantics of typesafe
> loop iteration mean we still have to run the loop until we hit the bounds,
> and we have to throw a specific error for hitting the bounds, so the
> hoisted check didn't really help us.
>

Unless the effects of the loop become unreachable on raise, which is often
true, though in the presence of cross-thread introspection less so than you
might have assumed. And of course, it's very easy to write loops in such a
way that this requirement doesn't hold, because loops are statements rather
than expressions.

> Plus, in non-fixed increment loops (e.g. binary-search), or
> array-index-dereferencing, it seems hard to eliminate the range-check.
>

That's also true, though loops without constant increments are so rare in
practice that an optimizer pass to deal with them probably isn't worthwhile.

> Did I miss anything?
>

Yes. :-) And with high probability, so did I. :-)

Isn't optimization fun?

Jonathan

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev

Re: [bitc-dev] What is a "Systems Programming Language"

Reply via email to