[bitc-dev] The case for typing of purity

Jonathan S. Shapiro Wed, 08 Oct 2008 07:49:25 -0700

This is a longer answer to Swaroop's question:

On Tue, 2008-10-07 at 13:23 -0400, Swaroop Sridhar wrote:
> Effect types are necessary to write correct initializers that involve
> expansive expressions: that is, expressions other than literals,
> functions, and application of constructors and certain primitive
> functions.
> 
> You have talked about this before, but I don't understood the need for
> complex initializers correctly. Here, I mean requirement, rather than
> convenience....
> 
> I ask this because, if complex initializers are a matter of convenience, 
> they might lead to a bigger inconvenience with respect to tracking the 
> effect types.


Swaroop is being slightly imprecise. He writes "effect types", but what
he means is "using effect types to type function purity". Effect types
is a more general concept. Typing purity is one application of effect
types. Typing NOALLOC is another application of effect types. In the
present discussion we are concerned with typing of purity.

What I want to do in THIS note is talk about the merits of typing
purity. I will send a second note talking about effect types more
generally.

Originally, the desire to type purity came about because of expansive
initializing expressions. "Expansive" means anything that calls a
function. Expressions that use only literals and type constructors
(recursively) are not expansive.

Why do we want expressions in intializers?

There are several reasons to want expressions in initializers:

  1. C has them in limited form, and we would like to be able to express
     most C initializers in BitC.

  2. If we disallow expressions, then certain data structures cannot be
     constructed at initialization time. The effect of this is to force
     us to initialize these data structures in a pro-forma way, and then
     do a real initialization at the beginning of main. This requires
     an assignment, which forces us to make the variable mutable.

     It is a goal of BitC to support the programmer in minimizing
     mutability.

  3. Initializers can be run at compile time. This has a curious and
     useful consequence: we can run initializers whose memory bounds
     cannot be established in programs that have hard memory bounds.
     Think of this as a form of compile-time partial evaluation.

There are two problems with expressions in initializers:

  1. They can introduce dependencies on uninitialized values. This can
     happen because of forward references, but we have a rule for how
     to prevent this that can be statically enforced.

  2. If they are allowed to perform side effects, then the dependency
     tracking becomes intractable. Either separate compilation must be
     abandoned or side effects must be prohibited.

  Of these, it is issue [2] that is the hard one.


What are the options?


Option 1: Restricted Expressions

If the goal is merely to match the initializer expressiveness of C, we
can do that by allowing a very limited set of functions to be run. These
are functions known to the compiler, and therefore known to be pure.

This is a "dodge the bullet" approach. It evades the introduction of
effects in the type system by restricting the expressiveness of
initializers. The result is comparable in its expressiveness to the
expressiveness of C, but significantly weaker than the expressiveness of
C++ (which can run general code in constructors).

Option 2: Link-Time Checking

Option 2 is to check for violations at link time. This is possible
because purity can be inferred with no user intervention. In this
option, the compiler would use effect typing internally to infer a
purity effect type for each procedure. It would store the result to the
object file. The linker would then connect the results to determine
whether the program as a whole satisfies the requirement that
initializers must be pure.

  Pro:  This approach eliminates the need to complicate the type syntax
        and the types presented to the user.

  Con:  This approach defers recognition of violation to link time.

  Con:  This approach *may* have the effect of making well-localized
        diagnostics difficult. What we want is something of the form:

          this had to be pure, but f calls g calls h, and the execution
          of h is not pure in this context.

        I *think* that the link-time approach should be able to handle
        this, but I am not entirely certain that this is true.

  Con:  It is very awkward when a language requires a custom linker.
        This significantly impedes successful linkage with other
        languages.

Option 3: Declare Effect Types Only in Interfaces.

It is tempting to think that this might work. Unfortunately it does not.
Consider an initializer expression:

   (define (builder some-list)
     (map transformer-fn some-list))

where /builder/ is an exported identifier and /transformer-fn/ is a
local, private procedure. The purity of /builder/ is expressed by the
constraint:

    purity(builder) >= purity(transformer-fn)

Now if transformer-fn is entirely private, all is well, because we will
be able to fully resolve purity(transformer-fn) and get a concrete
answer.

But suppose that the implementation of transformer-fn is:

  (define (transformer-fn x)
     (+ x 1))
  transformer-fn: (forall ((Arith 'a '%e-arith)) (fn ('a 'a) 'a))

well then we must reveal the chain of constraints:

  purity(builder) >= purity(transformer-fn) >= '%e-arith

and that would be the constraint in the interface where builder is
declared.

But note that if effect types are declared ONLY in interfaces, then we
have no *name* (i.e. no identifier) that binds purity(transformer-fn).
In this case, we need one, because purity(transformer-fn) is not fully
concrete.

This particular example is contrived, because Arith requires member
purity. The point to notice is that using a type class tends to force
effect variables to accumulate and propagate at interfaces. Because we
need to state constraints on these variables, the variables require
names. Because the transitive dependencies must be expressed visibly,
there is no reliable way to express them if the user does not supply the
names for the effect variables. Ultimately *that* is why effect
variables become intrusive.


But there is reason for hope. The accumulation of constraints tends to
arise only in higher-order patterns like map, and most of the core type
classes are explicitly pure. If we knew, for example, that
transformer-fn were pure, then the constraints would all be simplified
away.



What else does purity effects buy us?

Purity is a very useful thing for the compiler to know about. If a
sub-expression is known to be pure, it can be parallelized very easily.
For some applications this is quite important, and this usage extends
the value of purity analysis far past initializers.


My personal opinion:

1. Whether we want to check at compile time or at link time, the
   compiler internally requires support for effect types.

2. Purity effects are important, and I think they are important enough
   to check at compile time rather than link time.

3. It is an intrinsically useful experiment to add purity constraints
   to the syntax and find out whether this special case turns out to
   be more manageable than effect types generally. There are reasons
   to hope that it may be.

shap

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev

[bitc-dev] The case for typing of purity

Reply via email to