Properties, opIndex, and expression rewriting.

Chad J Tue, 11 Aug 2009 00:20:14 -0700

http://www.igsoft.net/dpolls/poll/results.php?pollid=1
* 111 votes total.
* Keep things as they are now: 1 vote.
* And where did that vote come from?
Jeremie Pelletier pisze:
>
> Oops, I got the only one "keep things as they are now" vote haha.
> I meant to vote for the one with resolved +=.
>


So, Walter, how about some expression rewriting logic to solve some of
the problems with properties, opIndex, and related lvalueness stuff?

(I refer to the expression rewriting in the Wiki4D article on
properties:
http://prowiki.org/wiki4d/wiki.cgi?DocComments/Property#Semantic)

Syntax can be broached later.

I've also attached some D-like pseudocode of what I think such an
expression rewriting algorithm would look like (propertyrewrite.d).  The
other file is a potential test-case.  I used it to figure out what a
trace of the algorithm might look like, and designed it accordingly.  If
it can wait a few weeks, I might try to put it into dmd and write a
patch.  I'll only do it if there's interest though.

Other notes about attached pseudocode:
* The types in this psuedocode are expected to be defined as D classes
are; that is, they are reference types.
* It seems worthwhile to call this after operator overloads are turned
into calls.  This would make it more clear which things potentially have
side effects.  (Just imagine how << is overloaded in C++.  As much as
it's frowned upon, something similar could happen in the future of D.
Marking overloads like '<<', '+', '-' as pure could potentially remove
any pessimisation.)

// This function will take the expression "expr" and expand it into multiple
//   statements such that no single statement contains more than one side-
//   effect.  It will also scan for property getter invocations and opIndex
//   that are treated as lvalues.  When it finds them, it will rewrite them
//   such that the corresponding setter or opIndexAssign is guaranteed to be
//   called.
//
// To provide some insight into how prolog and epilog work, here is what they
//   look like at the outermost level:
// [prolog]
// auto t0 = a;
// [prolog and epilog grow towards the center]
// a++;
// a = t0;
// [epilog]
// (That would be made from "a++;" if 'a' were a property.)
//
// Assumptions:
// * expr's structure is consistent with operator precedence.
// * Operator overloads have already been expanded into function calls.
//
Expression doPropertyRewriteGreedily(
    Expression expr,
    BlockStatement prolog,
    BlockStatement epilog,
    bool hasSideEffects // Does the parent expression have side-effects?
    )
{
    VariableExpression temporary = expr;
    
    if ( expr == opPostInc || expr == opPostDec )
    {
        // Generate any temporaries that we depend on but don't know about.
        // Also, find out what the child is made of.
        auto incoming = 
doPropertyRewriteGreedily(expr.child,prolog,epilog,true);

        if ( incoming.isVariableExpression() )
        {
            // The rewrite gave us a variable or a temporary, so making
            //   yet another temporary is unnecessary.
            temporary = incoming;
        }
        else
        {
            // Make a new temporary to hold the returned expression.
            temporary = new VariableExpression(makeUniqueTemporaryName());
            
            // Check for duplicate getter calls before injecting ours.
            if ( ! prolog.contains("auto {anything} = {incoming};");
                prolog.append("auto {temporary} = {incoming};");
        }

        // Check for duplicate setter calls before injecting ours.
        if ( ! epilog.contains("{incoming} = {anything};") )
            epilog.prepend("{incoming} = {temporary};");

        // This looks out-of-order, but since we are prepending,
        //   this will actually be executed before the setter is called.
        expr.child = temporary;
        epilog.prepend("{expr};");
    }
    else if (  expr == oneOf("~,!,-,+=,-=,*=,/=,%=,|=,&=,~=,>>=,>>>=,<<=")
            || expr == impureFunctionCall )
    {
        // lhs = left-hand-side. rhs = right-hand-side.
        // For the unary ~, !, and -, the lhs is the argument
        //   and the rhs has zero expressions.
        auto lhs = expr.lhs;

        // Generate any temporaries that we depend on but don't know about.
        // Also, find out what the left-hand-side is made of.
        auto incoming = doPropertyRewriteGreedily(lhs,prolog,epilog,true);

        if ( incoming.isVariableExpression() )
        {
            // The rewrite gave us a variable or a temporary, so making
            //   yet another temporary is unnecessary.
            temporary = incoming;
        }
        else
        {
            // Make a new temporary to hold the returned expression.
            temporary = new VariableExpression(makeUniqueTemporaryName());

            // Check for duplicate getter calls before injecting ours.
            if ( ! prolog.contains("auto {anything} = {incoming};" )
                prolog.append("auto {temporary} = {incoming};");
        }

        // Substitute arguments with temporaries as needed.
        // These particular temporaries will be created by the recursive calls
        //   to doPropertyRewriteGreedily, so we don't need to set them up.
        foreach( ref e; rhs )
            e = doPropertyRewriteGreedily(e,prolog,epilog,false);

        // Replace expr's guys and stick it here.
        expr.lhs = temporary;
        prolog.append("{expr}");

        // Check for duplicate setter calls before injecting ours.
        if ( ! epilog.contains("{incoming} = {anything};") )
            epilog.prepend("{incoming} = {temporary};");
    }
    else if ( expr == "=" && (expr.lhs == writeProperty || expr.lhs == 
opIndexAssignCall) )
    {
        // ...
        // optionally rewrite expr.lhs as a function call.
        // ...

        // Rewrite the subexpressions as needed.
        expr.lhs = doPropertyRewriteGreedily( expr.lhs, prolog, epilog, true );
        expr.rhs = doPropertyRewriteGreedily( expr.rhs, prolog, epilog, false );
    }
    // This assumes that an expression like a.b.c is considered a readProperty 
if c is a property.
    // In other words, the recursive call /must/ include all of a.b.c, not just 
c itself.
    // (such troubles are possible if c is a leaf of a dot operator or 
somesuch.)
    else if ( hasSideEffects && (expr == readProperty || expr == opIndexCall) )
    {
        // Generate any temporaries that we depend on but don't know about.
        // Given a.b.c, expr.lhs is a.b
        // Given s[i], expr.lhs is s
        expr.lhs = doPropertyRewriteGreedily(expr.lhs,prolog,epilog,true);

        if ( expr == opIndexCall )
            // Given s[i++], expr.rhs is i++
            expr.rhs = doPropertyRewriteGreedily(expr.rhs,prolog,epilog,true);

        // ...
        // optionally rewrite expr (the property/indexOp) as a function call.
        // ...

        temporary = new VariableExpression(makeUniqueTemporaryName());

        // Check for duplicate getter calls before injecting ours.
        if ( ! prolog.contains("auto {anything} = {expr};" )
            prolog.append("auto {temporary} = {expr};");

        // Check for duplicate setter calls before injecting ours.
        if ( ! epilog.contains("{incoming} = {anything};") )
            epilog.prepend("{incoming} = {temporary};");
    }
    else if ( expr != null )
    // Other binary operators, ?: operator, pure functions, etc...
    {
        // Recursion terminates when expr.children is empty.
        foreach( ref e; expr.children )
            e = doPropertyRewriteGreedily( e, prolog, epilog, hasSideEffects );
    }

    return temporary;
}

// A wrapper around the greedy rewrite.
// This does the same thing as the greedy counterpart, but only when a property
//   getter invocation is found in a subexpression of a side-effect.  Otherwise
//   it does not expand the expression into multiple statements, and the
//   returned expression may contain more than one side-effect.
// This ensures that if a rewrite doesn't need to happen, then it won't happen.
// The greedy version may be called directly and both results will be correct,
//   but the lazy version's result might be faster executing.
//
// Assumptions:
// * expr's structure is consistent with operator precedence.
// * Operator overloads have already been expanded into function calls.
//
Expression doPropertyRewriteLazily(
    Expression expr,
    ref Expression farthestAncestorSideEffect, // This must be a reference to a 
reference.
    BlockStatement prolog,
    BlockStatement epilog
    )
{
    if (   expr == oneOf("++,--,~,!,-,+=,-=,*=,/=,%=,|=,&=,~=,>>=,>>>=,<<=,=")
        || expr == impureFunctionCall )
    // if ( expr.hasSideEffects() )
    {
        foreach( ref e; expr.children )
            e = doPropertyRewriteLazily( e, expr, prolog, epilog );
    }
    else if ( farthestAncestorSideEffect != null &&
        (expr == readProperty || expr == opIndexCall) )
    {
        farthestAncestorSideEffect =
            doPropertyRewriteGreedily( farthestAncestorSideEffect, prolog, 
epilog, false );
    }
    else if ( expr != null )
    {
        foreach( ref e; expr.children )
            e = doPropertyRewriteLazily( e, farthestAncestorSideEffect, prolog, 
epilog );
    }

    return expr;
}

// This is a wrapper for doPropertyRewriteLazily or doPropertyRewriteGreedy.
// See one of those for more details.
//
// Assumptions:
// * exprStatement is a child of parent.
// * exprStatement's expression structure is consistent with operator 
precedence.
// * Operator overloads have already been expanded into function calls.
//
void doPropertyRewrite( BlockStatement parent, ExpressionStatement 
exprStatement )
{
    auto prolog = new BlockStatement();
    auto epilog = new BlockStatement();

    // The greedy version can be called if the lazy one doesn't work right.
    exprStatement.expression = 
doPropertyRewriteLazily(exprStatement.expression, null, prolog, epilog);

    prolog.append(exprStatement);
    prolog.append(epilog);

    // replace the expression statement with its expansion.
    parent.replace(exprStatement, prolog);
}

// By turning all of the member variables into properties and tracing their
//   execution, this program could be turned into a test case for expression
//   rewriting.  
// Originally compiled with dmd 2.031 on Gentoo Linux.
import std.stdio;

class Foo
{
    Foo b;
    int c;
    Foo d;
    int e;
    int q;

    Foo f;

    Foo opIndex( int val )
    {
        writefln("Foo[%d]",val);
        if ( f is null )
            f = new Foo();
        return f;
    }

    Foo opIndexAssign( Foo foo, int val )
    {
        writefln("Foo[%d] = foo",val);
        return f = foo;
    }
}

void main()
{
    Foo a = new Foo();
    a.b = new Foo();
    a.d = new Foo();
    Foo p = new Foo();

    void reset()
    {
        a.b.e = 42;
        a.b.c = 10;
        a.d.c = 8;
        p.q = 4;
    }

    void printAll()
    {
        writefln("a.b.c = %d", a.b.c);
        writefln("a.b.e = %d", a.b.e);
        writefln("a.d.c = %d", a.d.c);
        writefln("p.q   = %d", p.q);
    }

    reset();
    writefln("initial state:");
    printAll();

    writefln("");
    writefln("monstrosity 1 results:");
    a.d[a.b.c++].c *= (p.q++) * (a.b.e = 3);
    printAll();

    reset();
    writefln("");
    writefln("monstrosity 2 results:");
    // The code below includes a call trace of what happens when
    // a.d[a.b.c++].c *= (p.q++) * (a.b.e = 3); // everything is a property.
    // is rewritten as:
    auto t1 = a;      // rewrite(a.b).top.add(...)
    auto t3 = t1.b;   // rewrite(a.b.c).top.add(...)
    auto t4 = t3.c;   // rewrite(a.b.c++).top.add(...)

    // auto t2 = a;   // rewrite(a.d).top.add(...) but not done because "auto 
t1 = a;" already exists.
    auto t2 = t1.d;   // rewrite(a.d[...]).top.add(...)
    auto t5 = t2[t4]; // rewrite(a.d[a.b.c++].c).top.add(...)

    // (p.q++)
    auto t7 = p;    // rewrite(p.q).top.add(...)
    auto t8 = t7.q; // rewrite(p.q++).top.add(...)

    // (a.b.e = 3)
    //auto t9 = a;  // rewrite(a.b).top.add(...) but not done because "auto t1 
= a;" already exists.
    auto t9 = t1.b; // rewrite(a.b.e = 3).top.add(...)

    // Do not make a new temporary for .c;
    // Explicit assignments call only write properties.  They do not read.
    auto t6 = t5.c;        // rewrite(root).top.add(...)
    t6 *= t8 * (t9.e = 3); // rewrite(root).top.add(...);

    // a.d[a.b.c++].c *= ...
    t5.c = t6; // rewrite(root).bottom.prepend(...)

    // (a.b.e = 3)
    // t1.b = t9; // rewrite(a.b.e = 3).bottom.prepend(...) but not done 
because "t1.b = t3;" already exists.
    // a = t1;    // rewrite(a.b).bottom.prepend(...) but not done because "a = 
t1;" already exists.

    // The opPostInc expressions must be done last.
    // (p.q++)
    t8++;      // rewrite(p.q++).bottom.prepend(...)
    t7.q = t8; // rewrite(p.q++).bottom.prepend(...)
    p = t7;    // rewrite(p.q).bottom.prepend(...)

    // Below is not an opPostInc, but is executed before the opPostInc 
contained inside of it.
    // This is unusual, but should be compliant with the D spec.
    // It will only be noticable if the order in which property getter/setter 
calls are made
    //   depends on the order of execution of unrelated/sibling expressions.  
That would be bad.
    // a.d[...]
    t2[t4] = t5; // rewrite(a.d[a.b.c++].c).bottom.prepend(...)
    t1.d = t2;   // rewrite(a.d[...]).bottom.prepend(...)
    // a = t1;   // rewrite(a.d).bottom.prepend(...) but not done because "a = 
t1;" already exists.

    // (a.b.c++)
    t4++;      // rewrite(a.b.c++).bottom.prepend(...)
    t3.c = t4; // rewrite(a.b.c++).bottom.prepend(...)
    t1.b = t3; // rewrite(a.b.c).bottom.prepend(...)
    a = t1;    // rewrite(a.b).bottom.prepend(...)

    // END
    printAll();
}

/* This program should print the following:
initial state:
a.b.c = 10
a.b.e = 42
a.d.c = 8
p.q   = 4

monstrosity 1 results:
Foo[10]
a.b.c = 11
a.b.e = 3
a.d.c = 8
p.q   = 5

monstrosity 2 results:
Foo[10]
Foo[10] = foo
a.b.c = 11
a.b.e = 3
a.d.c = 8
p.q   = 5
*/

Properties, opIndex, and expression rewriting.

Reply via email to