Re: Let's stop parser Hell

Chad J Sat, 07 Jul 2012 15:05:32 -0700

On 07/07/2012 03:13 PM, Roman D. Boiko wrote:

On Saturday, 7 July 2012 at 18:55:57 UTC, Chad J wrote:

In this vision I do not use classes and inheritance for my AST.
Instead I use structs that contain some kind of nodeType member that
would be one of the tokens/symbols in the grammar, like TOK_WHILE_NODE
in the above code. Dynamic dispatch is instead performed by (very
fast) DFAs recognizing parts of the AST.


Exactly. This idea first came to me in April after I implemented the
first top-down recursive descent custom parser for a D subset. I tried
Visitor pattern before that and wasn't happy. There are some subtle
difficulties which I believe will be possible to overcome, most
important being the need to introduce a mechanism for hierarchical
classification (like a pow expression being an assign expression at the
same time).


From some of my earlier scribblings:

enum : SyntaxElement
{
  AST_EXPRESSION          = 0x0001_0000_0000_0000,
    AST_UNARY_EXPR        = 0x0000_0001_0000_0000 | AST_EXPRESSION,
      AST_NEGATE_EXPR     = 0x0000_0000_0001_0000 |  AST_UNARY_EXPR,
      AST_COMPLIMENT_EXPR = 0x0000_0000_0002_0000 |  AST_UNARY_EXPR,
      AST_POST_ADD_EXPR   = 0x0000_0000_0003_0000 |  AST_UNARY_EXPR,
      AST_POST_SUB_EXPR   = 0x0000_0000_0004_0000 |  AST_UNARY_EXPR,
      AST_PRE_ADD_EXPR    = 0x0000_0000_0005_0000 |  AST_UNARY_EXPR,
      AST_PRE_SUB_EXPR    = 0x0000_0000_0006_0000 |  AST_UNARY_EXPR,
    AST_BINARY_EXPR       = 0x0000_0002_0000_0000 | AST_EXPRESSION,
      AST_AND_EXPR        = 0x0000_0000_0001_0000 |  AST_BINARY_EXPR,
      AST_OR_EXPR         = 0x0000_0000_0002_0000 |  AST_BINARY_EXPR,
      AST_XOR_EXPR        = 0x0000_0000_0003_0000 |  AST_BINARY_EXPR,
      AST_AND_AND_EXPR    = 0x0000_0000_0004_0000 |  AST_BINARY_EXPR,
      AST_OR_OR_EXPR      = 0x0000_0000_0005_0000 |  AST_BINARY_EXPR,
      AST_ADD_EXPR        = 0x0000_0000_0006_0000 |  AST_BINARY_EXPR,
    AST_TRINARY_EXPR      = 0x0000_0003_0000_0000 | AST_EXPRESSION,
    AST_NARY_EXPR         = 0x0000_0004_0000_0000 | AST_EXPRESSION,
  AST_STATEMENT           = 0x0002_0000_0000_0000,
}


bool isA( SyntaxElement leafier, SyntaxElement rootier )
{
        SyntaxElement mask = 0;
        
        if ( rootier & 0x0000_0000_FFFF_FFFF )
        {
                if ( rootier & 0x0000_0000_0000_FFFF )
                        mask = 0xFFFF_FFFF_FFFF_FFFF;
                else
                        mask = 0xFFFF_FFFF_FFFF_0000;
        }
        else
        {
                if ( rootier & 0x0000_FFFF_FFFF_FFFF )
                        mask = 0xFFFF_FFFF_0000_0000;
                else
                        mask = 0xFFFF_0000_0000_0000;
        }
        
        return (leafier & mask) == rootier;
}

unittest
{
        assert(  isA( AST_EXPRESSION,  AST_EXPRESSION) );
        assert(  isA( AST_NEGATE_EXPR, AST_NEGATE_EXPR) );
        assert(  isA( AST_NEGATE_EXPR, AST_EXPRESSION) );
        assert(  isA( AST_NEGATE_EXPR, AST_UNARY_EXPR) );
        assert( !isA( AST_EXPRESSION,  AST_STATEMENT) );
        assert( !isA( AST_NEGATE_EXPR, AST_BINARY_EXPR) );
        assert( !isA( AST_NEGATE_EXPR, AST_STATEMENT) );
        assert( !isA( AST_NEGATE_EXPR, AST_COMPLIMENT_EXPR) );
        assert(false);
}

This approach of course has shameful nesting limitations, but I feellike these determinations could be fairly well optimized even for thegeneral case. For example: another approach that I might be moreinclined to take is to give each token/symbol a low-valued index into asmall inheritance table.

I would expect the regex engine to call the isA function as one of it'soperations. Thus placing an AST_EXPRESSION into your expression wouldalso match an AST_NEGATE_EXPR too.

Re: Let's stop parser Hell

Reply via email to