On 07/07/2012 06:18 PM, Roman D. Boiko wrote:
On Saturday, 7 July 2012 at 22:03:20 UTC, Chad J wrote:
enum : SyntaxElement
{
AST_EXPRESSION = 0x0001_0000_0000_0000,
AST_UNARY_EXPR = 0x0000_0001_0000_0000 |
This would cause wasting space (probably a lot). Definitely it would not
be easy to implement in a parser generator, when various language
properties are not known beforehand for fine-grained tuning.
This approach of course has shameful nesting limitations, but I feel
like these determinations could be fairly well optimized even for the
general case. For example: another approach that I might be more
inclined to take is to give each token/symbol a low-valued index into
a small inheritance table.
Depending on implementation, that might introduce the multiplier
overhead of table access per each comparison (and there would be many in
case of searching for nodes of specific type).
I would expect the regex engine to call the isA function as one of
it's operations. Thus placing an AST_EXPRESSION into your expression
would also match an AST_NEGATE_EXPR too.
But actually it is not so difficult to implement in a very similar way
to what you described. I was thinking about a lookup table, but
different from a traditional inheritance table. It would be indexed by
AST node type (integral enum value), and store various classification
information as bits. Maybe this is what you meant and I misunderstood
you... Example is here:
https://github.com/roman-d-boiko/dct/blob/May2012drafts/fe/core.d
(sorry, it doesn't show how to do classification, and has a different
context, but I hope you get the idea). The advantage over storing
hierarchical information directly in each token is obviously memory usage.
I see what you mean.
I'm not sure that I buy that language properties are known before-hand.
The front-end knows what the language grammar looks like and it knows
what kind of things it can find in an AST.
Thus you can make the regex DSL do this transformation and let the DFAs
handle everything:
expr -> (expr | unary_expr | negate_expr | binary_compliment | incr_expr
| decr_expr | binary_expr | assign_expr | add_assign_expr | nary_expr | ...)
Because what we really want it to do is match any of those expression
kinds when we place the expression symbol in our regex.
I think the important point here though is that inheritance can be
reduced to bit-twiddling optimizations in this case.