Parsing with tools.rd: idc.pad

downs Wed, 16 Sep 2009 03:55:13 -0700

Justin Johansson wrote:
>downs Wrote:
>>
>> Justin Johansson wrote:
>>> Can D people please recommend suitable tools for generating a parser (in D) 
>>> for an LL(1) grammar.  There's bound to be much better parser generator 
>>> tools available nowadays, since my last foray into this area 10+ years ago 
>>> with YACC.  I've heard of tools like bison, SableCC etc but apart from the 
>>> names know nothing about them.
>>>
>>> (Note.  This question is not about writing a parser for D.  It is about 
>>> writing a parser in D for another language which has an LL(1) grammar).
>>>
>>> Thanks in advance for all help.
>>>
>>> -- Justin Johansson
>>>
>> In a completely different vein, tools.rd is a simplistic recursive descent 
>> parser framework implemented at compile time that I've used for most/all of 
>> my toy languages. It keeps things trivial - there's no lexing stage, it 
>> parses straight from input string. It's not that well documented, but if you 
>> want, give me a simple language description and I can write you a sample 
>> parser. It's probably the easiest to use though - just mix it in from D code 
>>   :)
>
> Hi downs,
>
> Thanks for the offer but since YACC is my prior background I'll probably go 
> to the closest tool which is the modern variant for LL(1).  Still if you have 
> a small sample to share I'm sure other D people will be delighted.
>
> <JJ/>
>


Well, for instance, take the PAD (Pastebin Adventure) component of my IRC bot, 
which can run simple text adventures from a variety of sources, like local 
Gobby sessions, Wikis and (originally) Pastebin.com:

http://dsource.org/projects/scrapple/browser/trunk/idc/pad

Let's look at 
http://dsource.org/projects/scrapple/browser/trunk/idc/pad/engine.d

L175: gotToken

Functions like this form the building blocks of tools.rd parsing. They always 
have the form "bool gotBlarghle(ref string st, out T result)" and return true 
if result could be parsed from st, otherwise false (in which case st is not 
modified).
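
As a rough sketch of that convention in plain D, without tools.rd: gotInt below 
is a made-up name for illustration and is not taken from engine.d. It only 
shows the contract (consume on success, leave st alone on failure).

```d
// Hypothetical building block following the gotXxx convention.
// Parses a run of decimal digits from the front of st.
bool gotInt(ref string st, out int result) {
  size_t i = 0;
  // `out` parameters start at their default value, so result is 0 here.
  while (i < st.length && st[i] >= '0' && st[i] <= '9') {
    result = result * 10 + (st[i] - '0');
    i++;
  }
  if (i == 0) return false; // nothing matched: st is left unmodified
  st = st[i .. $];          // success: remove the parsed text from the input
  return true;
}
```

Calling it on "42 rest" should leave " rest" in the input string; calling it on 
"abc" should return false and leave the string as it was.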

gotToken trivially removes a token from the input text.

L200: bool accept(ref string st, string cmp): This function is called 
internally by the parser framework to decide whether st starts with the 
comparison string cmp; if so, the match is removed from st and true is 
returned. accept removes tokens from both strings and compares them until 
either a comparison fails (returns false, st not modified) or cmp is used up 
(returns true).
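
A minimal sketch of that token-by-token comparison, for illustration only: 
nextToken and isWordChar are hypothetical stand-ins, and the tokenization rule 
(a run of letters/digits, or one other character) is an assumption. The real 
gotToken and accept in engine.d may well differ.

```d
// Hypothetical helper: which characters belong to a word-like token.
bool isWordChar(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
      || (c >= '0' && c <= '9') || c == '_';
}

// Hypothetical tokenizer: skips leading whitespace, then takes either a run
// of word characters or a single other character.
bool nextToken(ref string st, out string tok) {
  size_t i = 0;
  while (i < st.length && (st[i] == ' ' || st[i] == '\t' || st[i] == '\n')) i++;
  if (i == st.length) return false; // only whitespace left: st not modified
  size_t start = i;
  if (isWordChar(st[i])) {
    while (i < st.length && isWordChar(st[i])) i++;
  } else {
    i++;
  }
  tok = st[start .. i];
  st = st[i .. $];
  return true;
}

// Sketch of accept: compare tokens pairwise until cmp is used up.
bool accept(ref string st, string cmp) {
  auto scratch = st; // work on a copy so a failed match leaves st untouched
  string want, have;
  while (nextToken(cmp, want)) {
    if (!nextToken(scratch, have) || have != want) return false;
  }
  st = scratch;      // every token of cmp matched: commit the consumed input
  return true;
}
```

One consequence of comparing whole tokens in this sketch is that accepting 
"log" fails against "logging", since the input token is "logging".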

L230: The first use of the actual Parser DSL.

    return mixin(gotMatchExpr("s: log"));

This simply matches "log" against the input string s. Nothing fancy.

L282: Not related to the parser but still, IMHO, insanely cool.

    const string Table = `
              | bool          | int         | string               | float
      --------+---------------+-------------+----------------------+--------
      Boolean | b             | b           | b?q{true}p:q{false}p | ø
      Integer | i != 0        | i           | Format(i)            | i
      String  | s == q{true}p | atoi(s)     | s                    | atof(s)
      Float   | ø             | cast(int) f | Format(f)            | f`;

This table contains a conversion matrix from internal types to basic types. 
Two things are of interest:

1) q{}p is unrolled by .litstring_expand() into nested and escaped ""s. It's a 
backport of D2 nestable string literals to D1.

2) The table itself. tools.ctfe contains functionality to select rows, columns, 
and iterate the table in column-major order. This means the above table can be 
automatically translated into nested if/switch statements.

L487: A more instructive use of the parser framework.

  if (mixin(gotMatchExpr("st: [==$#eq=true$|!=$#neq=true$|<=$#eq=smaller=true$|>=$#eq=greater=true$|<$#smaller=true$|>$#greater=true$] "
    "$dg2 <- genExprMath$"
  ))) { ... }

Okay, first we have a conditional branch: [a|b|c|d]. This matches each of the 
possible branches against the input string in turn. Segments in $$ indicate 
variable matches and/or programmatic reactions. $#eq=smaller=true$ basically 
translates to "execute eq=smaller=true when this part of the parse string is 
successfully reached".

"$dg2 <- genExprMath$" means "generate dg2 using the genExprMath function". It 
is assumed that this function follows the convention of bool(ref string, out 
typeof(dg2)).

It hasn't been used in that sample, but "y <- foo/x" means "pass x as an extra 
parameter to foo". And that's basically it.   :)
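
To make the extra-parameter form concrete, here is a hedged sketch of a 
function that could sit on the right-hand side of "y <- gotDigits/x". The name 
gotDigits is invented for this example, and placing the extra parameter after 
the usual (ref string, out T) pair is an assumption about how the generated 
call looks, not something confirmed by the framework.

```d
// Hypothetical matcher with one extra parameter after the usual two.
// Matches exactly `count` decimal digits at the front of st.
bool gotDigits(ref string st, out string result, size_t count) {
  if (st.length < count) return false;      // too short: st not modified
  foreach (c; st[0 .. count]) {
    if (c < '0' || c > '9') return false;   // non-digit: st not modified
  }
  result = st[0 .. count];
  st = st[count .. $];                      // success: consume the digits
  return true;
}
```

Under that assumption, the generated code would call something like 
gotDigits(scratch, y, x) in place of the two-argument call.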

Oh, just for fun, here's the unrolled D syntax for the above expression:

(ref string s) {
  auto scratch = s;
  return (
    true && (ref string s) {
      auto scratch = s;
      return                (true && scratch.accept("==") && (((eq=true), true))) && ((s=scratch), true)
          || (((scratch=s), true) && scratch.accept("!=") && (((neq=true), true))) && ((s=scratch), true)
          || (((scratch=s), true) && scratch.accept("<=") && (((eq=smaller=true), true))) && ((s=scratch), true)
          || (((scratch=s), true) && scratch.accept(">=") && (((eq=greater=true), true))) && ((s=scratch), true)
          || (((scratch=s), true) && scratch.accept("<") && (((smaller=true), true))) && ((s=scratch), true)
          || (((scratch=s), true) && scratch.accept(">") && (((greater=true), true))) && ((s = scratch), true);
    }(scratch) && ( genExprMath(scratch, dg2 ))
  ) && ((s = scratch), true);
}(st)
