Re: std.d.lexer: pre-voting review / discussion

H. S. Teoh Thu, 12 Sep 2013 10:36:42 -0700

On Thu, Sep 12, 2013 at 06:09:41PM +0200, deadalnix wrote:
> On Thursday, 12 September 2013 at 14:09:43 UTC, H. S. Teoh wrote:
> >This can be handled by using an intermediate grammar rule. Reduce all
> >(...) into an intermediate type, say ArgList, so the reduction
> >happens something like this:
> >
> >     int  foo   ()      ()      {}
> >     Type Ident ArgList ArgList ^
> >
> >Then have the rule:
> >
> >     CompileTimeArgs ::= ArgList
> >     RuntimeArgs ::= ArgList
> >     TemplateDecl ::= Type Ident CompileTimeArgs RuntimeArgs '{' ...
> >     FuncDecl ::= Type Ident RuntimeArgs '{' ...
> >
> >So first, all (...) gets parsed to ArgList, but it's not yet fixed
> >whether they are compile-time arguments or runtime arguments. It's
> >only after you see the next '(' or '{' that you decide whether
> >ArgList should reduce to CompileTimeArgs or RuntimeArgs.
> >
> >ArgList itself, of course, will accept all possible parameters (both
> >runtime and compile-time): types, expressions, symbols. Then when you
> >reduce it to RuntimeArgs, you reject stuff that can't be interpreted
> >as parameter declarations.
> >
> 
> And then you have to backtrack the parsing instead of the lexing. You
> just moved the problem around. You'll have to create some temporary
> AST nodes that will then be fixed up into what they really are.


No. You can just use ArgListItem for both runtime args and compile-time
args. Once you've decided which one it is, invalid arguments are rejected
at semantic time (which you have to do anyway).

Let's take a concrete example. Say we're parsing this invalid code:

        int foo(alias A)(alias B) {}

You'd go through these steps:

1) Parse initial prefix of declaration:

        int foo(alias A)(alias B) {}
               ^
        AST:
        FuncDecl
         |--RetType: int
         |--Ident: foo
         \--[ being built ]

2) Parse first (...):

        int foo(alias A)(alias B) {}
                        ^
        AST:
        FuncDecl
         |--RetType: int
         |--Ident: foo
         |--ArgList
         |   \-- AliasArg
         |        \-- ident: A
         \--[ being built ]

   I'm skipping the intermediate steps here; constructing AliasArg follows
   the usual parsing process.

3) Parse second (...):

        int foo(alias A)(alias B) {}
                                  ^
        AST:
        FuncDecl
         |--RetType: int
         |--Ident: foo
         |--ArgList
         |   \-- AliasArg
         |        \-- ident: A
         |--ArgList
         |   \-- AliasArg
         |        \-- ident: B
         \--[ being built ]

4) At this point, having seen two ArgLists followed by '{', you know the
first ArgList is a CompileTimeArgList and the second is a RuntimeArgList,
so you can simply change their type fields (and narrow FuncDecl to
TemplateFuncDecl):

        AST:
        TemplateFuncDecl (was: FuncDecl)
         |--RetType: int
         |--Ident: foo
         |--CompileTimeArgList (was: ArgList)
         |   \-- AliasArg
         |        \-- ident: A
         |--RuntimeArgList (was: ArgList)
         |   \-- AliasArg
         |        \-- ident: B
         \--[ being built ]

Since you're still constructing FuncDecl, your current parsing context
should still have a direct reference to the partially constructed
FuncDecl node, which in turn has a direct reference to both ArgList
child nodes. So this is just dereferencing a couple of pointers. No
backtracking.
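Step 4 can be sketched in D as follows, assuming hypothetical, simplified node classes. ListRole, resolveArgLists, and the isTemplate flag (standing in for "narrowing" FuncDecl to TemplateFuncDecl) are illustrative names, not DMD's actual design. Because D classes are reference types, updating roles through FuncDecl's array changes the very ArgList objects the parser already built:

```d
// Hypothetical, simplified AST nodes (illustration only).
enum ListRole { undecided, compileTime, runtime }

class ArgList
{
    ListRole role = ListRole.undecided;
    string[] items; // item details elided for this sketch
}

class FuncDecl
{
    string retType;
    string ident;
    bool isTemplate;     // stands in for narrowing to TemplateFuncDecl
    ArgList[] argLists;  // direct references to the child ArgList nodes
}

// Called when the parser reaches the '{' of the function body. The
// partially built FuncDecl already holds its ArgList children, so deciding
// their roles only updates fields through those references: no tokens are
// re-read and nothing is re-parsed.
// Returns false if the number of parameter lists is unexpected.
bool resolveArgLists(FuncDecl f)
{
    switch (f.argLists.length)
    {
    case 1: // plain function: foo(...)
        f.argLists[0].role = ListRole.runtime;
        return true;
    case 2: // function template: foo(...)(...)
        f.isTemplate = true;
        f.argLists[0].role = ListRole.compileTime;
        f.argLists[1].role = ListRole.runtime;
        return true;
    default:
        return false;
    }
}
```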

5) Finish parsing the declaration:

        int foo(alias A)(alias B) {}
                                    ^
        AST:
        TemplateFuncDecl
         |--RetType: int
         |--Ident: foo
         |--CompileTimeArgList (was: ArgList)
         |   \-- AliasArg
         |        \-- ident: A
         |--RuntimeArgList (was: ArgList)
         |   \-- AliasArg
         |        \-- ident: B
         \--FuncBody
             \-- CompoundStatement
                  \-- [empty body]

6) Run semantic:
   - Create local symbol table for foo, etc.
   - Run semantic on CompileTimeArgList:
      - Check AliasArg for validity
      - Run semantic on AliasArg: add A to the function's local symbol
        table, etc.
   - Run semantic on RuntimeArgList:
      - Check AliasArg for validity: ERROR: cannot have alias parameter
        at runtime.
      - (Add B to local symbol table) (skipped due to previous error)
   - (Run semantic on FuncBody) (skipped due to previous error)
   - (Run semantic on RetType (verify return type match, etc.)) (skipped
     due to previous error)
   - (Add function to parent scope symbol table) (skipped due to previous
     error)
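The runtime-list check in step 6 might look roughly like this in D, again with hypothetical types and names (checkArgList, ItemKind, and a bool[string] standing in for the local symbol table). The point is only that the alias-in-runtime-list error falls out of an ordinary semantic pass over nodes whose roles are already fixed:

```d
// Hypothetical, simplified node types (illustration only).
enum ItemKind { alias_, typed }
enum ListRole { compileTime, runtime }

struct ArgListItem
{
    ItemKind kind;
    string ident;
}

// Checks one parameter list whose role was already decided by the parser.
// Valid items are added to the (hypothetical) local symbol table; the first
// invalid item stops the check. Returns null on success, else an error.
string checkArgList(ListRole role, ArgListItem[] items, ref bool[string] symbols)
{
    foreach (item; items)
    {
        if (role == ListRole.runtime && item.kind == ItemKind.alias_)
            return "cannot have alias parameter '" ~ item.ident ~ "' at runtime";
        symbols[item.ident] = true;
    }
    return null;
}
```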

So, no backtracking is necessary.

Of course, it sounds like DMD's parser doesn't work this way, but that's
a limitation of DMD's parser, not evidence of an *inherent* need for
backtracking.


T

-- 
I see that you JS got Bach.
