The new dypgen-based syntax extension mechanism has its first
new feature .. :) It seems to work in one case:

/// fragment of lib/nugram.flx //////
  satom := match sexpr with smatching+ endmatch =># 
    "`(ast_match (,_2 ,_4))";

    /*
    smatchings := smatching =># "`(,_1)";
    smatchings := smatching smatchings =># "(cons _1 _2)";
    */

    smatching  := | spattern => sexpr =># "`(,_2 ,_4)";
    smatching  := | => sexpr =># "`(pat_none ,_3)";
////////////////////////////////////////

You can now write

        nt* nt+ nt?

for a possibly empty sequence of nt, a non-empty sequence
of nt, or an optional nt in the grammar production as shown.
In the example, smatching+ replaces the
nonterminal smatchings, which is commented out.

This works by defining extra rules:

    smatching__nelist := smatching =># "`(,_1)";
    smatching__nelist := smatching smatching__nelist =># "(cons _1 _2)";


The suffixes __list, __nelist and __opt are used for a possibly
empty list, a non-empty list, and an optional item. The optional
construction for x returns

        none
        (some x)

for the missing and non-missing cases respectively.
Both list forms return an ordinary Scheme list. An empty list
is just '() in Scheme.
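
By analogy with the __nelist rules above, the rules generated
for * and ? presumably look something like this (a sketch inferred
from the stated semantics, assuming an empty right hand side is
permitted; the actual generated actions may be spelled differently):

    smatching__list := =># "'()";
    smatching__list := smatching smatching__list =># "(cons _1 _2)";

    smatching__opt := =># "'none";
    smatching__opt := smatching =># "`(some ,_1)";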

A small change has also been made to the way rules are stored,
so that the set of stored rules contains no duplicates.
This was done in the hope that if you write, for example

        fred+

in two places, the duplicate rules generated will be eliminated.

You should note that in the production:

  satom := match sexpr with smatching+ endmatch =># 
    "`(ast_match (,_2 ,_4))";

the + is not counted for purposes of numbering the
non-terminal attributes. This is because

        smatching+

is replaced by

        smatching__nelist

which is only a single non-terminal.
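
That is, for numbering purposes the production behaves as if
it had been written:

  satom := match sexpr with smatching__nelist endmatch =># 
    "`(ast_match (,_2 ,_4))";

so match is _1, sexpr is _2, with is _3, the list is _4,
and endmatch is _5.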

At present +, * and ? must follow a single non-terminal;
it is not possible to apply them to a sequence. For example
you cannot write:

        x := (a b)* c =># ....

because at present this would apply * to the ) symbol.
That isn't allowed, since ) isn't a non-terminal,
and ( .. ) currently means to parse a ( then to parse ..
then to parse a ), i.e. the ( and ) are just ordinary
tokens, not groupings. This may change, see below.

Because you can't write *, + and ? to mean themselves,
the substitute non-terminals

        star // for *
        plus // for +
        quest // for ?

have been introduced into the initial grammar,
and must be used for those tokens.
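
For example, to match the literal token sequence a * b you
would write something like this (the rule and its action are
placeholders, purely for illustration):

        x := a star b =># "...";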

FUTURE EXTENSIONS -- MORE TEMPLATES
------------------------------------

I hope the * + ? feature will be useful immediately, but
it is hard coded. We can view these features as 'templates'
or 'syntactic sugar', which are replaced by a new sequence
of tokens PLUS some newly defined rules.

A generalisation of this would allow more or less arbitrary
such sugar to be defined by the user, for example some
way to write:

        nonterminal + =># .. code to generate extension here ..

It's almost painfully obvious how to do this... USE DYPGEN!
Specifically, we have:

statement:
  ...
  | SYNTAX NAME LBRACE dyprods RBRACE

dyprod:
  | NAME COLONEQUAL rhs PARSE_ACTION statement
  | NAME COLONEQUAL rhs PARSE_ACTION STRING SEMI

....

where rhs is any sequence of tokens other than PARSE_ACTION (=>#).

So of course, dyprod, rhs, etc. are *already* extensible with
new productions; it remains to make it possible to specify
user actions which somehow implement the templates.
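
As a purely hypothetical sketch, in the Scheme the user actions
already use: a template action might be a function which, given a
nonterminal name, returns the substitute symbol plus the rules to
generate. The function name, the rule encoding, and the whole
interface below are invented, just to show the shape of the idea:

    ;; HYPOTHETICAL: expand nt+ into a substitute symbol plus two
    ;; new rules, each encoded as (lhs rhs action-string).
    (define (expand-plus nt)
      (let ((lst (string-append nt "__nelist")))
        (list lst                                      ; substitute symbol
          (list (list lst (list nt)     "`(,_1)")      ; base case
                (list lst (list nt lst) "(cons _1 _2)"))))) ; recursion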

FUTURE EXTENSIONS -- SCHEME ENVIRONMENT
----------------------------------------

At present, Felix has 3 separate extension mechanisms.

First, the old preprocessor-based extension mechanism
uses a fixed grammar plus a lookup table which enables
user brackets and infix operators to map to function
applications, and allows statements to be parsed by a recursive
descent interpretive parser, with user actions represented
by existing syntax in which special identifiers _1 _2 etc
represent holes.

The parser actually parses the user actions to generate
an 'almost' ordinary AST term, and records the AST terms
matched by the symbols in the production. The macro processor
then plugs these argument terms into the _1 _2 parameters
to produce a desugared term which is then processed
normally (as if the parser had generated it directly).

This mechanism requires a lot of support and is restricted
in the kinds of terms it can handle. It also doesn't scope,
since the grammar is extended globally by the pre-processor.

The second mechanism uses Dypgen to do a
similar job in a similar way. With dypgen, the extensions
are specified with ordinary Felix statements, not the 
preprocessor, and scope properly. The extensions are 
packaged up into a Domain Specific Sub-Language definition,
for example:

//////////////////////////////////
#keyword publish
syntax syn_document {
  statement := publish 
    string_literal statement =># _3
  ;
}
//////////////////////////////////

introduces a new statement form in which any statement
can be prefixed with a descriptive comment. The
DSSL 'syn_document' can then be enabled by saying:

open syntax syn_document;

This grabs the stored rules and sends them to Dypgen,
which builds a new automaton which includes those rules.
The automaton is only in scope within the production which
contains the open directive, and doesn't affect any other
parallel GLR parse attempts. The effect is to make the
extension available from the point the directive is issued
up to the end of the current module, function body, or
other places where statements are parsed.
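
For example (assuming an ordinary val declaration as the inner
statement) you could then write:

//////////////////////////////////
open syntax syn_document;

publish "The answer to the ultimate question"
  val answer = 42;
//////////////////////////////////

Since the action is just _3, the string literal is discarded at
parse time and only the inner statement survives.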

This mechanism is much better than the pre-processor hackery,
because it is more modular (extensions are grouped into 
DSSLs), properly scoped, and it integrates seamlessly
with the existing grammar and parser technology,
instead of using an LR/recursive descent hybrid.
In particular, there's no longer any need to trigger
parsing of new statements with specific keywords.

Still, this system suffers from the restriction that
it can only handle non-terminals for which it has
been specifically prepared: whilst expressions and
statements are supported, there is no support for
extending patterns, or any other sub-part of a term,
such as the 'adjectives' that can be applied to
functions (inline, noinline, virtual, etc).

The third mechanism eliminates this problem by defining
all user actions as Scheme programs which are executed
to return an s-expression, that is, a Scheme term.

These Scheme terms are plugged together using Scheme
itself, and the final Scheme AST is converted to an
intermediate S-expression type, and then to the original
kind of Felix AST terms.

Since Scheme is dynamically typed, there are no restrictions
on what kinds of Felix AST terms can be built, or how,
provided the Scheme -> sex -> Felix mapping can recognise
the terms produced. That reduces the workload for handling
arbitrary Felix terms to a single translation function.

Now, having explained all this .. at present each Scheme
term is represented as a string. When the corresponding 
nonterminal is reduced, the string is evaluated as Scheme
in an environment enriched with variables named _1 _2 .. etc,
which are bound to the Scheme terms that are the attributes
of the matched non-terminals.

However, any utility functions, etc, have to be defined
in EVERY user action that needs them, because a new
environment is constructed for every action. This ensures
the evaluations of each action don't interfere with each
other. The following production:

///////////////////////////////////
    selse_part := selifs else sexpr =># """
     (letrec 
       ((fold_left 
         (lambda (f acc lst) 
           (if (null? lst) acc (fold_left f (f acc (car lst)) (cdr lst))))))
       (let ((f (lambda (result condthn) 
         (let ((cond (car condthn)) (thn (cadr condthn))) 
           `(ast_cond (,cond ,thn ,result))))))
         (fold_left f _3 _1)))
    """;
//////////////////////////////////

defines 'fold_left' in Scheme, then applies it to construct
a new term. It would be useful if the definition of fold_left
could be given just once, and made available to all the user
actions.

So this very long-winded explanation is presented to suggest
that we need to provide a way to extend the user action code
too, not just the grammar. The obvious way to start is the
conventional way for a programming language: an environment
in which subroutines can be defined which can be subsequently
shared by user actions.
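
For instance, one could imagine a form like this for loading
definitions into a shared action environment (the 'environment'
keyword and the whole construct are hypothetical, nothing like
this exists yet):

//////////////////////////////////
syntax scheme_utils {
  environment """
    (define (fold_left f acc lst)
      (if (null? lst) acc (fold_left f (f acc (car lst)) (cdr lst))))
  """;
}
//////////////////////////////////

after which any user action could call fold_left directly.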

Scheme itself has a reputation for extensibility
and includes a macro facility, so quite a bit of interesting
stuff can probably be done in Scheme itself, if the right
basic extension interfaces are provided.
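
For instance, a standard R5RS macro like the following could
live in such a shared environment and be used from any action:

    ;; Standard Scheme (R5RS): a two-way conditional abbreviation.
    ;; (orelse x y) yields x unless it is the empty list, else y.
    (define-syntax orelse
      (syntax-rules ()
        ((orelse x y) (let ((t x)) (if (null? t) y t)))))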

If I can digress a bit: applying the Felix extension idea
to C++, we would need to keep track of which names are
typedef names just to make the language parsable, and a mutable
store would be useful for that.

No doubt more useful features will become evident as the
experiment progresses.


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
