This is a brief note in which I talk to myself and any lurkers out there
about user defined literals.

Why?
====

Most languages are polluted by fixed grammars. In the history of 
programming languages, variables were introduced to allow for,
well, variable values. And of course, we had user defined functions.

Later, we got user defined types, which greatly improve the ability
of a language to express things.

In some languages, we got overloading, which allows a nice notation
fixed in the language to be reused for user defined types.
A few languages also provide a lame way to add new operators,
and some provide lame ways to add new statements, such as
macros in C. Better macro technology had to wait until quite
recently, in Scheme.

Felix is one of the first production languages to allow user defined grammar.
This subsumes mere user defined operators with rigid precedences:
within certain restrictions (the base grammar must not be disturbed),
new constructions can be added relatively easily. With additions to
library code, Dypgen's GLR parsing capabilities, and action code
written in Scheme, a wide class of extensions to the grammar is
possible. Of course, the semantics remain limited to what the
compiler can do.

This dynamic grammar definition capability is so good that almost the whole
of Felix is defined in user space. All we need is a hard coded bootstrap
language in which to define the new grammar, and a couple of
pre-defined terms.

The pre-defined terms in Felix include the structure of identifiers
and literals for numbers and strings.

Now, if we want a fully general parsing system, we need
to lift these terms out of the hard coded grammar. Doing so also
has the advantage of allowing NEW literals and special
terms to be added to the language.

Thus, generalisation should allow separating the parsing technology
from Felix, to create a new product, whilst at the same time making
Felix itself more flexible.

Regexp lexemes
==============

Since Dypgen has user defined regexps, we can use this feature
to lex literals and identifiers. [The last commit is well on the way
to completing this task].

The core rule here is that we need some special term such as

        (ast_literal srcref type value)

where both type and value are strings. The type is determined
by which regexp was matched at which point in the grammar,
together with any post-processing done by the Scheme action code.
The value is some munged version of the original lexeme.
For example, for integers, the underscores Felix allows as digit separators can be removed.
We could also translate the allowed radix notations to decimal to
help simplify subsequent processing.
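
To make the munging concrete, here is a minimal OCaml sketch of what
the post-processing for integers might look like. It is not what the
Felix compiler actually does: it assumes the common 0x/0o/0b radix
prefixes, uses native ints (so huge literals would overflow), and the
function names and the "int" type tag are hypothetical.

```ocaml
(* Hypothetical sketch: munge an integer lexeme into the (type, value)
   string pair an (ast_literal srcref type value) term would carry.
   Assumes 0x/0o/0b prefixes; "int" is an illustrative type tag. *)

(* Remove the underscores used as digit separators. *)
let strip_underscores (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter (fun c -> if c <> '_' then Buffer.add_char b c) s;
  Buffer.contents b

(* Value of one digit character; fails if it is not valid in [radix]. *)
let digit_value (radix : int) (c : char) : int =
  let d =
    match c with
    | '0' .. '9' -> Char.code c - Char.code '0'
    | 'a' .. 'f' -> Char.code c - Char.code 'a' + 10
    | 'A' .. 'F' -> Char.code c - Char.code 'A' + 10
    | _ -> failwith ("bad digit: " ^ String.make 1 c)
  in
  if d >= radix then failwith ("digit out of range: " ^ String.make 1 c);
  d

(* Translate a possibly radix-prefixed lexeme to a decimal string.
   Native ints only: values above max_int overflow. *)
let to_decimal (s : string) : string =
  let s = strip_underscores s in
  let n = String.length s in
  let radix, start =
    if n > 2 && s.[0] = '0' then
      (match s.[1] with
       | 'x' | 'X' -> (16, 2)
       | 'o' | 'O' -> (8, 2)
       | 'b' | 'B' -> (2, 2)
       | _ -> (10, 0))
    else (10, 0)
  in
  let v = ref 0 in
  for i = start to n - 1 do
    v := !v * radix + digit_value radix s.[i]
  done;
  string_of_int !v

(* The (type, value) pair for the ast_literal term. *)
let munge_int_literal (lexeme : string) : string * string =
  ("int", to_decimal lexeme)

let () =
  let show l =
    let t, v = munge_int_literal l in
    Printf.printf "%s -> (%s, %s)\n" l t v
  in
  (* prints: 1_000_000 -> (int, 1000000)
             0xFF -> (int, 255)
             0b1010 -> (int, 10)
             0o17 -> (int, 15) *)
  List.iter show [ "1_000_000"; "0xFF"; "0b1010"; "0o17" ]
```

Normalising to decimal strings here means later stages only ever
see one spelling of each integer value.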

Output
======

Sometimes, a literal will pass all the way through the compilation
process into the back end C++ code generator. For this to work, there
has to be a way to supply a conversion function which takes the
literal (or other) value and converts it to a form useful in C++.

For example, if we allow an imaginary number literal:

        23.7i

we might output it as

        ::std::complex(0.0,23.7)

Obviously, we need a way to translate strings: we add quotes,
but we also have to escape special characters. And if we're targeting
C++ strings rather than C's ntbs (null-terminated byte strings),
we'd need

        ::std::string("...\xFF...")

or something similar. Therefore, to implement user defined literals
fully, we ALSO need some way to translate values of the literal's type.
The core translation has to be produced by the compiler, even if it
delegates work to C++ (as in the above examples). In other words,
it has to be done in OCaml code, or at least driven from it: for
example, we could use Scheme code evaluated by the OCS library
(a Scheme implementation written in OCaml).
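
Here is a hedged OCaml sketch of such a translation for two of the
examples above. The type tags "string", "imag" and "int" are
hypothetical, and it assumes the lexer action has already stripped
the trailing i from an imaginary literal. It emits octal escapes
rather than \xFF: in C and C++ a hex escape keeps consuming hex
digits, whereas an octal escape stops after three.

```ocaml
(* Hypothetical sketch: turn a (type, value) literal pair into C++
   source text. The type tags are illustrative, not Felix's own. *)

(* Escape a byte string as the body of a C/C++ string literal.
   Non-printable bytes become 3-digit octal escapes, which cannot
   swallow a following hex-digit character the way \x escapes can. *)
let escape_cpp (s : string) : string =
  let b = Buffer.create (String.length s + 8) in
  String.iter
    (fun c ->
      match c with
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      | '\n' -> Buffer.add_string b "\\n"
      | '\t' -> Buffer.add_string b "\\t"
      | c when Char.code c < 32 || Char.code c > 126 ->
          Buffer.add_string b (Printf.sprintf "\\%03o" (Char.code c))
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Map a literal to C++ text; fail for types with no translation.
   Assumes "imag" values arrive without their trailing 'i'. *)
let cpp_of_literal (typ : string) (value : string) : string =
  match typ with
  | "int" -> value
  | "string" -> Printf.sprintf "::std::string(\"%s\")" (escape_cpp value)
  | "imag" -> Printf.sprintf "::std::complex<double>(0.0,%s)" value
  | _ -> failwith ("no C++ translation for literal type " ^ typ)

let () =
  (* prints ::std::complex<double>(0.0,23.7) *)
  print_endline (cpp_of_literal "imag" "23.7");
  (* prints ::std::string("say \"hi\"\n\377") *)
  print_endline (cpp_of_literal "string" "say \"hi\"\n\255")
```

One caveat with this shape of output: the const char* constructor of
::std::string stops at the first NUL byte, so a literal containing
\000 would be truncated; passing the length as a second constructor
argument would avoid that.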

Construction in Felix
=====================

Clearly, we have to create values in Felix too. At worst, this is easy enough:

        type(value)

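will do it, provided a function named after the type (here "type") is
defined to accept a string. Of course,

        int("123")

is a pretty bad way to handle integer literals, since, without special
handling, it implies a run time string conversion for what ought to be
a compile time constant :)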
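
The "at worst" fallback is simple enough to sketch. The OCaml below
uses a toy expression type, not the real Felix AST, and assumes a
hypothetical list of types the compiler handles natively; any other
literal is rewritten into an application of the function named after
its type to the value as a string.

```ocaml
(* Hypothetical sketch of the type(value) fallback on a toy AST. *)
type expr =
  | Literal of string * string  (* (type, value), as in ast_literal *)
  | Name of string
  | Str of string               (* a plain string constant *)
  | Apply of expr * expr

(* Types assumed, for illustration, to have built-in support. *)
let builtin = [ "int"; "string" ]

(* Rewrite an unsupported literal into type("value"). *)
let desugar (e : expr) : expr =
  match e with
  | Literal (t, v) when not (List.mem t builtin) -> Apply (Name t, Str v)
  | e -> e

(* Render an expression in Felix-like surface syntax. *)
let rec show (e : expr) : string =
  match e with
  | Literal (t, v) -> Printf.sprintf "<literal %s %s>" t v
  | Name n -> n
  | Str s -> Printf.sprintf "%S" s
  | Apply (f, a) -> Printf.sprintf "%s(%s)" (show f) (show a)

let () =
  (* prints decimal("1.25") *)
  print_endline (show (desugar (Literal ("decimal", "1.25"))));
  (* prints <literal int 123>: built-in types are left alone *)
  print_endline (show (desugar (Literal ("int", "123"))))
```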

Folding
=======

Constant folding is another issue. Freedom to write expressively
depends on constant folding: no one will write

        "hello "
        "world"

if the concatenation is done at run time. Again, there's some difficulty
implementing this. Currently, special OCaml code is used to fold
string and integer constants and to provide short-cut logic (which is,
in fact, essential to the semantics).
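
A toy version of that special folding code might look like the OCaml
sketch below. The expression type is invented for illustration and is
not the compiler's; it concatenates string constants, adds integer
constants, and short-cuts and/or when the left operand is a known
boolean, dropping the right operand unevaluated.

```ocaml
(* Hypothetical sketch of constant folding on a toy expression type. *)
type expr =
  | Str of string
  | Int of int
  | Bool of bool
  | Var of string
  | Concat of expr * expr
  | Add of expr * expr
  | And of expr * expr
  | Or of expr * expr

let rec fold (e : expr) : expr =
  match e with
  | Concat (a, b) -> (
      (* "hello " "world" becomes "hello world" *)
      match (fold a, fold b) with
      | Str x, Str y -> Str (x ^ y)
      | a, b -> Concat (a, b))
  | Add (a, b) -> (
      match (fold a, fold b) with
      | Int x, Int y -> Int (x + y)
      | a, b -> Add (a, b))
  | And (a, b) -> (
      match fold a with
      | Bool false -> Bool false  (* right operand dropped, never folded *)
      | Bool true -> fold b
      | a -> And (a, fold b))
  | Or (a, b) -> (
      match fold a with
      | Bool true -> Bool true    (* right operand dropped, never folded *)
      | Bool false -> fold b
      | a -> Or (a, fold b))
  | e -> e
```

The asymmetry is deliberate: with short-cut semantics the right
operand may only be skipped when the left one decides the result, so
And (Var "x", Bool false) is left alone rather than folded to false.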


Summary
=======

There's a lot more to do to get user defined literals
for user defined types. It's a bit easier to add new kinds of
integers or strings, since these are already supported by the
compiler. Eliminating that special support is the challenge.


--
john skaller
skal...@users.sourceforge.net
http://felix-lang.org




_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language
