This is a brief note in which I talk to myself and any lurkers out there about user defined literals.

Why?
====

Most languages are polluted by fixed grammars. In the history of programming languages, variables were introduced to allow for, well, variable values. And of course, we had user defined functions. Later, we got user defined types, which greatly improve the ability of a language to express things. In some languages, we got overloading, to allow a nice notation fixed in the language to be re-used for user defined types. A few languages also provide a lame way to add new operators. Some languages provided lame ways to add new statements, such as macros in C. Better macro technology had to wait for Scheme, and even there it arrived only very recently.

Felix is one of the first production languages to allow user defined grammar. This subsumes mere user defined operators with rigid precedences: provided the base grammar is not disturbed, new constructions can be added relatively easily. By combining additions to library code, the Dypgen GLR parsing capabilities, and action codes written in Scheme, a wide class of extensions to the grammar can be made. Of course, the semantics remain fixed by what the compiler can do.

This dynamic grammar definition capability is so good that almost the whole of Felix is defined in user space. All we need is a hard coded bootstrap language in which to define the new grammar, and a couple of pre-defined terms. The pre-defined terms in Felix include the structure of identifiers and the literals for numbers and strings.

Now, if we want a fully general parsing system, we need to lift these terms out of the hard coded grammar. Doing so also has the advantage of allowing NEW literals and special terms to be added to the language. Thus, generalisation should allow separating the parsing technology from Felix to create a new product, whilst at the same time making Felix itself more flexible.

Regexp lexemes
==============

Since Dypgen has user defined regexps, we can use this feature to lex literals and identifiers. [The last commit is well on the way to completing this task].

The core rule here is that we need some special term such as

    (ast_literal srcref type value)

where both type and value are strings. The type is determined by which regexp was matched at which point in the grammar, together with any post-processing done by the Scheme action code. The value is some munged version of the original lexeme. For example, for integers, the underscores Felix allows can be removed. We could also translate the allowed radix notations to decimal to help simplify subsequent processing.

Output
======

Sometimes, a literal will pass through the compilation process into the back end C++ code generator. For this to work, there has to be a way to supply a conversion function, which will take the literal (or other) value and convert it to a form useful in C++. For example, if we allow an imaginary number literal:

    23.7i

we might output it as

    ::std::complex<double>(0.0,23.7)

Obviously, we need a way to translate strings: we add quotes, but we also have to escape things. And if we're going to C++ strings rather than C's NTBS (null-terminated byte strings), we'd need

    ::std::string("...\xFF...")

or something.

Therefore, to implement user defined literals fully, we ALSO need some way to translate a value of the literal's type. The core translation has to be produced by the compiler, even if it delegates work to C++ (as in the above examples). In other words, it has to be done in OCaml code. For example, we could use Scheme code passed to the OCS library.

Construction in Felix
=====================

Clearly, we have to create values in Felix too. At worst, this is easy enough:

    type(value)

will do it, provided the appropriate function "type" is defined to accept a string. Of course:

    int("123")

is a pretty bad way to handle integer literals :)

Folding
=======

Constant folding is another issue.
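
Pulling the lexing and output ideas above together, here is a rough C++ sketch, purely for illustration. It is not what the compiler does: the real work would live in OCaml or Scheme action code. The names Literal, normalize_int, cpp_string and emit_cpp are made up, the type tags "int", "imaginary" and "string" are assumed, and so are the 0x/0o/0b/0d radix prefixes. Strings are escaped with three-digit octal escapes rather than \xFF, because a hex escape in C++ swallows any hex digits that follow it.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the (ast_literal srcref type value) term.
// Both fields are plain strings; srcref is left out for brevity.
struct Literal {
    std::string type;   // assumed tags: "int", "imaginary", "string"
    std::string value;  // munged version of the original lexeme
};

// Munge an integer lexeme: drop underscores and rewrite an assumed
// radix prefix (0x, 0o, 0b, 0d) so the value comes out in decimal.
std::string normalize_int(const std::string& lexeme) {
    std::string s;
    for (char c : lexeme)
        if (c != '_') s += c;

    int radix = 10;
    std::size_t start = 0;
    if (s.size() > 2 && s[0] == '0') {
        switch (s[1]) {
            case 'x': case 'X': radix = 16; start = 2; break;
            case 'o': case 'O': radix = 8;  start = 2; break;
            case 'b': case 'B': radix = 2;  start = 2; break;
            case 'd': case 'D': radix = 10; start = 2; break;
            default: break;  // plain decimal with a leading zero
        }
    }

    std::size_t used = 0;
    unsigned long long n = std::stoull(s.substr(start), &used, radix);
    if (used != s.size() - start)
        throw std::invalid_argument("bad integer literal: " + lexeme);
    return std::to_string(n);
}

// Emit a C++ expression for a string value: add quotes, escape quotes
// and backslashes, and write non-printable bytes as 3-digit octal.
// Embedded NUL bytes would need the (pointer, length) constructor,
// which this sketch does not attempt.
std::string cpp_string(const std::string& value) {
    std::string out = "::std::string(\"";
    for (unsigned char c : value) {
        if (c == '"' || c == '\\') {
            out += '\\';
            out += static_cast<char>(c);
        } else if (c < 0x20 || c > 0x7E) {
            out += '\\';
            out += static_cast<char>('0' + ((c >> 6) & 7));
            out += static_cast<char>('0' + ((c >> 3) & 7));
            out += static_cast<char>('0' + (c & 7));
        } else {
            out += static_cast<char>(c);
        }
    }
    out += "\")";
    return out;
}

// Pick the conversion from the literal's type tag.
std::string emit_cpp(const Literal& lit) {
    if (lit.type == "int") return lit.value;
    if (lit.type == "imaginary")
        return "::std::complex<double>(0.0, " + lit.value + ")";
    if (lit.type == "string") return cpp_string(lit.value);
    throw std::invalid_argument("no C++ conversion for type: " + lit.type);
}
```

With this, normalize_int("0x_FF") gives "255", and an "imaginary" literal whose value is "23.7" comes out as ::std::complex<double>(0.0, 23.7). A user defined literal type would need its own entry in emit_cpp, which is exactly the part that has to be made pluggable.
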
Freedom to write expressively depends on constant folding: no one will write

    "hello " "world"

if the concatenation is done at run time. Again, there's some difficulty implementing this. Currently, special OCaml code is used to fold strings and integers, and to provide short-cut logic (which is, in fact, essential to the semantics).

Summary
=======

There's a lot more to do to get user defined literals for user defined types. It's a bit easier to add new kinds of integers or strings (since these are already supported by the compiler). Eliminating that special support is the challenge.

--
john skaller
skal...@users.sourceforge.net
http://felix-lang.org

_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language