Re: [GSoC’11] Lexing and parsing

Robert Jacques Wed, 23 Mar 2011 07:51:24 -0700

On Wed, 23 Mar 2011 08:22:10 -0400, spir <denis.s...@gmail.com> wrote:

On 03/23/2011 05:39 AM, Robert Jacques wrote:
On Tue, 22 Mar 2011 18:27:51 -0400, Ilya Pupatenko<pupate...@gmail.com> wrote:
Hi,
First of all, I want to be polite so I have to introduce myself (youcan skipthis paragraph if you feel tired of newcomer-students’ posts). My nameisIlya, I’m a Master student of IT department of Novosibirsk StateUniversity(Novosibirsk, Russia). In Soviet period Novosibirsk became on of themost
important science center in the country and now there are very close
relations between University and Academy of Science. That’s why it’s
difficult and very interesting to study here. But I’m not planning tostudy
or work this summer, so I’ll be able to work (nearly) full time on GSoC
project. My primary specialization is seismic tomography inverseproblems,
but I’m also interested in programming language implementation and
compilation theory. I have good knowledge of C++ and C# languages and
“intermediate” knowledge of D language, knowledge of compilationtheory, someexperience in implementing lexers, parsers and translators, basicknowledgeof lex/yacc/antlr and some knowledge of Boost.Spirit library. I’m notanexpert in D now, but I willing to learn and to solve difficult tasks,that’s
why I decided to apply on the GSoC.
I’m still working on my proposal (on task “Lexing and Parsing”), but Iwant
to write some general ideas and ask some questions.
1. It is said that “it is possible to write a highly-integratedlexer/persergenerator in D without resorting to additional tools”. As Iunderstand, thelibrary should allow programmer to write grammar directly in D(ideally, thesyntax should be somehow similar to EBNF) and the resulting parserwill be
generated by D compiler while compiling the program. This method allows
integration of parsing in D code; it can make code simpler and evensometimes
more efficient.
There is a library for C++ (named Boost.Spirit) that follows the sameidea.It provide (probably not ideal but very nice) “EBNF-like” syntax towrite agrammar, it’s quite powerful, fast and flexible. There are three partsin
this library (actually there are 4 parts but we’re not interested in
Spirit.Classic now):
• Spirit.Qi (parser library that allows to build recursive descentparsers);
• Spirit.Karma (generator library);
• Spirit.Lex (library usable to create tokenizers).
The Spirit library uses “C++ template black magic” heavily (forexample, viaBoost.Fusion). But D has greater metaprogramming abilities, so it ispossible
to implement the same functionality in easier and “clean” way.
So, the question is: is it a good idea if at least parser library
architecture will be somewhat similar to Spirit one? Of course it isnot
about “blind” copying; but creating architecture for such a big system
completely from scratch is quite difficult indeed. If to be exact, Ilike anidea of parser attributes, I like the way semantic actions aredescribed, and
the “auto-rules” seems really useful.
I'm not qualified to speak on Spirits internal architecture; I've onlyused itonce for something very simple and ran into a one-liner bug whichremainsunfixed 7+ years later. But the basic API of Spirit would be wrong forD. “itis possible to write a highly-integrated lexer/perser generator in Dwithout
resorting to additional tools” does not mean "the library should allow
programmer to write grammar directly in D (ideally, the syntax should be
somehow similar to EBNF)" it means that the library should allow you towrite agrammar in EBNF and then through a combination of templates, stringmixins andcompile-time function evaluation generate the appropriate (hopefullyoptimal)parser. D's compile-time programming abilities are strong enough to dothe codegeneration job usually left to separate tools. Ultimately a user of thelibrary
should be able to declare a parser something like this:

// Declare a parser for Wikipedia's EBNF sample language
Parser!`
(* a simple program syntax in EBNF − Wikipedia *)
program = 'PROGRAM' , white space , identifier , white space ,
'BEGIN' , white space ,
{ assignment , ";" , white space } ,
'END.' ;
identifier = alphabetic character , { alphabetic character | digit } ;
number = [ "-" ] , digit , { digit } ;
string = '"' , { all characters − '"' } , '"' ;
assignment = identifier , ":=" , ( number | identifier | string ) ;
alphabetic character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
| "H" | "I" | "J" | "K" | "L" | "M" | "N"
| "O" | "P" | "Q" | "R" | "S" | "T" | "U"
| "V" | "W" | "X" | "Y" | "Z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
white space = ? white space characters ? ;
all characters = ? all visible characters ? ;
` wikiLangParser;
How does one solve the issues of self/mutual/circular pattern recursionat compile-time?
Denis

How do you solve it at runtime? Then apply CTFE. Alternatively, how wouldyou solve it with a functional language, then apply templates. I think youmissed a point though; Parser generates a parser given a EBNF grammar. Andtherefore would internally behave like any other DSL -> code generationtool (except it would be a library).

P.S. self/mutual/circular pattern recursion occurs all the time intemplates and CTFE.

Re: [GSoC’11] Lexing and parsing

Reply via email to