Re: [GSoC’11] Lexing and parsing

Robert Jacques Tue, 22 Mar 2011 21:41:18 -0700

On Tue, 22 Mar 2011 18:27:51 -0400, Ilya Pupatenko <pupate...@gmail.com> wrote:

Hi,
First of all, I want to be polite so I have to introduce myself (you can skip this paragraph if you feel tired of newcomer-students’ posts). My name is Ilya, I’m a Master’s student in the IT department of Novosibirsk State University (Novosibirsk, Russia). In the Soviet period Novosibirsk became one of the most important science centers in the country, and now there are very close relations between the University and the Academy of Sciences. That’s why it’s difficult and very interesting to study here. But I’m not planning to study or work this summer, so I’ll be able to work (nearly) full time on a GSoC project. My primary specialization is seismic tomography inverse problems, but I’m also interested in programming language implementation and compilation theory. I have good knowledge of the C++ and C# languages and “intermediate” knowledge of the D language, knowledge of compilation theory, some experience in implementing lexers, parsers and translators, basic knowledge of lex/yacc/antlr and some knowledge of the Boost.Spirit library. I’m not an expert in D now, but I am willing to learn and to solve difficult tasks; that’s why I decided to apply to GSoC.
I’m still working on my proposal (on the task “Lexing and Parsing”), but I want to write some general ideas and ask some questions.
1. It is said that “it is possible to write a highly-integrated lexer/parser generator in D without resorting to additional tools”. As I understand, the library should allow programmer to write grammar directly in D (ideally, the syntax should be somehow similar to EBNF) and the resulting parser will be generated by the D compiler while compiling the program. This method allows integration of parsing in D code; it can make code simpler and even sometimes more efficient.
There is a library for C++ (named Boost.Spirit) that follows the same idea. It provides a (probably not ideal but very nice) “EBNF-like” syntax for writing a grammar; it’s quite powerful, fast and flexible. There are three parts in this library (actually there are 4 parts but we’re not interested in Spirit.Classic now):
• Spirit.Qi (parser library that allows building recursive descent parsers);
• Spirit.Karma (generator library);
• Spirit.Lex (library usable to create tokenizers).
The Spirit library uses “C++ template black magic” heavily (for example, via Boost.Fusion). But D has greater metaprogramming abilities, so it is possible to implement the same functionality in an easier and “clean” way. So, the question is: is it a good idea if at least the parser library architecture is somewhat similar to Spirit’s? Of course it is not about “blind” copying; but creating an architecture for such a big system completely from scratch is quite difficult indeed. To be exact, I like the idea of parser attributes, I like the way semantic actions are described, and the “auto-rules” seem really useful.

I'm not qualified to speak on Spirit's internal architecture; I've only used it once for something very simple and ran into a one-liner bug which remains unfixed 7+ years later. But the basic API of Spirit would be wrong for D. “it is possible to write a highly-integrated lexer/parser generator in D without resorting to additional tools” does not mean "the library should allow programmer to write grammar directly in D (ideally, the syntax should be somehow similar to EBNF)"; it means that the library should allow you to write a grammar in EBNF and then, through a combination of templates, string mixins and compile-time function evaluation, generate the appropriate (hopefully optimal) parser. D's compile-time programming abilities are strong enough to do the code generation job usually left to separate tools. Ultimately a user of the library should be able to declare a parser something like this:


// Declare a parser for Wikipedia's EBNF sample language
Parser!`
(* a simple program syntax in EBNF − Wikipedia *)
program = 'PROGRAM' , white space , identifier , white space ,
           'BEGIN' , white space ,
           { assignment , ";" , white space } ,
           'END.' ;
identifier = alphabetic character , { alphabetic character | digit } ;
number = [ "-" ] , digit , { digit } ;
string = '"' , { all characters - '"' } , '"' ;
assignment = identifier , ":=" , ( number | identifier | string ) ;
alphabetic character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
                     | "H" | "I" | "J" | "K" | "L" | "M" | "N"
                     | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
                     | "V" | "W" | "X" | "Y" | "Z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
white space = ? white space characters ? ;
all characters = ? all visible characters ? ;
` wikiLangParser;
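
To make the mechanism concrete, here is a minimal sketch of the compile-time pipeline for a drastically reduced grammar: a single rule made of literal alternatives. It is not the proposed library. The names generateMatcher, Parser and Digits are hypothetical, and the rule splitting is deliberately naive (it assumes the literals contain no "=" or "|"). It only illustrates how compile-time function evaluation can turn a grammar string into D source, and how a string mixin inside a template then compiles that source.

```d
import std.algorithm.searching : startsWith;
import std.array : split;
import std.string : strip;

// Hypothetical helper, evaluated at compile time (CTFE). It turns a rule
// such as `digit = "0" | "1"` into D source for a function
// `static bool digit(string input)` that tests whether `input` begins with
// one of the listed literals.
string generateMatcher(string rule)
{
    auto parts = rule.split("=");
    string code = "static bool " ~ parts[0].strip
                ~ "(string input) { return false";
    foreach (alt; parts[1].split("|"))
        code ~= " || input.startsWith(" ~ alt.strip ~ ")";
    return code ~ "; }";
}

// Hypothetical Parser template: the grammar is a template argument, and the
// string mixin compiles the generated source into a static member function.
struct Parser(string grammar)
{
    mixin(generateMatcher(grammar));
}

alias Digits = Parser!`digit = "0" | "1" | "2" | "3" | "4"`;

// Checked by the compiler itself, with no external generator involved.
static assert(Digits.digit("42"));
static assert(!Digits.digit("x42"));

void main()
{
    // The generated function is also an ordinary function at run time.
    assert(Digits.digit("3 apples"));
}
```

A real implementation would need a proper EBNF parser running under CTFE and would generate a full recursive descent (or table-driven) parser rather than a chain of startsWith calls, but the division of labour between the template, the compile-time function and the mixin would stay the same.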
