GSoC: Parse Makefile.am using abstract syntax tree Vishal Gupta Automake Branch: experimental/gsoc/ast
1) Introduction:- The goal of this project is to parse Makefile.am using separate lexical and parsing phase and generate an Abstract Syntax Tree as output. The current implementation of parser combines both the phases into a single unit. Extending the grammar become difficult and testing the module become tedious as pointing the source of error requires many test cases. This project will separate the lexical, parsing and AST generation phases, so that different phases can be unit tested and after combining integration testing can be done. 2) Period of work:- The work had been started from 16th May, 2018 and will be completed by 6th August, 2018. 3) Planning of work:- Parsing Makefile will be done in two stages:- Lexer:- Input file will be converted line by line into tokens and passed to the parser. Regular expression is used to identify lexemes and convert them to tokens. An array of tokens is returned by the lexer. Parser:- It parses the Makefile.am file according to the grammar and builds the AST. The grammar of Automake is implemented in bison and it is converted into the parsing table. Parser pushes tokens to the stack or reduce according to the grammar. During reduction, Tree node is created which is internally a hash consisting of node information and its child references. Sample Parsing Table :- @table = ({ num_k => 1 , expr => 3 , input => 2 } , { reduce => [ 1 , \&num ] } , { end => 4 } , { '-' => 6 , '/' => 8 , '+' => 5 , '*' => 7 , '\n' => 9 } , { }) ; Every index in table represents particular parsing state. The state consist of hash of token and next state number. Reduce key represent reduction of tokens and creation of node in AST. It consist of an array consisting of number of tokens to reduce and function to call for creating the node. An empty hash represent acceptance state. Parser :- while( @stack ) { if( $stack[ -1 ] == $acceptState ) print "Complete"; my @curr_token = @{ $tokens[ 0 ] } ; if( $val = $arr[ $stack[ -1 ] ] { $curr_token[ 0 ] } ) { push @stack, \@curr_token, $val; shift @tokens; } elsif( $val = $arr[ $stack[ -1 ] ] { reduce } ) { my @val1 = @$val; my @param; for( $i = 1 ; $i <= 2 * $val1[ 0 ] ; $i++) if( $i % 2 == 0) { $val = pop @stack; push @param , $val; } else pop @stack; push @stack , $val1[ 1 ] - > ( @param ); push @stack , $arr[ $stack [ -2 ] ]{ $stack[ -1 ] -> { name } }; } else die "Unexpected Token ". $curr_token."\n"; } The above code parses the input and generates the AST. It find the next state to jump according to the current token and current state number. If shifting on stack is not possible, it will check for reduce entry and if found, will call the node function with required parameter. If both shift and reduce is not possible, then it represents an error. Automake Grammar:- The grammar of Automake is written in GNU Bison. Other Perl modules like Marpa::R2 or Parse::RecDescent could be used for specifying the grammar and for parsing but they would become a dependency for Automake. Using bison does not have this disadvantage as it will only be used by the developers to extend the grammar and develop the parsing table. Once the parsing table is developed, bison is not required. Another advantage is that, extending the grammar only require changes in bison file, lexer file if new tokens are added and AST file for corresponding nodes. The parser would not have to be updated. Methods for converting bison grammar to Parsing table:- Manual :- Bison grammar[1] is graph is made using bison --graph option. The resultant file is converted into image using dot utility. The image[2] represents the state transition diagram which can be converted to the parsing table. Each edge from one node to another represent a transition on finding a particular token. This can be converted manually into parsing table. Automatic :- As the grammar grows, the number of states increases and chances of manual conversion being correct can decrease. Automatic conversion can convert the graph to Perl parsing table in the above specified format. (Converter.pl [3]) Mixed:- As the automatic conversion script is not bug free. First automatic conversion and then manual checking can be applied, to reduce the time and easily identify errors. 4) Progress:- Implementation of the project is iterative. Basic lexer, parser and AST module are implemented. New grammar features are added and the modules are updated accordingly. Currently the parser identifies different type of primaries, distinction between Automake variable definition and Make rule. A program for converting the bison grammar to Perl parsing table is also implemented. 5) Next Step:- Extending the parser to support different type of variables identified by Automake and creating unit test for these modules. Footnotes:- [1] Bison grammar <http://git.savannah.gnu.org/cgit/automake.git/tree/lib/Automake/Parser/automake.y?h=experimental/gsoc/ast&id=6115c12f37db3e7a1746dbc21fc3949e31efce7f> [2] Automake graph <https://imgur.com/a/H869UCG> [3] Converter.pl <http://git.savannah.gnu.org/cgit/automake.git/tree/lib/Automake/Parser/Converter.pl?h=experimental/gsoc/ast&id=6115c12f37db3e7a1746dbc21fc3949e31efce7f>