Hi,

On 08.02.19 02:01, Peng Yu wrote:
> It seems to me that the parsing code could be made simpler by making
> the parser reentrant. So there can be a parser parses anything not
> heredoc and another parser just parse heredoc. And there should
> different lexers for non-heredoc and heredoc. Is it so?

The difficulty with lexers is that they keep their own buffer state, so switching between lexers mid-stream is non-trivial.

Normally, you'd use lexer states to activate/deactivate rules. The primitive approach would be

    %x INITIAL HEREDOC

and then prefixing all matches with <INITIAL> or <HEREDOC>.

The main problem there is that state changes need to be driven by the lexer code, as the BEGIN macro is only available there, so a change from the parser would have to be communicated through yyextra, and applied in the lexer code before matching a token (so YY_USER_ACTION is too late).

The other thing is that parsing heredocs with the lexer is rather pointless, as the only thing we are interested in is dynamic anyway, so grabbing the data out of the lexer stream with a custom function is probably the better approach.

Some people use tar files as heredocs, so a "[a-zA-Z]*" rule can match really long strings there, which the lexer would have to extend its buffer for in order to provide yytext/yyleng. We can't limit the match length either, because then we'd have to jump through a lot of hoops to match the end tag if it is straddled across two matches.

If the lexer can identify heredocs reliably, then it's probably best to let it provide a token HEREDOC to the parser after setting up the state for heredoc parsing (which may live in yyextra to make it reentrant, but that's orthogonal), and the parser then calls a special function to retrieve the heredoc from the lexer's stream. That function would live in the lexer source file so it can request more characters from the stream.
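To make the retrieval function a bit more concrete, here is a rough sketch in plain C. Everything in it is an assumption for illustration: the name read_heredoc, its signature, and the use of a FILE * with getc() as a stand-in for the lexer's stream. Inside a real flex scanner the function would sit in the user code section and pull characters with input() instead (check what input() returns at end of file in your flex version). It compares whole lines against the end tag, so the tag can never be split across two matches, and it tracks the length separately so binary data with embedded NUL bytes survives.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of flex or bison.
 *
 * Reads a heredoc body from `in` up to, but not including, a line that
 * consists solely of `tag`. The terminating tag line (and its newline)
 * is consumed. Returns a malloc'd, NUL-terminated buffer that the caller
 * must free, or NULL on allocation failure or if the stream ends before
 * the end tag is seen. If `len_out` is non-NULL it receives the body
 * length in bytes.
 */
char *read_heredoc(FILE *in, const char *tag, size_t *len_out)
{
    assert(in != NULL && tag != NULL);

    size_t tag_len = strlen(tag);
    size_t cap = 256;        /* current buffer capacity */
    size_t len = 0;          /* bytes stored so far */
    size_t line_start = 0;   /* offset of the line being read */
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;

    for (;;) {
        int c = getc(in);

        if (c == '\n' || c == EOF) {
            /* A line is complete: compare it as a whole with the tag. */
            if (len - line_start == tag_len &&
                memcmp(buf + line_start, tag, tag_len) == 0) {
                buf[line_start] = '\0';   /* cut off the tag line */
                if (len_out != NULL)
                    *len_out = line_start;
                return buf;
            }
            if (c == EOF) {               /* unterminated heredoc */
                free(buf);
                return NULL;
            }
        }

        /* Grow the buffer so there is always room for c plus a NUL. */
        if (len + 2 > cap) {
            size_t new_cap = cap * 2;
            char *tmp = realloc(buf, new_cap);

            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }

        buf[len++] = (char)c;
        if (c == '\n')
            line_start = len;
    }
}
```

The parser action for the HEREDOC token would then call something like read_heredoc(stream, tag, &len), with the tag taken from wherever the lexer stored it (yyextra in the reentrant case). Note that this version still collects the whole body in memory.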
Another option I could see would be to have the lexer return fragments of the heredoc, and just repeat the token as long as there is data. This would also avoid having to read the entire stream into memory, and keep the interface between lexer and parser down to yylex().

   Simon
_______________________________________________
help-bison@gnu.org
https://lists.gnu.org/mailman/listinfo/help-bison